Question

I'm trying to filter (and consequently change) certain rows in pandas based on values in other rows. Say my DataFrame looks like this:

SENT    ID    WORD        POS        HEAD
1       1     I           NOUN        2
1       2     like        VERB        0
1       3     incredibly  ADV         4
1       4     brown       ADJ         5
1       5     sugar       NOUN        2
2       1     Here        ADV         2
2       2     appears     VERB        0
2       3     my          PRON        5
2       4     next        ADJ         5
2       5     sentence    NOUN        0

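The structure is such that the "HEAD" column points to the ID of the word this row depends on. For example, "brown" depends on "sugar", so the head of "brown" is 5, because the ID of "sugar" is 5.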
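To make the HEAD → ID relationship concrete, here is a minimal sketch that attaches each row's head word and head POS by joining the frame to itself on (SENT, HEAD) = (SENT, ID). It assumes the column names from the sample table; HEAD_WORD and HEAD_POS are hypothetical column names chosen for illustration. Root words (HEAD 0) get NaN because no row has ID 0.

```python
import pandas as pd

# Sample data from the question (sentence, word ID, word, POS tag, head ID)
df = pd.DataFrame(
    [[1, 1, 'I', 'NOUN', 2], [1, 2, 'like', 'VERB', 0],
     [1, 3, 'incredibly', 'ADV', 4], [1, 4, 'brown', 'ADJ', 5],
     [1, 5, 'sugar', 'NOUN', 2], [2, 1, 'Here', 'ADV', 2],
     [2, 2, 'appears', 'VERB', 0], [2, 3, 'my', 'PRON', 5],
     [2, 4, 'next', 'ADJ', 5], [2, 5, 'sentence', 'NOUN', 0]],
    columns=['SENT', 'ID', 'WORD', 'POS', 'HEAD'])

# A copy of the words whose ID is renamed to HEAD, so that a row's
# (SENT, HEAD) pair can be joined to the head word's (SENT, ID) pair
heads = df[['SENT', 'ID', 'WORD', 'POS']].rename(
    columns={'ID': 'HEAD', 'WORD': 'HEAD_WORD', 'POS': 'HEAD_POS'})

# Left join keeps every row; roots (HEAD 0) find no match and get NaN
with_heads = df.merge(heads, on=['SENT', 'HEAD'], how='left')

print(with_heads[['SENT', 'WORD', 'HEAD_WORD', 'HEAD_POS']])
```

Because (SENT, ID) is unique in the sample, the join cannot duplicate rows, and the ADV-headed-by-VERB filter becomes a plain boolean condition on POS and HEAD_POS.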

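I need to extract a df of all rows whose POS is ADV and whose head has POS VERB, so "Here" would be in the new df but "incredibly" would not (and I may want to change their WORD entry). At the moment I'm doing this with a loop, but I don't think that's the pandas way, and it also causes problems later on. Here is my current code (the split("-") comes from another story; ignore it):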

def get_head(df, dependent):
    # Walk from the dependent row to the neighbouring row whose ID equals its HEAD
    head = dependent
    target_index = int(dependent['HEAD'])
    if target_index == 0:
        # HEAD 0 marks the root of the sentence
        return dependent
    if target_index < int(dependent['ID']):
        # head comes earlier: step backwards, comparing the 1st int in the cell
        while int(str(head['ID']).split("-")[0]) > target_index:
            head = df.iloc[int(head.name) - 1]
    elif target_index > int(dependent['ID']):
        # head comes later: step forwards
        while int(str(head['ID']).split("-")[0]) < target_index:
            head = df.iloc[int(head.name) + 1]
    return head

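One difficulty in writing this function was that (at the time) I didn't have a sentence column, so I had to find the nearest head by hand. I hope adding the SENT column will make things a bit easier. The catch is that the df contains hundreds of these sentences, so simply searching for ID 5 won't do: there are hundreds of rows where df['ID'] == 5.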
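The point about searching for ID 5 can be checked directly: ID repeats in every sentence, while the (SENT, ID) pair identifies exactly one row. A minimal sketch, assuming the sentence column is called SENT as in the sample table:

```python
import pandas as pd

df = pd.DataFrame({
    'SENT': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'ID':   [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'WORD': ['I', 'like', 'incredibly', 'brown', 'sugar',
             'Here', 'appears', 'my', 'next', 'sentence'],
})

# ID alone repeats in every sentence, so it cannot identify a row...
print(df['ID'].is_unique)            # False

# ...but the (SENT, ID) pair can, which makes it usable as a lookup key
keyed = df.set_index(['SENT', 'ID'])
print(keyed.index.is_unique)         # True
print(keyed.loc[(2, 5), 'WORD'])     # sentence
```

With such a key, "find the head of this row" becomes a lookup of (SENT, HEAD) rather than a walk through neighbouring rows.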

Here is an example of how I use get_head():

def change_dependent(extract_col, extract_value, new_dependent_pos, head_pos):
    # works on the global df; another condition: keep rows where extract_col == extract_value
    sub_df = df[df[extract_col] == extract_value]
    for i, v in sub_df.iterrows():
        # relabel the dependent if its head has the requested POS
        if get_head(df, v)['POS'] == head_pos:
            df.at[v.name, 'POS'] = new_dependent_pos
    return df

change_dependent('POS', 'ADV', 'ADV:VERB', 'VERB')
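A loop-free variant of change_dependent can look up every row's head POS at once by reindexing a (SENT, ID)-keyed Series with each row's (SENT, HEAD) pair. This is a sketch under assumptions: change_dependent_vectorized is a hypothetical name, it assumes (SENT, ID) is unique, and unlike the original it returns a modified copy instead of mutating the global df.

```python
import pandas as pd


def change_dependent_vectorized(df, extract_col, extract_value,
                                new_dependent_pos, head_pos):
    """Hypothetical loop-free variant of change_dependent; returns a copy."""
    # POS of every word, keyed by the (assumed unique) (SENT, ID) pair
    pos_by_key = df.set_index(['SENT', 'ID'])['POS']
    # For each row, fetch the POS stored at (SENT, HEAD); roots (HEAD 0) get NaN
    head_keys = pd.MultiIndex.from_arrays([df['SENT'], df['HEAD']])
    head_pos_of_row = pd.Series(pos_by_key.reindex(head_keys).to_numpy(),
                                index=df.index)
    # Rows meeting the extra condition whose head carries the wanted POS
    mask = (df[extract_col] == extract_value) & (head_pos_of_row == head_pos)
    out = df.copy()
    out.loc[mask, 'POS'] = new_dependent_pos
    return out


df = pd.DataFrame(
    [[1, 1, 'I', 'NOUN', 2], [1, 2, 'like', 'VERB', 0],
     [1, 3, 'incredibly', 'ADV', 4], [1, 4, 'brown', 'ADJ', 5],
     [1, 5, 'sugar', 'NOUN', 2], [2, 1, 'Here', 'ADV', 2],
     [2, 2, 'appears', 'VERB', 0], [2, 3, 'my', 'PRON', 5],
     [2, 4, 'next', 'ADJ', 5], [2, 5, 'sentence', 'NOUN', 0]],
    columns=['SENT', 'ID', 'WORD', 'POS', 'HEAD'])

result = change_dependent_vectorized(df, 'POS', 'ADV', 'ADV:VERB', 'VERB')
print(result.loc[result['POS'] == 'ADV:VERB', 'WORD'].tolist())  # ['Here']
```

Returning a copy keeps the function free of hidden global state; assign the result back to df if in-place behaviour is wanted.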

Can anyone think of a more elegant/efficient/pandas-like way to get all ADV instances whose head is a VERB?

Answer 1

import pandas as pd
# Sample data from the question
df = pd.DataFrame([[1,1,'I','NOUN',2],
                  [1,2,'like','VERB',0],
                  [1,3,'incredibly','ADV',4],
                  [1,4,'brown','ADJ',5],
                  [1,5,'sugar','NOUN',2],
                  [2,1,'Here','ADV',2],
                  [2,2,'appears','VERB',0],
                  [2,3,'my','PRON',5],
                  [2,4,'next','ADJ',5],
                  [2,5,'sentence','NOUN',0]],
                  columns=['SENT','ID','WORD','POS','HEAD'])

# All adverbs
adv = df[df['POS'] == 'ADV']
# Join each verb's (SENT, ID) to the adverbs' (SENT, HEAD): only adverbs headed by a verb survive
temp = df[df['POS'] == 'VERB'][['SENT','ID','POS']].merge(adv, left_on=['SENT','ID'], right_on=['SENT','HEAD'])
temp['WORD']  # contains only 'Here'
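The merge above finds the matching words, but the question also wants to change those rows. Since merge() builds a new index, one option (an addition, not part of the original answer) is to carry the original row labels through reset_index() and write back with .loc. A sketch using the same sample data; the column name 'index' is simply what reset_index() produces for an unnamed index:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 'I', 'NOUN', 2], [1, 2, 'like', 'VERB', 0],
     [1, 3, 'incredibly', 'ADV', 4], [1, 4, 'brown', 'ADJ', 5],
     [1, 5, 'sugar', 'NOUN', 2], [2, 1, 'Here', 'ADV', 2],
     [2, 2, 'appears', 'VERB', 0], [2, 3, 'my', 'PRON', 5],
     [2, 4, 'next', 'ADJ', 5], [2, 5, 'sentence', 'NOUN', 0]],
    columns=['SENT', 'ID', 'WORD', 'POS', 'HEAD'])

# merge() returns a fresh RangeIndex, so keep the ADV rows' original
# labels in a column (reset_index names it 'index')
adv = df[df['POS'] == 'ADV'].reset_index()
verbs = df.loc[df['POS'] == 'VERB', ['SENT', 'ID']]
matches = verbs.merge(adv, left_on=['SENT', 'ID'], right_on=['SENT', 'HEAD'])

# Use the preserved labels to modify the matching rows in the original frame
df.loc[matches['index'], 'POS'] = 'ADV:VERB'
print(df.loc[df['POS'] == 'ADV:VERB', 'WORD'].tolist())  # ['Here']
```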

How do I take other rows of a DataFrame into account when filtering?
