Match column values and replace duplicates with '' in Python

Posted: 2017-10-26 11:34:32

Tags: python pandas dataframe

I have 3 columns like the ones below in a pandas DataFrame, with the headers screenName, screen_name_retweet and screen_name_mention:

 screenName     screen_name_retweet     screen_name_mention
    User1                 User10                      User1
    User4                 User10                      User5
    User3                 User3                       User12
    User6                 User10                      User7

What I want is to match screenName against screen_name_retweet and screen_name_mention, and if a duplicate is found between screenName and either screen_name_retweet or screen_name_mention, replace that cell with an empty string (''). So the columns above should become:

 screenName     screen_name_retweet     screen_name_mention
    User1                 User10                      
    User4                 User10                      User5
    User3                                             User12
    User6                 User10                      User7

How can I get the desired result?

Update:

I have already tried this:

df.loc[(df['screenName'].duplicated() & df['screen_name_mention'].duplicated()), ['screen_name_mention']] = ''

But nothing happens; the table stays unchanged.
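A likely reason the attempt above changes nothing: `Series.duplicated()` looks for values repeated further down a single column, not for equal values across two columns in the same row. In the sample data neither screenName nor screen_name_mention repeats a value, so the combined mask is all False. A minimal sketch of the same `.loc` idea with an element-wise comparison instead, assuming the sample data from the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'screenName': ['User1', 'User4', 'User3', 'User6'],
    'screen_name_retweet': ['User10', 'User10', 'User3', 'User10'],
    'screen_name_mention': ['User1', 'User5', 'User12', 'User7'],
})

# duplicated() only checks for repeats down one column; screenName has none,
# so a mask built from it is all False and the .loc assignment touches nothing.
print(df['screenName'].duplicated().any())  # False

# Compare the columns element-wise (row by row) and blank the matching cells.
for col in ['screen_name_retweet', 'screen_name_mention']:
    df.loc[df['screenName'] == df[col], col] = ''

print(df)
```

This reproduces the desired table from the question: User1's mention and User3's retweet become ''.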

2 answers:

Answer 0 (score: 0)

Use the replace method:

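import pandas as pd

df = pd.read_csv(file_name)  # read your file as a DataFrame; file_name is the path to your CSV
for index, row in df.iterrows():
    # Note: replace() blanks every occurrence of the value in the column, not only in this row
    if row['screenName'] == row['screen_name_retweet']:
        df['screen_name_retweet'] = df['screen_name_retweet'].replace(row['screen_name_retweet'], "")
    if row['screenName'] == row['screen_name_mention']:
        df['screen_name_mention'] = df['screen_name_mention'].replace(row['screen_name_mention'], "")
print(df)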
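Because `Series.replace` acts on the whole column, a value that equals screenName in one row is also blanked in other rows where it is a legitimate retweet. A row-scoped sketch that writes single cells with `DataFrame.at` instead; the data and the helper name `blank_self_references` are hypothetical, chosen to show the difference:

```python
import pandas as pd

# Hypothetical data: User2 retweeted User1, so row 1's retweet cell should survive
df = pd.DataFrame({
    'screenName': ['User1', 'User2'],
    'screen_name_retweet': ['User1', 'User1'],
    'screen_name_mention': ['User5', 'User2'],
})

def blank_self_references(frame):
    """Return a copy where a retweet/mention equal to the same row's screenName is ''."""
    out = frame.copy()
    for index, row in frame.iterrows():
        for col in ('screen_name_retweet', 'screen_name_mention'):
            if row['screenName'] == row[col]:
                # .at updates exactly one cell, unlike Series.replace over the column
                out.at[index, col] = ""
    return out

result = blank_self_references(df)
print(result)
```

With the whole-column `replace`, row 1's retweet of User1 would also be blanked; here it is kept.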

Answer 1 (score: 0)

import pandas as pd
a = pd.DataFrame([["user1","user10","user1"],
                  ["user4","user10","user5"],
                  ["user3","user3","user12"]],
                  columns=["i1","i2","i3"])  # simplified input DataFrame
for i in a.index:
    m = a.loc[i].duplicated()  # boolean mask of values already seen earlier in this row
    a.loc[i] = a.loc[i].mask(m).fillna("")  # turn duplicates into NaN, then fill them with ''

I think this solution could be improved in terms of performance, but it works.
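One way to avoid the per-row loop is to compare both target columns against screenName in a single vectorized step and blank matches with `DataFrame.mask`. This sketch also differs in behavior: per-row `duplicated()` would blank a mention that merely equals the retweet, while this only compares against screenName. The last row's mention is changed to User10 (hypothetical) to show that case:

```python
import pandas as pd

df = pd.DataFrame({
    'screenName': ['User1', 'User4', 'User3', 'User6'],
    'screen_name_retweet': ['User10', 'User10', 'User3', 'User10'],
    'screen_name_mention': ['User1', 'User5', 'User12', 'User10'],  # last value hypothetical
})

cols = ['screen_name_retweet', 'screen_name_mention']
# Boolean frame: True where a cell equals screenName in the same row (aligned on the index)
same_as_author = df[cols].eq(df['screenName'], axis=0)
# mask() puts '' wherever the condition is True, with no Python-level loop
df[cols] = df[cols].mask(same_as_author, '')
print(df)
```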