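Pandas: identify duplicates across subsequent columns and keep the first occurrence

Time: 2018-08-29 16:26:33

Tags: python pandas

I know how to get rid of duplicate rows in pandas, but my problem is slightly different. Suppose I have a data frame like this: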

product  from    stop_1        stop_2  stop_3  stop_4 stop_5 stop_6  stop_7
metal    Portugal Spain        France  Ukraine Spain  France Ukraine Spain
fruit    Spain    France       Italy
dairy    Italy    Switzerland  Italy   Switzerland

This is what I would like to obtain:

product  from    stop_1   stop_2  stop_3  stop_4 stop_5 stop_6  stop_7
metal    Portugal Spain   France  Ukraine 
fruit    Spain    France  Italy
dairy    Italy    Switzerland  

How can I get this?

3 answers:

Answer 0 (score: 3)

Use mask together with duplicated:

df.mask(df.apply(lambda x: x.duplicated(), axis=1))
Out[443]: 
  product      from       stop_1  stop_2   stop_3 stop_4 stop_5 stop_6 stop_7
0   metal  Portugal        Spain  France  Ukraine    NaN    NaN    NaN    NaN
1   fruit     Spain       France   Italy      NaN    NaN    NaN    NaN    NaN
2   dairy     Italy  Switzerland     NaN      NaN    NaN    NaN    NaN    NaN
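
To make the two steps visible, here is a minimal self-contained sketch of the same idea. The frame is a shortened copy of the one in the question, and it assumes the empty cells are NaN; the names is_repeat and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Shortened copy of the question's frame (assumption: empty cells are NaN).
df = pd.DataFrame(
    [
        ["metal", "Portugal", "Spain", "France", "Ukraine", "Spain"],
        ["fruit", "Spain", "France", "Italy", np.nan, np.nan],
        ["dairy", "Italy", "Switzerland", "Italy", "Switzerland", np.nan],
    ],
    columns=["product", "from", "stop_1", "stop_2", "stop_3", "stop_4"],
)

# Step 1: row by row, flag every value that already appeared earlier in that row.
is_repeat = df.apply(lambda row: row.duplicated(), axis=1)

# Step 2: mask() replaces the flagged cells with NaN and leaves the others alone.
result = df.mask(is_repeat)
print(result)
```

Note that the comparison runs over the whole row, so the product column takes part in it as well. If a product name could ever match a place name, it may be safer to apply this to the from/stop columns only.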

Answer 1 (score: 1)

You can use drop_duplicates and reindex:

In [417]: df.apply(pd.Series.drop_duplicates, axis=1).reindex(columns=df.columns)
Out[417]:
  product      from       stop_1  stop_2   stop_3  stop_4  stop_5  stop_6  stop_7
0   metal  Portugal        Spain  France  Ukraine     NaN     NaN     NaN     NaN
1   fruit     Spain       France   Italy      NaN     NaN     NaN     NaN     NaN
2   dairy     Italy  Switzerland     NaN      NaN     NaN     NaN     NaN     NaN
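
The reindex call matters because drop_duplicates returns a shorter Series for each row, so columns that lose all of their values can disappear from the combined result. A small sketch with made-up data (a hypothetical two-row frame, not the one from the question) shows this:

```python
import pandas as pd

# Hypothetical toy frame: every value in stop_3 and stop_4 repeats an earlier one.
df = pd.DataFrame(
    {
        "product": ["metal", "dairy"],
        "from": ["Portugal", "Italy"],
        "stop_1": ["Spain", "Switzerland"],
        "stop_2": ["France", "Italy"],
        "stop_3": ["Spain", "Switzerland"],
        "stop_4": ["France", "Italy"],
    }
)

# Each row keeps only the labels of its first occurrences; pandas aligns the
# rows by label, so columns that are empty in every row are no longer present.
deduped = df.apply(pd.Series.drop_duplicates, axis=1)

# reindex() brings back the original column set and order, filling gaps with NaN.
result = deduped.reindex(columns=df.columns)
print(result)
```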

Answer 2 (score: 1)

Here is what I came up with:

df
Out[42]: 
  product      from       stop_1  stop_2  ...   stop_4  stop_5   stop_6 stop_7
0   metal  Portugal        Spain  France  ...    Spain  France  Ukraine  Spain
1   fruit     Spain       France   Italy  ...      NaN     NaN      NaN    NaN
2   dairy     Italy  Switzerland   Italy  ...      NaN     NaN      NaN    NaN

# save the column names first
colnames = list(df.columns)
# unique() keeps each row's values in order of first appearance
df1 = pd.DataFrame([row.unique() for index, row in df.iterrows()])
# restore the names of the columns that remain
df1.columns = colnames[0:len(df1.columns)]

df1
Out[46]: 
  product      from       stop_1  stop_2   stop_3
0   metal  Portugal        Spain  France  Ukraine
1   fruit     Spain       France   Italy      NaN
2   dairy     Italy  Switzerland     NaN     None
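
One thing to be aware of with the unique() approach: it packs the surviving values to the left, so a value can end up under a different stop column than it started in. For the data in the question this makes no difference, but the sketch below uses a hypothetical row where a new stop follows a repeated one and compares it with the mask approach. The names compacted and masked are made up for illustration.

```python
import pandas as pd

# Hypothetical row: "Portugal" repeats at stop_2, then a new value follows.
df = pd.DataFrame(
    [["metal", "Portugal", "Spain", "Portugal", "France"]],
    columns=["product", "from", "stop_1", "stop_2", "stop_3"],
)

# unique() approach: the remaining values shift left to fill the gap.
compacted = pd.DataFrame([list(row.unique()) for _, row in df.iterrows()])
compacted.columns = df.columns[: compacted.shape[1]]
compacted = compacted.reindex(columns=df.columns)

# mask approach: the repeated cell becomes NaN and nothing moves.
masked = df.mask(df.apply(lambda row: row.duplicated(), axis=1))

print(compacted)
print(masked)
```

Which behaviour is wanted depends on whether the stop number itself carries meaning.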