Matching data across pandas DataFrames

Time: 2017-09-29 23:21:28

Tags: python pandas

I have two data frames: one with authors and their texts (plus some other columns), and another with authors and their gender and discipline.

DF1
====================================
author date   text   
------------------------------------
a1     2006   "Thank you for..."
a2     2007   "When I was asked..."
a3     2014   "Biology is the ..."
a2     2010   "In the intervening..."

DF2
====================================
author gender   discipline   
------------------------------------
a2     male      psychologist
a1     female    neurologist
a3     female    biologist

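I have been going through the pandas documentation and searching SO and other sites, trying to understand how to match the authors in DF1 with their gender in DF2. I don't care whether I do this in place in DF1 or have to create a new data frame, as long as the new data frame contains all the information from DF1 plus the additional information from DF2: gender and/or discipline.

I don't even have the beginnings of code here. I just finished scrubbing DF2 of various unicode errors, so I am a bit at the end of my rope at the end of this day.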

2 answers:

Answer 0: (score: 1)

Option 1
pd.DataFrame.merge

DF1.merge(DF2[['author', 'gender']], on='author', how='left')

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male
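Since the question asks for gender and/or discipline, here is a minimal self-contained sketch that rebuilds both frames from the tables in the question and pulls in both columns at once. The sample values are copied from the question; the variable name `merged` is only for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
DF1 = pd.DataFrame({
    "author": ["a1", "a2", "a3", "a2"],
    "date": [2006, 2007, 2014, 2010],
    "text": ["Thank you for...", "When I was asked...",
             "Biology is the ...", "In the intervening..."],
})
DF2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
    "discipline": ["psychologist", "neurologist", "biologist"],
})

# A left merge keeps every row of DF1, in DF1's order,
# and adds the matching columns from DF2.
merged = DF1.merge(DF2, on="author", how="left")
print(merged)
```

Passing the whole of DF2 instead of a two-column slice is what brings `discipline` along with `gender`.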

Option 2
pd.Series.map

d = dict(DF2[['author', 'gender']].values)
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male
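One thing worth knowing about the `map` route: an author that is missing from the dictionary comes back as NaN rather than raising an error. The sketch below illustrates this with a hypothetical author `a4` that does not appear in the question's data.

```python
import pandas as pd

# "a4" is a hypothetical author that is NOT present in DF2.
DF1 = pd.DataFrame({"author": ["a1", "a2", "a4"]})
DF2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
})

# Build the author -> gender lookup and map it onto DF1.
d = dict(zip(DF2["author"], DF2["gender"]))
result = DF1.assign(gender=DF1["author"].map(d))

# Rows whose author had no entry in the lookup end up with NaN.
missing = result[result["gender"].isna()]
print(missing)
```

Checking `missing` after the mapping is a cheap way to spot authors whose names did not match, for example because of leftover encoding problems.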

Option 2.1
Another way to make d
d = DF2.set_index('author').gender
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 2.2
Another way to make d
d = dict(zip(DF2.author, DF2.gender))
DF1.assign(gender=DF1.author.map(d))

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male

Option 3
pd.DataFrame.join

DF1.join(DF2.set_index('author').gender, on='author')

  author  date                     text  gender
0     a1  2006       "Thank you for..."  female
1     a2  2007    "When I was asked..."    male
2     a3  2014     "Biology is the ..."  female
3     a2  2010  "In the intervening..."    male
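If you want to convince yourself that the options agree, a rough check like the following may help. It is a sketch under the assumption that the frames look like the ones in the question; the names `by_merge`, `by_map`, `by_join` and `same` are made up for the example.

```python
import pandas as pd

DF1 = pd.DataFrame({
    "author": ["a1", "a2", "a3", "a2"],
    "date": [2006, 2007, 2014, 2010],
    "text": ["Thank you for...", "When I was asked...",
             "Biology is the ...", "In the intervening..."],
})
DF2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
    "discipline": ["psychologist", "neurologist", "biologist"],
})

# Lookup Series indexed by author, shared by the map and join variants.
gender_by_author = DF2.set_index("author")["gender"]

by_merge = DF1.merge(DF2[["author", "gender"]], on="author", how="left")
by_map = DF1.assign(gender=DF1["author"].map(gender_by_author))
by_join = DF1.join(gender_by_author, on="author")

# Compare the resulting gender columns row by row.
same = (by_merge["gender"].tolist()
        == by_map["gender"].tolist()
        == by_join["gender"].tolist())
print(same)
```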

Answer 1: (score: 1)

import pandas as pd

# Rebuild the two frames from the question.
df = pd.DataFrame({'author': ['a1', 'a2', 'a3', 'a2'],
                   'date': [2006, 2007, 2014, 2010],
                   'text': ["Thank you for", "when i was asked", "i m the biology", "in the intervening"]})

df2 = pd.DataFrame({'author': ['a2', 'a1', 'a3'],
                    'gender': ['male', 'female', 'female'],
                    'discipline': ['psychologist', 'neurologist', 'biologist']})

# Merge on the shared 'author' column (pd.merge defaults to an inner join).
print(pd.merge(df, df2, on='author'))
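Because the default inner join silently drops authors that have no match, it may be safer to use a left join and let pandas check the result. The sketch below assumes each author appears only once in the second frame; `validate` and `indicator` are existing `pd.merge` parameters, while `checked` and `unmatched` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "author": ["a1", "a2", "a3", "a2"],
    "date": [2006, 2007, 2014, 2010],
    "text": ["Thank you for", "when i was asked",
             "i m the biology", "in the intervening"],
})
df2 = pd.DataFrame({
    "author": ["a2", "a1", "a3"],
    "gender": ["male", "female", "female"],
    "discipline": ["psychologist", "neurologist", "biologist"],
})

# validate="many_to_one" raises an error if an author is duplicated in df2;
# indicator=True adds a "_merge" column saying where each row came from.
checked = pd.merge(df, df2, on="author", how="left",
                   validate="many_to_one", indicator=True)

# Rows marked "left_only" are authors with no match in df2.
unmatched = checked[checked["_merge"] == "left_only"]
print(checked)
print(len(unmatched))
```

With a left join the result always has as many rows as `df` as long as the many-to-one check passes, so nothing from the first frame is lost.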