I can't get pd.merge to fill in some missing data in this dataframe:
fulldf.head(20)
    code  Major_Project_Theme
0   8     Human development
1   11
2   1     Economic management
3   6     Social protection and risk management
4   5     Trade and integration
5   2     Public sector governance
6   11    Environment and natural resources management
7   6     Social protection and risk management
8   7     Social dev/gender/inclusion
9   7     Social dev/gender/inclusion
10  5     Trade and integration
11  4     Financial and private sector development
12  6     Social protection and risk management
13  6
14  2     Public sector governance
15  4     Financial and private sector development
16  11    Environment and natural resources management
17  8
18  10    Rural development
19  7
using this reference table:
fullgroupeddf = fulldf.groupby(['code', 'Major_Project_Theme']).count()
fullgroupeddf
code  Major_Project_Theme
1     Economic management
10    Rural development
11    Environment and natural resources management
2     Public sector governance
3     Rule of law
4     Financial and private sector development
5     Trade and integration
6     Social protection and risk management
7     Social dev/gender/inclusion
8     Human development
9     Urban development
I tried this, but it didn't work:
filleddf = fulldf.merge(fullgroupeddf, how='left', left_on='code', right_on='code')
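Honestly, I don't know what I'm doing. The idea is to use the reference table I created to fill in the missing values under Major_Project_Theme in the first dataframe. What should I put in my merge statement? Or is there a better way?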
Answer 0: (score: 2)
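Assuming that the rows with missing data actually contain an empty string '', you can use transform('max') after a groupby on code. An empty string sorts before any non-empty string, so the maximum within each group is the theme name. For example:
filleddf = fulldf.copy()  # only needed if you want to keep the original dataframe
# fill the missing values in the column Major_Project_Theme with
# the largest (i.e. non-empty) string found for the same code:
filleddf['Major_Project_Theme'] = (filleddf.groupby('code')['Major_Project_Theme']
                                           .transform('max'))
All rows of filleddf should now hold the correct 'Major_Project_Theme' for their 'code'.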
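For comparison with the merge attempt in the question, here is a minimal sketch of a merge-based fill. It assumes the missing values are empty strings and that code is stored as a string; the small sample frame and the name ref are made up for illustration. Note that grouping by both columns and calling count() leaves code and Major_Project_Theme in the index rather than as ordinary columns, so this sketch builds the reference table with drop_duplicates instead.

```python
import pandas as pd

# Hypothetical sample data shaped like fulldf in the question.
fulldf = pd.DataFrame({
    "code": ["8", "11", "1", "6", "11", "6", "8"],
    "Major_Project_Theme": [
        "Human development",
        "",
        "Economic management",
        "Social protection and risk management",
        "Environment and natural resources management",
        "",
        "",
    ],
})

# Reference table: one row per code, built from the rows that have a name.
ref = (
    fulldf.loc[fulldf["Major_Project_Theme"] != "", ["code", "Major_Project_Theme"]]
    .drop_duplicates("code")
)

# Left merge on code: keeps every row of fulldf in its original order and
# pulls the theme name from the reference table.
filleddf = fulldf[["code"]].merge(ref, on="code", how="left")
print(filleddf)
```

A code that never appears with a name in fulldf ends up as NaN after the merge, which makes unresolved rows easy to spot.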
Answer 1: (score: 0)
Just use groupby to group your fulldf by code, then loop over each group and fill in the missing information. Hope this helps.
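The loop described above might look like the following sketch. It assumes the missing values are empty strings and that each code has at most one distinct theme name; the sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like fulldf in the question.
fulldf = pd.DataFrame({
    "code": ["8", "11", "6", "11", "6", "8"],
    "Major_Project_Theme": [
        "Human development",
        "",
        "Social protection and risk management",
        "Environment and natural resources management",
        "",
        "",
    ],
})

filleddf = fulldf.copy()
for code, group in fulldf.groupby("code"):
    # Names that are present for this code (non-empty strings).
    names = group.loc[group["Major_Project_Theme"] != "", "Major_Project_Theme"]
    if not names.empty:
        # Write the first available name to every row of the group.
        filleddf.loc[group.index, "Major_Project_Theme"] = names.iloc[0]

print(filleddf)
```

This does the same job as the transform approach in the other answer, only with an explicit Python loop, so it is likely to be slower on large frames.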