Question

I can't get pd.merge to fill in some missing data in this dataframe:

fulldf.head(20)

 code    Major_Project_Theme
0   8   Human development
1   11  
2   1   Economic management
3   6   Social protection and risk management
4   5   Trade and integration
5   2   Public sector governance
6   11  Environment and natural resources management
7   6   Social protection and risk management
8   7   Social dev/gender/inclusion
9   7   Social dev/gender/inclusion
10  5   Trade and integration
11  4   Financial and private sector development
12  6   Social protection and risk management
13  6   
14  2   Public sector governance
15  4   Financial and private sector development
16  11  Environment and natural resources management
17  8   
18  10  Rural development
19  7

using this reference table:

fullgroupeddf = fulldf.groupby(['code', 'Major_Project_Theme']).count()
fullgroupeddf

code    Major_Project_Theme
1   Economic management
10  Rural development
11  Environment and natural resources management
2   Public sector governance
3   Rule of law
4   Financial and private sector development
5   Trade and integration
6   Social protection and risk management
7   Social dev/gender/inclusion
8   Human development
9   Urban development
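The index of fullgroupeddf holds the (code, Major_Project_Theme) pairs rather than ordinary columns. A flatter lookup can be easier to use: a Series mapping each code to its theme, built directly from the rows whose theme is present. This is a minimal sketch that assumes missing themes are stored as '' or NaN. The sample data and the name ref are hypothetical:

```python
import pandas as pd

# Hypothetical sample shaped like fulldf; '' marks a missing theme
fulldf = pd.DataFrame({
    'code': [8, 11, 1, 11, 8, 6],
    'Major_Project_Theme': ['Human development', '', 'Economic management',
                            'Environment and natural resources management', '',
                            'Social protection and risk management'],
})

# Keep only rows whose theme is actually present (not NaN, not '')
theme = fulldf['Major_Project_Theme']
known = fulldf[theme.notna() & (theme != '')]

# One theme per code, as a Series indexed by code
ref = known.drop_duplicates('code').set_index('code')['Major_Project_Theme']
print(ref)
```

With a Series like this, fulldf['code'].map(ref) returns the theme for every row.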

I tried this, but it didn't work:

filleddf = fulldf.merge(fullgroupeddf, how='left', left_on='code', right_on='code')

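To be honest, I don't really know what I'm doing. The idea is to use the reference table I created to fill in the missing values under Major_Project_Theme in the first dataframe. What do I need to put in my merge statement? Or is there a better way?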
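If you want to keep using merge, one option is to merge on code alone with distinct suffixes, then copy the reference theme only into the blank rows. This is a minimal sketch that assumes the reference table has code and Major_Project_Theme as ordinary columns (for example after fullgroupeddf.reset_index()) and that blanks are '' or NaN. The sample frames and the name reftable are hypothetical:

```python
import pandas as pd

# Hypothetical data shaped like fulldf ('' marks a missing theme)
fulldf = pd.DataFrame({
    'code': [8, 11, 1, 11, 8],
    'Major_Project_Theme': ['Human development', '', 'Economic management',
                            'Environment and natural resources management', ''],
})

# Hypothetical reference table with code and theme as plain columns
reftable = pd.DataFrame({
    'code': [1, 8, 11],
    'Major_Project_Theme': ['Economic management', 'Human development',
                            'Environment and natural resources management'],
})

# Left-merge on code only; suffixes keep the two theme columns apart
merged = fulldf.merge(reftable, how='left', on='code', suffixes=('', '_ref'))

# Where the original theme is blank, take the reference value instead
blank = merged['Major_Project_Theme'].isna() | (merged['Major_Project_Theme'] == '')
merged.loc[blank, 'Major_Project_Theme'] = merged.loc[blank, 'Major_Project_Theme_ref']

# Drop the helper column so the result has the original layout
filleddf = merged.drop(columns='Major_Project_Theme_ref')
print(filleddf)
```

A left merge keeps every row of fulldf in its original order, so filleddf lines up with the input.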

Answer 1

Assuming the rows with missing data actually contain an empty string '', you can use groupby together with transform and max, for example:

filleddf = fulldf.copy()  # only needed if you want to keep fulldf unchanged
# fill the missing values in Major_Project_Theme with each code's largest theme;
# '' sorts before any non-empty string, so the real theme wins
filleddf['Major_Project_Theme'] = (filleddf.groupby('code')['Major_Project_Theme']
                                           .transform('max'))

Every row of filleddf should then hold the correct Major_Project_Theme associated with its code.

Answer 2

Just group your fulldf by code with groupby. Then iterate over each group and fill in the missing information. Hope this helps.
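Here is a minimal sketch of that loop. It assumes blanks are '' or NaN and that each code has at most one distinct theme. The sample data is hypothetical. On large frames, a vectorized call such as groupby().transform() is usually faster than looping over groups in Python:

```python
import pandas as pd

# Hypothetical sample: '' marks a missing theme
fulldf = pd.DataFrame({
    'code': [8, 11, 8, 11],
    'Major_Project_Theme': ['Human development', '', '',
                            'Environment and natural resources management'],
})

filleddf = fulldf.copy()
# Iterate over the untouched original, write into the copy
for code, group in fulldf.groupby('code'):
    themes = group['Major_Project_Theme']
    is_blank = themes.isna() | (themes == '')
    known = themes[~is_blank]
    if known.empty:
        continue  # no theme recorded for this code, nothing to copy
    # Assign the first known theme to this group's blank rows
    filleddf.loc[themes.index[is_blank], 'Major_Project_Theme'] = known.iloc[0]
print(filleddf)
```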

Fill missing values in the left dataframe from a reference table using pd.merge