How do I map survey answers to option numbers using pandas?

Time: 2019-11-02 05:19:26

Tags: python python-3.x pandas dataframe dictionary

I have a dataframe and a series as shown below:

import numpy as np
import pandas as pd

user_response = pd.DataFrame({
    'val_string': ['Correct', 'Mute', 'Test13', 'Test15', 'Unverified', np.nan, '>10 Edu'],
    'num': [np.nan, np.nan, 1201, 1203, np.nan, np.nan, np.nan]
})

option_numbers = pd.DataFrame({
    'answer': ['Correct', 'Incorrect', 'mute', 'cannot see', 'paralysed', 'illiterate', 'tired', 'cannot hear', 'NIL',
               'English', 'Malay', 'Mandarin', 'Hokkien', 'Teochew', 'Cantonese', 'Other - specify', 'Chinese',
               '0 Edu', '1-6 Edu', '7-10 Edu', '>10 Edu', 'Unreachable', 'Incomplete', 'Unverified', 'Complete'],
    'option': [1, 0, 0, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 0, 1, 2]})
# Convert to a Series indexed by answer so it can be passed to map
option_numbers = option_numbers.set_index('answer')['option']

The code below maps the matching items successfully, but I lose the existing values of the non-matching items:

user_response['num'] = user_response['val_string'].map(option_numbers)

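If you run my code, you will see that it loses the values for Test13 and Test15, because they do not exist in the option_numbers series. Mute is also not matched because the lookup is case sensitive and the series spells it mute.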
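The behaviour described above can be reproduced with a minimal sketch. The names `responses` and `lookup` are stand-ins introduced here for illustration, holding a few values from the post. `Series.map` returns NaN for every key that is missing from the lookup, so assigning its result to `num` overwrites whatever was stored there.

```python
import pandas as pd

# Stand-in data: a few of the values from the post
responses = pd.Series(['Correct', 'Mute', 'Test13'])
lookup = pd.Series({'Correct': 1, 'mute': 0})

# 'Mute' differs from 'mute' in case and 'Test13' is absent,
# so both come back as NaN
mapped = responses.map(lookup)
print(mapped)
```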

Can you help me figure this out?

I would like my output to look like this:

[image: expected output]

1 answer:

Answer 0: (score: 2)

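First, the columns being compared in both dataframes need to be in the same case, either all upper or all lower. The code below treats option_numbers as the original DataFrame with an 'answer' column, that is, before it is converted to a Series: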

user_response['val_string'] = user_response['val_string'].str.lower()
option_numbers['answer'] = option_numbers['answer'].str.lower()

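Then use fillna to fill in the missing values. When fillna is given a Series it aligns on index labels, so for this to work the index of both dataframes must be set to the matching column.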

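# Index both frames by the (lower-cased) answer text
user_response = user_response.set_index('val_string')
option_numbers = option_numbers.set_index('answer')
# Fill only the missing num values; existing ones such as 1201 are kept
user_response['num'] = user_response['num'].fillna(option_numbers['option'])
print(user_response['num'])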
val_string
correct          1.0
mute             0.0
test13        1201.0
test15        1203.0
unverified       1.0
NaN              NaN
>10 edu          4.0
Name: num, dtype: float64
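
If the original casing of `val_string` and the default index need to stay untouched, a variation on the same idea is to map a lower-cased copy and pass the result to fillna. This is only a sketch under stated assumptions: `lookup` and `mapped` are names introduced here, and `option_numbers` is shortened to a handful of entries from the question.

```python
import numpy as np
import pandas as pd

user_response = pd.DataFrame({
    'val_string': ['Correct', 'Mute', 'Test13', 'Test15', 'Unverified', np.nan, '>10 Edu'],
    'num': [np.nan, np.nan, 1201, 1203, np.nan, np.nan, np.nan],
})

# Shortened lookup for illustration; values are taken from the question
option_numbers = pd.Series(
    {'Correct': 1, 'Incorrect': 0, 'mute': 0, '>10 Edu': 4, 'Unverified': 1}
)

# Lower-case the lookup keys on a copy so the original Series is unchanged
lookup = option_numbers.copy()
lookup.index = lookup.index.str.lower()

# Map a lower-cased view of the answers; unmatched answers become NaN
mapped = user_response['val_string'].str.lower().map(lookup)

# Keep existing num values and fill only the gaps from the mapping
user_response['num'] = user_response['num'].fillna(mapped)
print(user_response)
```

Because `mapped` shares the default integer index with `user_response`, fillna aligns row by row and no set_index call is needed.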