Suppose I have the following data.
import pandas as pd
age = [[1,2,3],[2,1],[4,2,3,1],[2,1,3]]
frame = {'age': age }
result = pd.DataFrame(frame)
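# Expand each list into its own row of columns (shorter lists are padded with NaN)
ver = pd.DataFrame(result.age.values.tolist(), index=result.index)
# Collect the distinct values across all lists, dropping the NaN padding
listado = pd.unique(ver.values.ravel('K'))
cleanedList = [x for x in listado if str(x) != 'nan']
# Add one column per distinct value, initialised to 0
for col in cleanedList:
    result[col] = 0
# Resulting frame: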
age 1.0 2.0 4.0 3.0
[1, 2, 3] 0 0 0 0
[2, 1] 0 0 0 0
[4, 2, 3, 1] 0 0 0 0
[2, 1, 3] 0 0 0 0
How can I set a 1 in each column whose value appears in that row's list in the age column? The final output should be:
age 1.0 2.0 4.0 3.0
[1, 2, 3] 1 1 0 1
[2, 1] 1 1 0 0
[4, 2, 3, 1] 1 1 1 1
[2, 1, 3] 1 1 0 1
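Note that the number of distinct elements in the "age" column is dynamic (I used 4 numbers in this example, but in practice there can be more).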
Answer 0 (score: 1)
Use sklearn's MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
s=pd.DataFrame(mlb.fit_transform(result['age']),columns=mlb.classes_, index=result.index)
s
1 2 3 4
0 1 1 1 0
1 1 1 0 0
2 1 1 1 1
3 1 1 1 0
# To attach the indicator columns to the original lists:
# result = result[['age']].join(s)
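If pulling in sklearn is not desirable, a pandas-only variant may also work. The sketch below is an alternative to the answer above, not part of it: it assumes pandas 0.25 or later (for `Series.explode`) and that no list in `age` is empty. The names `exploded`, `indicators` and `final` are made up for illustration.

```python
import pandas as pd

age = [[1, 2, 3], [2, 1], [4, 2, 3, 1], [2, 1, 3]]
result = pd.DataFrame({'age': age})

# One row per list element; the index repeats the original row label.
# Assumes no empty lists, otherwise explode() yields NaN and astype(int) fails.
exploded = result['age'].explode().astype(int)

# One indicator column per distinct value, then collapse back to one row
# per original row by taking the maximum within each row label.
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)

# Attach the 0/1 columns to the original lists.
final = result[['age']].join(indicators)
print(final)
```

Because the columns come from whatever values occur in the data, this should handle a dynamic number of distinct values in the same way as `MultiLabelBinarizer`.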