熊猫枢轴:多列枢轴

时间:2018-06-28 19:48:55

标签: python pandas pivot-table

我最初是从以下数据帧开始的:

数据集与用户回答具有多个答案选择的多个问题并且用户能够回答多个答案有关。

movie_id, user_id, rated_value, question_id, answer_id, genre, user_gender, user_ethnicity
101, 345, 3.5, 1, 1, comedy, male, white
101, 345, 3.5, 1, 2, comedy, male, white
101, 345, 3.5, 2, 1, comedy, male, white
125, 345, 4.5, 1, 4, drama, male, white
101, 233, 4.0, 1, 3, comedy, female, black
101, 233, 4.0, 2, 2, comedy, female, black
125, 233, 3.0, 1, 1, drama, female, black
125, 233, 3.0, 2, 2, drama, female, black
125, 333, 3.0, 1, 1, comedy, male, asian
125, 333, 3.0, 2, 2, comedy, male, asian 

我想通过旋转将桌子弄平。我可以成功完成工作而无需引入genre, user_gender, user_ethnicity,如下所示:

pivoted_df = df_to_pivot.assign(val=1).pivot_table(
    index=['movie_id',
           'user_id',
           'rated_value'],
    columns=['question_id',
             'answer_id'],
    values=['question_id', 'answer_id'],
    fill_value=0)

然后将问题和答案ID组合在一起,以便列将显示为1_1, 1_2

pivoted_df.columns = pivoted_df.columns.droplevel()
pivoted_df.columns = ['{}_{}'.format(l1, l2).strip() for l1, l2 in pivoted_df.columns.values]
pivoted_df = pivoted_df.reset_index()

movie_id user_id rating_value 1_1 1_2 1_3 1_4...

但是尝试添加genre, user_gender, user_ethnicity

pivoted_df = df_to_pivot.assign(val=1).pivot_table(
    index=['movie_id',
           'user_id',
           'rated_value'],
    columns=['question_id',
             'answer_id', 'genre', 'user_gender','user_ethnicity'],
    values=['question_id', 'answer_id', 'genre', 'user_gender','user_ethnicity'],
    fill_value=0)

它实际上不起作用。

我的目标是像其余部分一样枢转genre, user_gender, user_ethnicity,以便将列 movie_id user_id rated_value 1_1 1_2 1_3 1_4...comedy, drama...,male, female, black, white, asian

output: 
movie_id, user_id, rated_value , 1_1, 1_2, 1_3, 1_4, comedy, drama, male, female, white, black, asian
101, 345, 3.5, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0

目标是每行获取movie_id,user_id对,其他所有内容都反映为1和0。

1 个答案:

答案 0 :(得分:1)

将question_id和answer_id合并为一列,然后使用pd.get_dummies

df['QandA'] = df['question_id'].astype(str) + '_' + df['answer_id'].astype(str)

pd.get_dummies(df, columns=['QandA','genre','user_gender','user_ethnicity'])

输出:

   movie_id  user_id  rated_value  question_id  answer_id  QandA_1_1  QandA_1_2  QandA_1_3  QandA_1_4  QandA_2_1  QandA_2_2  genre_comedy  genre_drama  user_gender_female  \
0       101      345          3.5            1          1          1          0          0          0          0          0             1            0                   0   
1       101      345          3.5            1          2          0          1          0          0          0          0             1            0                   0   
2       101      345          3.5            2          1          0          0          0          0          1          0             1            0                   0   
3       125      345          4.5            1          4          0          0          0          1          0          0             0            1                   0   
4       101      233          4.0            1          3          0          0          1          0          0          0             1            0                   1   
5       101      233          4.0            2          2          0          0          0          0          0          1             1            0                   1   
6       125      233          3.0            1          1          1          0          0          0          0          0             0            1                   1   
7       125      233          3.0            2          2          0          0          0          0          0          1             0            1                   1   
8       125      333          3.0            1          1          1          0          0          0          0          0             1            0                   0   
9       125      333          3.0            2          2          0          0          0          0          0          1             1            0                   0   

   user_gender_male  user_ethnicity_asian  user_ethnicity_black  user_ethnicity_white  
0                 1                     0                     0                     1  
1                 1                     0                     0                     1  
2                 1                     0                     0                     1  
3                 1                     0                     0                     1  
4                 0                     0                     1                     0  
5                 0                     0                     1                     0  
6                 0                     0                     1                     0  
7                 0                     0                     1                     0  
8                 1                     1                     0                     0  
9                 1                     1                     0                     0  

我认为您需要pd.get_dummies

pd.get_dummies(df, columns=['genre','user_gender','user_ethnicity'])

输出:

   movie_id  user_id  rated_value  question_id  answer_id  genre_comedy  genre_drama  user_gender_female  user_gender_male  user_ethnicity_asian  user_ethnicity_black  \
0       101      345          3.5            1          1             1            0                   0                 1                     0                     0   
1       101      345          3.5            1          2             1            0                   0                 1                     0                     0   
2       101      345          3.5            2          1             1            0                   0                 1                     0                     0   
3       125      345          4.5            1          4             0            1                   0                 1                     0                     0   
4       101      233          4.0            1          3             1            0                   1                 0                     0                     1   
5       101      233          4.0            2          2             1            0                   1                 0                     0                     1   
6       125      233          3.0            1          1             0            1                   1                 0                     0                     1   
7       125      233          3.0            2          2             0            1                   1                 0                     0                     1   
8       125      333          3.0            1          1             1            0                   0                 1                     1                     0   
9       125      333          3.0            2          2             1            0                   0                 1                     1                     0   

   user_ethnicity_white  
0                     1  
1                     1  
2                     1  
3                     1  
4                     0  
5                     0  
6                     0  
7                     0  
8                     0  
9                     0