Python: a modified version of one-hot encoding

Time: 2018-09-08 06:19:30

Tags: python pandas dataframe

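I need help turning the unique values of several columns (for example, columns a1 and a2) into new columns, and then assigning the values of columns b1 and b2, respectively, to those newly created columns.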

For example, suppose I have a DataFrame df like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'a1':['q','w','e','r'], 'a2':['s','e','q','u'], 'b1':[1,2,3,4], 'b2':[5,6,7,8],})

print(df)
  a1 a2  b1  b2
0  q  s   1   5
1  w  e   2   6
2  e  q   3   7
3  r  u   4   8

The unique values of columns a1 and a2 are ['e','q','r','s','u','w'].

np.unique(df.loc[:,['a1','a2']].values)
array(['e', 'q', 'r', 's', 'u', 'w'], dtype=object)

I want to convert df into a new DataFrame df1 that looks like this:

print(df1)
   e  q  r  s  u  w
0  0  1  0  5  0  0
1  6  0  0  0  0  2
2  3  7  0  0  0  0
3  0  0  4  0  8  0

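Note that 'q' and 's' appear in the first row of df, so 1 (from column b1) and 5 (from column b2) are assigned to columns q and s of df1, while all other columns in that row are 0.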

In R I could have done this with the melt and dcast functions, but I am not sure how to do it in Python.

Thanks.

1 answer:

Answer 0: (score: 1)

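One possible way to get the df1 described in the question is to go long and then wide, roughly what melt followed by dcast does in R. The sketch below is not necessarily the method of the original answer. The helper name spread_pairs and the intermediate column names row, key and value are made up for illustration. It assumes that df has a unique index, that each key column is paired with the value column at the same position in the lists, and that values landing in the same row and column should be summed.

```python
import pandas as pd

df = pd.DataFrame({
    'a1': ['q', 'w', 'e', 'r'],
    'a2': ['s', 'e', 'q', 'u'],
    'b1': [1, 2, 3, 4],
    'b2': [5, 6, 7, 8],
})


def spread_pairs(frame, key_cols, value_cols):
    """Hypothetical helper: one output column per unique key, filled with paired values."""
    # Long format: one (row label, key, value) record per key/value column pair.
    long = pd.concat(
        [
            pd.DataFrame({
                'row': frame.index.to_numpy(),
                'key': frame[k].to_numpy(),
                'value': frame[v].to_numpy(),
            })
            for k, v in zip(key_cols, value_cols)
        ],
        ignore_index=True,
    )
    # Wide format: keys become columns; (row, key) combinations that never occur get 0.
    # Assumption: if the same key appears twice in one row, the values are summed.
    wide = long.groupby(['row', 'key'])['value'].sum().unstack(fill_value=0)
    # Restore the original row order and drop the helper axis names.
    wide = wide.reindex(frame.index, fill_value=0)
    wide.index.name = None
    wide.columns.name = None
    return wide


df1 = spread_pairs(df, ['a1', 'a2'], ['b1', 'b2'])
print(df1)
```

For the sample df above, this should reproduce the df1 shown in the question, with columns e, q, r, s, u, w in sorted order. If a key can repeat within a row and summing is not what you want, swap sum() for another aggregation such as max() or first().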