Question

Hi, I'm having trouble reshaping my df.

I have:

Netflix     TV      DVD 
   0.1      0.2     0.3
   0.12     0.5     0.15
   0.4      0.6     0.8
            0.5     0.41
            0.41
            0.2

I want to convert my df to:

Netflix  [0.1, 0.12, 0.4]
TV       [0.2, 0.5, 0.6, 0.5, 0.41, 0.2] 
DVD      [0.3, 0.15, 0.8, 0.41]

I'm not sure how stack() or pivot() would work on a df like this. Any help is appreciated.

Answer 1

`stack`

By default, `stack` drops null values when it reshapes the frame:

df.stack().groupby(level=1).agg(list)

DVD                 [0.3, 0.15, 0.8, 0.41]
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
dtype: object
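
For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions: the blank cells in the question are NaN, and the frame is rebuilt by hand. As I understand it, newer pandas releases offer a `stack` implementation that keeps NaN values, so the sketch adds an explicit `dropna()` as a safeguard. The `reindex` step is my addition to restore the original column order, since `groupby` sorts the labels by default.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Netflix": [0.1, 0.12, 0.4, np.nan, np.nan, np.nan],
    "TV": [0.2, 0.5, 0.6, 0.5, 0.41, 0.2],
    "DVD": [0.3, 0.15, 0.8, 0.41, np.nan, np.nan],
})

# stack() moves the column labels into the inner index level.
# The explicit dropna() covers versions where stack() keeps NaN values.
stacked = df.stack().dropna()

# Group on the former column labels (index level 1), collect each group
# into a list, then restore the original column order.
result = stacked.groupby(level=1).agg(list).reindex(df.columns)
print(result)
```

The result is a Series of lists indexed by the original column names.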

Answer 2

Remove the missing values with Series.dropna and build a Series from a dictionary comprehension:

s = pd.Series({x: df[x].dropna().tolist() for x in df.columns})
print (s)
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
DVD                 [0.3, 0.15, 0.8, 0.41]
dtype: object

...or with DataFrame.apply:

s = df.apply(lambda x: x.dropna().tolist())
print (s)

Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
DVD                 [0.3, 0.15, 0.8, 0.41]
dtype: object

Finally, if you need a two-column DataFrame:

df1 = s.rename_axis('a').reset_index(name='b')
print (df1)
         a                                b
0  Netflix                 [0.1, 0.12, 0.4]
1       TV  [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
2      DVD           [0.3, 0.15, 0.8, 0.41]
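
One caveat about the `apply` variant, as far as I can tell: when every column happens to have the same number of non-missing values, `DataFrame.apply` may assemble the equal-length lists back into a DataFrame instead of a Series of lists. The dictionary-comprehension form does not depend on the list lengths. A minimal sketch with a made-up frame (`df_equal` and `s_equal` are hypothetical names, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which every column has exactly two non-missing
# values, so all resulting lists have the same length.
df_equal = pd.DataFrame({
    "Netflix": [0.1, 0.12, np.nan],
    "TV": [0.2, np.nan, 0.6],
})

# Building the Series from a dict keeps one list per column label,
# whether or not the list lengths match.
s_equal = pd.Series(
    {col: df_equal[col].dropna().tolist() for col in df_equal.columns}
)
print(s_equal)
```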

Answer 3

I think this is what you want:

df.T.apply(lambda x: x.dropna().tolist(), axis=1)

Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
DVD                 [0.3, 0.15, 0.8, 0.41]
dtype: object

Answer 4

Use groupby on the columns:

df.groupby(level=0,axis=1).apply(lambda x : x.dropna().iloc[:,0].tolist())
Out[20]: 
DVD                 [0.3, 0.15, 0.8, 0.41]
Netflix                   [0.1, 0.12, 0.4]
TV         [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
dtype: object
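
Note that `groupby(..., axis=1)` is deprecated in recent pandas releases (2.1 and later). A possible workaround, sketched here under the assumption that the column names are unique and the blank cells are NaN, is to transpose first and group on the row index instead:

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "Netflix": [0.1, 0.12, 0.4, np.nan, np.nan, np.nan],
    "TV": [0.2, 0.5, 0.6, 0.5, 0.41, 0.2],
    "DVD": [0.3, 0.15, 0.8, 0.41, np.nan, np.nan],
})

# After transposing, each original column is a single row, so every group
# holds exactly one row; take it, drop the NaN values, and make a list.
result = df.T.groupby(level=0).apply(lambda g: g.iloc[0].dropna().tolist())
print(result)
```

As with the original call, the group labels come back sorted alphabetically.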

Answer 5

If the missing values in each column are NaN, you can use the following approach:

import pandas as pd

# None is converted to NaN in float columns
df1 = pd.DataFrame({
    "Netflix":  [0.1, 0.12, 0.4, None, None, None],
    "TV":       [0.2, 0.5, 0.6, 0.5, 0.41, 0.2],
    "DVD":      [0.3, 0.15, 0.8, 0.41, None, None]
})
print(df1)

# One row per original column, with its non-missing values as a list
df2 = pd.DataFrame(df1.columns, columns=["Type"])
df2["List_for_Type"] = [
    list(df1[f].dropna())
    for f in df1.columns
]
print(df2)

The corresponding output is:

   Netflix    TV   DVD
0     0.10  0.20  0.30
1     0.12  0.50  0.15
2     0.40  0.60  0.80
3      NaN  0.50  0.41
4      NaN  0.41   NaN
5      NaN  0.20   NaN

      Type                    List_for_Type
0  Netflix                 [0.1, 0.12, 0.4]
1       TV  [0.2, 0.5, 0.6, 0.5, 0.41, 0.2]
2      DVD           [0.3, 0.15, 0.8, 0.41]

Hope this helps.
