Question

I currently have a DataFrame like this:

 Word       Word2          Word3
 Hello      NaN            NaN
 My         My Name        NaN
 Yellow     Yellow Bee     Yellow Bee Hive
 Golden     Golden Gates   NaN
 Yellow     NaN            NaN

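I want to remove all the NaN cells from my DataFrame, so that in the end it looks like this, with 'Yellow Bee Hive' moved up to row 1 (similar to what happens when you delete cells from a column in Excel):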

   Word       Word2             Word3
1  Hello      My Name        Yellow Bee Hive
2  My         Yellow Bee       
3  Yellow     Golden Gates             
4  Golden       
5  Yellow

Unfortunately, neither of the following works, because they remove entire rows!

 df = df[pd.notnull(df['Word','Word2','Word3'])]

or

 df = df.dropna()

Does anyone have any suggestions? Should I reindex the table?

Answer 1

import numpy as np
import pandas as pd
import functools

def drop_and_roll(col, na_position='last', fillvalue=np.nan):
    result = np.full(len(col), fillvalue, dtype=col.dtype)
    mask = col.notnull()
    N = mask.sum()
    if na_position == 'last':
        result[:N] = col.loc[mask]
    elif na_position == 'first':
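        # Slice from len(col) - N so an all-NaN column (N == 0) selects nothing;
        # result[-0:] would select the whole array instead.
        result[len(col) - N:] = col.loc[mask]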
    else:
        raise ValueError('na_position {!r} unrecognized'.format(na_position))
    return result

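# Columns are separated by two or more spaces; a regex separator needs the Python engine.
df = pd.read_table('data', sep=r'\s{2,}', engine='python')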

print(df.apply(functools.partial(drop_and_roll, fillvalue='')))

yields

     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee                 
2  Yellow  Golden Gates                 
3  Golden                               
4  Yellow
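
The function above also accepts na_position='first', which the answer does not demonstrate. The following is a minimal self-contained sketch of that case, not the answerer's exact code: it builds the sample frame inline instead of reading the 'data' file, and it assumes an object array is acceptable for the result. The name rolled_first is only illustrative.

```python
import numpy as np
import pandas as pd

def drop_and_roll(col, na_position="last", fillvalue=np.nan):
    # Start from an object array full of the fill value, then overwrite one
    # end of it with the column's non-null values in their original order.
    result = np.full(len(col), fillvalue, dtype=object)
    values = col.dropna().to_numpy()
    n = len(values)
    if na_position == "last":
        result[:n] = values
    elif na_position == "first":
        # len(col) - n keeps the slice empty when the column is all NaN.
        result[len(col) - n:] = values
    else:
        raise ValueError(f"na_position {na_position!r} unrecognized")
    return result

df = pd.DataFrame({
    "Word": ["Hello", "My", "Yellow", "Golden", "Yellow"],
    "Word2": [np.nan, "My Name", "Yellow Bee", "Golden Gates", np.nan],
    "Word3": [np.nan, np.nan, "Yellow Bee Hive", np.nan, np.nan],
})

# Push the non-null values to the bottom of each column instead of the top.
rolled_first = df.apply(lambda col: drop_and_roll(col, na_position="first", fillvalue=""))
print(rolled_first)
```

With na_position='first' the empty cells end up at the top of each column, so 'Yellow Bee Hive' lands in the last row of Word3.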

Answer 2

Since you want the values to shift upwards, you will have to create a new DataFrame.

Starting with -

     Word         Word2
0   Hello           NaN
1      My       My Name
2  Yellow    Yellow Bee
3  Golden  Golden Gates
4  Yellow           NaN

Use the following method -

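import numpy as np
import pandas as pd

def get_column_array(df, column):
    expected_length = len(df)
    # Keep only the non-null values, in their original order.
    current_array = df[column].dropna().values
    # Pad with empty strings so every column keeps the original length.
    if len(current_array) < expected_length:
        current_array = np.append(current_array, [''] * (expected_length - len(current_array)))
    return current_array

pd.DataFrame({column: get_column_array(df, column) for column in df.columns})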

gives -

     Word         Word2
0   Hello       My Name
1      My    Yellow Bee
2  Yellow  Golden Gates
3  Golden              
4  Yellow

You can also edit the existing df with the same function -

for column in df.columns:
    df[column] = get_column_array(df, column)

Answer 3

I think you can use this:

df = df.apply(lambda x: pd.Series(x.dropna().values))

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Word':['Hello', 'My', 'Yellow', 'Golden', 'Yellow'],
    'Word2':[np.nan, 'My Name', 'Yellow Bee', 'Golden Gates', np.nan],
    'Word3':[np.nan, np.nan, 'Yellow Bee Hive', np.nan, np.nan]
})

print(df)

Initial DataFrame:

     Word         Word2            Word3
0   Hello           NaN              NaN
1      My       My Name              NaN
2  Yellow    Yellow Bee  Yellow Bee Hive
3  Golden  Golden Gates              NaN
4  Yellow           NaN              NaN

And applying this lambda function:

df = df.apply(lambda x: pd.Series(x.dropna().values))

print(df)

gives:

     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee              NaN
2  Yellow  Golden Gates              NaN
3  Golden           NaN              NaN
4  Yellow           NaN              NaN

Then you can fill the NaN values with empty strings:

df = df.fillna('')

print(df)

     Word         Word2            Word3
0   Hello       My Name  Yellow Bee Hive
1      My    Yellow Bee                 
2  Yellow  Golden Gates                 
3  Golden                               
4  Yellow
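
One caveat with the apply approach: the result only has as many rows as the longest compacted column. In the example above 'Word' has no NaN, so all five rows survive, but if every column contains at least one NaN the frame comes back shorter than the original. A minimal sketch of one way to keep the original row count follows. The helper name compact_columns and its fill parameter are hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def compact_columns(df, fill=""):
    """Push non-null values to the top of every column, keeping the row count."""
    # Each column is compacted independently; pandas aligns the shorter
    # results on the new 0..k-1 index and pads them with NaN.
    compacted = df.apply(lambda col: pd.Series(col.dropna().to_numpy()))
    # If every column had at least one NaN, `compacted` is shorter than `df`,
    # so reindex back to the original number of rows before filling.
    return compacted.reindex(range(len(df))).fillna(fill)

# Every column has a NaN here, so the plain apply would return only 2 rows.
df = pd.DataFrame({
    "Word": ["Hello", np.nan, "Yellow"],
    "Word2": [np.nan, "My Name", np.nan],
})

result = compact_columns(df)
print(result)
```

The reindex step assumes a fresh 0-based index is acceptable in the result, which matches what the apply approach already produces.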

Remove NaN 'cells' without dropping the entire row (Pandas, Python3)

3 Answers: