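How to fill empty rows of a column with unique data?

Time: 2017-01-22 09:50:59

Tags: python pandas dataframe

I want to fill the rows of a column that have no data with random values.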

853                           None
854                   cheese empty
855                   cheese other
856                   yogurt empty
857                   yogurt other
858                   yogurt empty
859                   yogurt other
860                   butter empty
861                   butter other
862                           None
863                           None

I would like to get something like this:

853                           ASDFGHJAS
854                         cheese empty
855                         cheese other
856                         yogurt empty
857                         yogurt other
858                         yogurt empty
859                         yogurt other
860                         butter empty
861                         butter other
862                           DFGHJRTYT
863                           ERTYUIOIO
864                           TYUIOPPWE
865                           QWERTYUUI
866                           CBNMTYUIO

I tried something like this:

df1 = df[['english_name']].fillna(''.join(choice(ascii_uppercase) for i in range(12)), axis=1)

But this gives:

853                          ASDFGHJAS
854                         cheese empty
855                         cheese other
856                         yogurt empty
857                         yogurt other
858                         yogurt empty
859                         yogurt other
860                         butter empty
861                         butter other
862                           ASDFGHJAS
863                           ASDFGHJAS
864                           ASDFGHJAS
865                           ASDFGHJAS
866                           ASDFGHJAS

The problem is that I get the same value in every row, but I need a unique random value for each row.

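3 answers:

Answer 0 (score: 5)

Use apply with a lambda so that each NaN is filled with its own randomly chosen string (choice comes from the random module, ascii_uppercase from the string module).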

In [243]: df[['english_name']].apply(
     ...:     lambda x: x.fillna(''.join(choice(ascii_uppercase) for i in range(12))),
     ...:     axis=1)
Out[243]:
     english_name
853  BIZLLWLFGUSD
854  cheese empty
855  cheese other
856  yogurt empty
857  yogurt other
858  yogurt empty
859  yogurt other
860  butter empty
861  butter other
862  NMHDRQMTWZXF
863  EGPCZFWEDOFR
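
A possible way to see the difference without pandas: in the question's attempt the random string is built once, before fillna is called, so every missing row receives that one value. Generating a string once per missing entry gives independent draws. The helper names random_name and fill_with_scalar below are made up for illustration and are not part of pandas.

```python
from random import choice
from string import ascii_uppercase


def random_name(length=12):
    """Return a random string of uppercase ASCII letters."""
    return ''.join(choice(ascii_uppercase) for _ in range(length))


def fill_with_scalar(values, fill):
    """Hypothetical stand-in for a scalar fill: every None gets the same value."""
    return [fill if v is None else v for v in values]


names = [None, 'cheese empty', None, None]

# random_name() is evaluated once here, before fill_with_scalar runs,
# so all three missing entries end up with the identical string.
same = fill_with_scalar(names, random_name())

# Calling random_name() once per missing entry draws a new string each time.
different = [random_name() if v is None else v for v in names]

print(same)
print(different)
```

The per-row lambda in the answer plays the role of the second list comprehension: the string is generated inside the function that runs for each row.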

Alternatively, create a Series of random names with the same length as the dataframe beforehand, then use df.english_name.fillna(s)

In [259]: s = pd.Series(
     ...:     [''.join(choice(ascii_uppercase) for i in range(12)) for _ in range(len(df))],
     ...:     index=df.index)

In [260]: df.english_name.fillna(s)
Out[260]:
853    BRFERJPGVDXP
854    cheese empty
855    cheese other
856    yogurt empty
857    yogurt other
858    yogurt empty
859    yogurt other
860    butter empty
861    butter other
862    NYYTRCSSCPWT
863    ZYBNJQIPIWEF
Name: english_name, dtype: object
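
A related sketch, not taken from the answer: instead of building a random string for every row, one could generate strings only for the rows that are actually missing and assign them through a boolean mask. The small frame below is an assumed stand-in for the asker's data.

```python
from random import choice
from string import ascii_uppercase

import pandas as pd

# Assumed sample data that mimics the column from the question.
df = pd.DataFrame(
    {'english_name': [None, 'cheese empty', 'cheese other', None, None]},
    index=[853, 854, 855, 862, 863],
)

# Boolean mask marking the rows that have no value.
missing = df['english_name'].isna()

# One random 12-letter string per missing row.
fills = [
    ''.join(choice(ascii_uppercase) for _ in range(12))
    for _ in range(int(missing.sum()))
]

# Work on a copy so the original column stays untouched.
filled = df['english_name'].copy()
filled.loc[missing] = fills

print(filled)
```

This produces the same kind of result as fillna(s) while skipping the strings that would never be used.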

Answer 1 (score: 1)

Using this answer, you can define a function that returns a random string of a given size:

import random
import string

def random_string(N=9):
    return ''.join(random.SystemRandom().choice(string.ascii_uppercase) for _ in range(N))


df[['english_name']].apply(lambda x: x.fillna(random_string()),axis=1)
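
Random strings of this length are very unlikely to collide, but nothing in the approach guarantees it. If strict uniqueness matters, a minimal sketch along these lines could be used. The function unique_random_strings and its taken parameter are hypothetical names introduced here, not part of the answer.

```python
import random
import string


def unique_random_strings(count, length=9, taken=()):
    """Return `count` distinct random uppercase strings, none of them in `taken`.

    Assumes the alphabet and length leave enough free combinations;
    otherwise the loop would never finish.
    """
    rng = random.SystemRandom()
    seen = set(taken)
    result = []
    while len(result) < count:
        candidate = ''.join(rng.choice(string.ascii_uppercase) for _ in range(length))
        if candidate not in seen:  # reject duplicates and already used values
            seen.add(candidate)
            result.append(candidate)
    return result


fills = unique_random_strings(3)
print(fills)
```

The returned list can then be assigned to the missing rows, for example through a boolean mask on the column.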

Answer 2 (score: 1)

A general solution for dataframes with multiple columns

import numpy as np
import pandas as pd
from string import ascii_uppercase

df = pd.DataFrame([
        ['a', np.nan, 'b'],
        [np.nan, 'c', np.nan],
        ['d', np.nan, 'e'],
        [np.nan, 'f', np.nan]
    ])

     0    1    2
0    a  NaN    b
1  NaN    c  NaN
2    d  NaN    e
3  NaN    f  NaN

  • stack df to get a Series
  • count the nulls

dfs = df.stack(dropna=False)
wherenull = dfs.isnull().values
n = wherenull.sum()

Generate the fill values

np.random.seed([3,1415])
fills = pd.DataFrame(
    np.random.choice(
        list(ascii_uppercase),
        (n, 12)
    )).sum(1).values

Fill in the missing values

dfs.loc[wherenull] = fills
dfs.unstack()

              0             1             2
0             a  QLCKPXNLNTIX             b
1  AWYMWACAUZHT             c  NSMEDTNWHXNU
2             d  FDXFZLYHMGEH             e
3  WSOGGOVSIXKF             f  PYEPNHGRMMPO
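
A variation on the same idea that avoids stack and unstack, sketched here as an alternative and not as the answer's own method: take the null mask as a NumPy array and assign the generated strings directly into a copy of the values. The use of np.random.default_rng with seed 0 is an arbitrary choice for the illustration.

```python
from string import ascii_uppercase

import numpy as np
import pandas as pd

df = pd.DataFrame([
    ['a', np.nan, 'b'],
    [np.nan, 'c', np.nan],
    ['d', np.nan, 'e'],
    [np.nan, 'f', np.nan],
])

rng = np.random.default_rng(0)  # seed chosen only to make runs repeatable

# 2-D boolean array: True where a cell is missing.
mask = df.isna().to_numpy()
n = int(mask.sum())

# Draw an (n, 12) array of single letters and join each row into one string.
letters = rng.choice(list(ascii_uppercase), size=(n, 12))
fills = [''.join(row) for row in letters]

# Copy the values so the original frame is not modified, then fill by mask.
values = df.to_numpy(dtype=object, copy=True)
values[mask] = fills
filled = pd.DataFrame(values, index=df.index, columns=df.columns)

print(filled)
```

Boolean-mask assignment visits the cells in row-major order, so the fills are placed left to right and top to bottom, much like the stacked Series in the answer.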