Question

I have a pandas.DataFrame named fake_num:

    import numpy as np
    import pandas as pd
    fake_num=pd.DataFrame([[1,2,3,4,np.nan,np.nan,np.nan],[1.1,1.2,1.3,1.4,1.6,1.8,2.5]]).T
    fake_num
    Out[4]: 
         0    1
    0  1.0  1.1
    1  2.0  1.2
    2  3.0  1.3
    3  4.0  1.4
    4  NaN  1.6
    5  NaN  1.8
    6  NaN  2.5

I am trying to fill the NaN values using linear regression:

    from sklearn.linear_model import LinearRegression
    fdrop=fake_num.dropna(axis=0,how='any')
    lr=LinearRegression()
    lr.fit(np.array(fdrop.iloc[:,1]).reshape(-1, 1),np.array(fdrop.iloc[:,0]))
    lr.predict(np.array(fake_num[np.isnan(fake_num[0])][1]).reshape(-1, 1))
    Out[5]: array([ 6., 8., 15.])

The part I want to replace is fake_num[np.isnan(fake_num[0])][0], so what I want is:

    Out[6]: 
          0    1
    0   1.0  1.1
    1   2.0  1.2
    2   3.0  1.3
    3   4.0  1.4
    4   6.0  1.6
    5   8.0  1.8
    6  15.0  2.5

I tried:

    fake_num[np.isnan(fake_num[0])][0]=lr.predict(np.array(fake_num[np.isnan(fake_num.iloc[:,0])].iloc[:,1]).reshape(-1, 1))
    fake_num
    __main__:1: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
    Out[11]: 
         0    1
    0  1.0  1.1
    1  2.0  1.2
    2  3.0  1.3
    3  4.0  1.4
    4  NaN  1.6
    5  NaN  1.8
    6  NaN  2.5

and

    fake_num[np.isnan(fake_num.loc[:,0])].loc[:,0]=lr.predict(np.array(fake_num[np.isnan(fake_num.iloc[:,0])].iloc[:,1]).reshape(-1, 1))
    fake_num
    D:\Users\shan xu\Anaconda3\lib\site-packages\pandas\core\indexing.py:630: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      self.obj[item_labels[indexer[info_axis]]] = value
    Out[12]: 
         0    1
    0  1.0  1.1
    1  2.0  1.2
    2  3.0  1.3
    3  4.0  1.4
    4  NaN  1.6
    5  NaN  1.8
    6  NaN  2.5

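How can I replace that part of the DataFrame with the predicted values? By the way, since the question needs more detail: is there a good tool that fills NA values with a simple prediction model, using all the other non-NA rows and the other columns as input, like missForest in R?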

Answer 1

Just call fit, then assign the predictions using loc.

    v = fake_num.dropna()
    lr.fit(v[[1]], v[[0]])

    m = fake_num[0].isna()
    fake_num.loc[m, [0]] = lr.predict(fake_num.loc[m, [1]])

    fake_num
          0    1
    0   1.0  1.1
    1   2.0  1.2
    2   3.0  1.3
    3   4.0  1.4
    4   6.0  1.6
    5   8.0  1.8
    6  15.0  2.5
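
The attempts in the question fail because fake_num[mask] with a boolean mask returns a new object, so the following [0] = ... (or .loc[:,0] = ...) writes into that temporary copy and fake_num itself is never touched. A single .loc call that names rows and column together avoids this. Below is a self-contained sketch of the same idea; the variable names known and missing are only illustrative, and the inputs are converted with to_numpy() as one possible way to hand plain arrays to scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rebuild the frame from the question: column 0 has gaps, column 1 is complete.
fake_num = pd.DataFrame(
    [[1, 2, 3, 4, np.nan, np.nan, np.nan],
     [1.1, 1.2, 1.3, 1.4, 1.6, 1.8, 2.5]]
).T

# Fit on the complete rows only: column 1 is the feature, column 0 the target.
known = fake_num.dropna()
lr = LinearRegression()
lr.fit(known[[1]].to_numpy(), known[0].to_numpy())

# Boolean mask marking the rows whose column 0 is missing.
missing = fake_num[0].isna()

# One .loc call selects rows and column at once, so the write lands in fake_num.
fake_num.loc[missing, 0] = lr.predict(fake_num.loc[missing, [1]].to_numpy())

print(fake_num)
```

Because the row mask and the column label go into the same indexer, pandas performs one assignment on the original frame and no SettingWithCopyWarning is raised.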

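As for a missForest-like tool: one option that may be worth a look is scikit-learn's IterativeImputer, which models each column with missing values as a function of the other columns. It is not the missForest algorithm itself, and it is still marked experimental, so it has to be enabled explicitly and its behaviour may change between releases. The sketch below assumes a LinearRegression estimator to match the question; swapping in a RandomForestRegressor would be a rough analogue of missForest. The names imputer and filled are only illustrative.

```python
import numpy as np
import pandas as pd
# IterativeImputer is experimental; this import enables it.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Same frame as in the question.
fake_num = pd.DataFrame(
    [[1, 2, 3, 4, np.nan, np.nan, np.nan],
     [1.1, 1.2, 1.3, 1.4, 1.6, 1.8, 2.5]]
).T

# Each column with NaNs is regressed on the remaining columns.
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)

# fit_transform returns a plain array, so wrap it back into a DataFrame.
filled = pd.DataFrame(
    imputer.fit_transform(fake_num.to_numpy()),
    index=fake_num.index,
    columns=fake_num.columns,
)

print(filled)
```

Unlike the manual approach, this handles any number of columns with gaps in one call, at the cost of less control over which rows and features each model sees.
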
Replace part of a pandas data.frame with a given array
