Question

I have some code that helps me predict some missing values. Here is the code:

from datawig import SimpleImputer
from datawig.utils import random_split
from sklearn.metrics import f1_score, classification_report
df_train, df_test = random_split(df, split_ratios=[0.8, 0.2])
# Initialize a SimpleImputer model
imputer = SimpleImputer(
    input_columns=['SITUACION_DNI_A'],  # columns containing information about the column we want to impute
    output_column='EXTRANJERO_A',  # the column we'd like to impute values for
    output_path='imputer_model'  # stores model data and metrics
)

# Fit an imputer model on the train data
imputer.fit(train_df=df_train, num_epochs=10)

# Impute missing values and return original dataframe with predictions
predictions = imputer.predict(df_test)

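After this I get a new dataframe that has fewer rows than the original data. How can I insert the predicted values back into the original dataframe? Or is there a way to run the prediction on the whole dataframe instead of only the test split?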

Answer 1

This approach works if both dataframes share a unique column (or name) that can act as an ID:

df_test = df_test.set_index('unique_col')
# fillna returns a new dataframe, so assign the result back
df_test = df_test.fillna(predictions.set_index('unique_col'))
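To make the index alignment concrete, here is a minimal sketch on toy data. It assumes the imputer writes its predictions to a column named `EXTRANJERO_A_imputed` (datawig appears to use the `<output_column>_imputed` pattern, but check your own output), and `unique_col` is a hypothetical ID column. The `predictions` frame is a hand-built stand-in, not real imputer output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the original dataframe; 'unique_col' is a hypothetical ID column.
df = pd.DataFrame({
    "unique_col": [1, 2, 3, 4],
    "SITUACION_DNI_A": ["a", "b", "a", "b"],
    "EXTRANJERO_A": ["yes", np.nan, "yes", np.nan],
})

# Hand-built stand-in for the imputer output on a subset of the rows.
# Assumption: the predicted values sit in 'EXTRANJERO_A_imputed'.
predictions = pd.DataFrame({
    "unique_col": [2, 4],
    "EXTRANJERO_A_imputed": ["no", "no"],
})

df = df.set_index("unique_col")
imputed = predictions.set_index("unique_col")["EXTRANJERO_A_imputed"]

# Series.fillna with another Series aligns on the index, so only the
# missing cells in rows with a matching ID are filled.
df["EXTRANJERO_A"] = df["EXTRANJERO_A"].fillna(imputed)
filled = df.reset_index()
```

Because the fill is aligned on the ID, rows that were not part of the prediction set keep their original values and the row count of the original dataframe does not change.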

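If the method above does not work, drop the rows with missing values and append the imputed predictions to the dataframe. See the following links for help: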

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

Delete rows if there are null values in a specific column in Pandas dataframe
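The drop-and-append route could look roughly like the sketch below. Note that `DataFrame.append` was removed in pandas 2.0, so this uses `pd.concat` instead. The `predicted` frame is a placeholder for what something like `imputer.predict(rows_to_impute)` would return, and the `EXTRANJERO_A_imputed` column name is an assumption about the imputer's output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the original dataframe.
df = pd.DataFrame({
    "SITUACION_DNI_A": ["a", "b", "a", "b"],
    "EXTRANJERO_A": ["yes", np.nan, "yes", np.nan],
})

# Split into rows that already have a value and rows that need one.
missing = df["EXTRANJERO_A"].isna()
complete_rows = df[~missing]   # equivalent to df.dropna(subset=["EXTRANJERO_A"])
rows_to_impute = df[missing]

# Placeholder for the imputer output on the rows with missing values.
# Assumption: predictions arrive in a column named 'EXTRANJERO_A_imputed'.
predicted = rows_to_impute.assign(EXTRANJERO_A_imputed=["no", "no"])

# Replace the empty target column with the predicted one.
predicted_rows = (
    predicted.drop(columns="EXTRANJERO_A")
    .rename(columns={"EXTRANJERO_A_imputed": "EXTRANJERO_A"})
)

# Stack both parts and restore the original row order via the preserved index.
combined = pd.concat([complete_rows, predicted_rows]).sort_index()
```

Predicting only on the rows where the target is missing also sidesteps the row-count mismatch from the question, since the train/test split is no longer what decides which rows get a prediction.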

How do I put the data back into the dataframe after using the imputer?
