Question

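My goal is to transfer a column from df1 to df2 while creating bins. I have a dataframe named df1 that contains 3 numeric variables. I want to take the variable 'tenure' into df2 and bin it. The column values are carried over to df2, but df2 shows some missing values. Please find the code below: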

df2=pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'])

Before creating df2, I checked df1 for missing values. There were none, but after creating the bins it shows 11 missing values.

print(df2.isnull().sum())

The code above reports 11 missing values.

Any help is appreciated.

Answer 1

I assume you have some values in df1['tenure'] that are not in (0, 80], perhaps zeros. See the example below:

import pandas as pd

df1 = pd.DataFrame({'tenure': [-1, 0, 12, 34, 78, 80, 85]})
print(pd.cut(df1["tenure"], bins=[0, 20, 60, 80], labels=['low', 'medium', 'high']))

0       NaN    # -1 is lower than 0 so result is null
1       NaN    # it was 0 but the segment is open on the lowest bound so 0 gives null
2       low
3    medium
4      high
5      high    # 80 is kept as the segment is closed on the right
6       NaN    # 85 is higher than 80 so result is null
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]

Now, you can pass the parameter include_lowest=True to pd.cut so that the left edge of the first bin is included in the result:

print (pd.cut(df1["tenure"] , bins=[0,20,60,80], labels=['low','medium','high'],
              include_lowest=True))

0       NaN
1       low  # now where the value was 0 you get low and not null
2       low
3    medium
4      high
5      high
6       NaN
Name: tenure, dtype: category
Categories (3, object): [low < medium < high]

So in the end, I think that if you print len(df1[(df1.tenure <= 0) | (df1.tenure > 80)]), you will get 11 with your data, which is the number of null values in df2 (here it is 3).
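As a rough sketch of that check, the snippet below compares the number of nulls produced by pd.cut with the number of values that fall outside (0, 80]. The sample data is the toy df1 from this answer, not the asker's real data, and the variable names (bins, labels, binned, out_of_range) are only illustrative.

```python
import pandas as pd

# Toy data from the example above; the asker's real df1 is assumed to look similar
df1 = pd.DataFrame({"tenure": [-1, 0, 12, 34, 78, 80, 85]})

bins = [0, 20, 60, 80]
labels = ["low", "medium", "high"]

# Default behaviour: intervals are (0, 20], (20, 60], (60, 80]
binned = pd.cut(df1["tenure"], bins=bins, labels=labels)

# Rows whose value lies outside (0, 80] cannot be assigned to any bin
out_of_range = (df1["tenure"] <= bins[0]) | (df1["tenure"] > bins[-1])

n_missing = int(binned.isnull().sum())
n_out_of_range = int(out_of_range.sum())

print(n_missing, n_out_of_range)                  # 3 3
print(df1.loc[out_of_range, "tenure"].tolist())   # [-1, 0, 85]
```

If the two counts match on the real data, the missing values come entirely from the bin edges rather than from nulls in df1, and listing the offending values shows whether include_lowest=True or wider bin edges would be the appropriate fix.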

Missing data when creating bins with the Python pandas cut function
