Question

I'm using the Titanic dataset. I want to separate the numeric columns from the categorical ones. I tried to do it with this code:

from pandas.api.types import is_string_dtype
from pandas.api.types import is_numeric_dtype

print("Numeric columns")
for column in dataset.columns:
    if is_numeric_dtype(dataset[column]):
        print(column)
print("----------------------------------")
print("Category columns")
for column in dataset.columns:
    if is_string_dtype(dataset[column]):
        print(column)
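A minimal sketch of the same split done with `select_dtypes` instead of looping over the dtype checks. It assumes a small hypothetical DataFrame standing in for `dataset`; the column names and values are illustrative only. `include="number"` matches every numeric dtype (int64, float64, ...), and `exclude="number"` returns everything else.

```python
import pandas as pd

# Hypothetical toy data standing in for `dataset`
dataset = pd.DataFrame({
    "credit_amount": [1169, 5951],
    "age": [67, 22],
    "purpose": ["radio/tv", "education"],
    "duration": ["6", "48"],  # stored as strings in this toy data, so it is non-numeric
})

# All numeric columns (int64, float64, ...), in their original order
numeric_cols = dataset.select_dtypes(include="number").columns.tolist()
# Every remaining (non-numeric) column
category_cols = dataset.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:", numeric_cols)
print("Category columns:", category_cols)
```

Using `include="number"` rather than a single dtype such as `'int64'` avoids silently missing float columns.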

Output:

Numeric columns
Unnamed: 0
credit_amount
installment_commitment
residence_since
age
existing_credits
num_dependents
accepted
----------------------------------
Category columns
checking_status
duration
credit_history
purpose
savings_status
employment
personal_status
other_parties
property_magnitude
other_payment_plans
housing
job
own_telephone
foreign_worker
change_purpose
change_duration

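So now I can clearly see which columns are numeric and which are categorical. Next I want to drop all the numeric columns, whose names I store in columns_names: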

dataset_numerical = dataset.select_dtypes(include=['int64'])
columns_names = dataset_numerical.tolist()
dataset = dataset.drop([columns_names], axis=1)

This is what gets stored in columns_names:

['Unnamed: 0',
 'credit_amount',
 'installment_commitment',
 'residence_since',
 'age',
 'existing_credits',
 'num_dependents',
 'accepted']

Clearly I made a mistake in the last line of code. Can someone help me fix it?

I also tried the following code, but it still doesn't work:

to_drop = columns_names
to_drop_stripped = [x.strip() for x in to_drop.split(',')]
dataset.drop(columns=to_drop_stripped)

In the end, I want to drop every column whose name is stored in columns_names.

Answer 1

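Both of your code blocks need a few small adjustments. It's hard to be sure this will work for you, since I can't fully reproduce your dataset, but I think the following code should now work.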

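# Code block 1
dataset_numerical = dataset.select_dtypes(include=['int64'])
columns_names = dataset_numerical.columns.tolist()   # added .columns: a DataFrame itself has no .tolist()
dataset = dataset.drop(columns_names, axis=1)        # removed the [] brackets: pass the list itself, not a list inside a list

# Code block 2 (an alternative to the last line of block 1; run only one of them,
# otherwise the columns are already gone and drop() raises a KeyError)
to_drop = columns_names
to_drop_stripped = [x.strip() for x in to_drop]      # removed .split(): to_drop is already a list, not a string
dataset = dataset.drop(columns=to_drop_stripped)     # drop() returns a new DataFrame, so assign the result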
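To check the corrected logic end to end, here is a small self-contained sketch on a hypothetical toy DataFrame (not the real dataset). One deliberate swap: it uses `include="number"` instead of `['int64']`, so float columns would be caught as well; everything else follows block 1.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has more columns
dataset = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "credit_amount": [1169, 5951],
    "age": [67, 22],
    "purpose": ["radio/tv", "education"],
})

# .columns gives the column Index; .tolist() turns it into a plain list of names
columns_names = dataset.select_dtypes(include="number").columns.tolist()

# Pass the list directly; drop() returns a new DataFrame, so reassign it
dataset = dataset.drop(columns=columns_names)

print(columns_names)
print(dataset.columns.tolist())
```

`drop(columns=...)` is equivalent to `drop(..., axis=1)` and reads a little more clearly.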

Answer 2

# Check the dtypes with
dataset.dtypes

# For a list of the columns with strings
print(dataset.select_dtypes(include=object).columns.values)

# Replace object with the dtype you are interested in, e.g. 'number' or 'int64'
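Going one step further, `select_dtypes` can return the categorical part of the frame directly, so the intermediate list of names is not strictly needed. A minimal sketch on a hypothetical toy DataFrame, showing that `exclude="number"` gives the same result as dropping the numeric columns explicitly:

```python
import pandas as pd

# Hypothetical toy data standing in for `dataset`
dataset = pd.DataFrame({
    "credit_amount": [1169, 5951],
    "age": [67, 22],
    "purpose": ["radio/tv", "education"],
    "housing": ["own", "free"],
})

print(dataset.dtypes)

# Keep only the non-numeric columns
categorical_only = dataset.select_dtypes(exclude="number")

# Equivalent explicit form: drop every numeric column by name
also_categorical = dataset.drop(columns=dataset.select_dtypes(include="number").columns)

print(categorical_only.columns.tolist())
```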
