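I have a DataFrame with a column A, shown below, and I want to create a new column called COMPLEXITY based on column A. However, the output I get does not match what I want. Can anyone help?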
A
dev DH
dev DHGP
dev SEA
dev MONO
dev SLIM DH
dev SLIM MONO
import numpy as np

def complexity_column(df, classes):
    conditions_region = [
        (df[classes].str.contains("DH")),
        (df[classes].str.contains("DHGP")),
        (df[classes].str.contains("SEA")),
        (df[classes].str.contains("MONO")),
        (df[classes].str.contains("SLIM DH")),
        (df[classes].str.contains("SLIM MONO"))
    ]
    # create a list of the values we want to assign for each condition
    values_regions = ['DH', 'DHGP', 'SEA', 'MONO', 'SLIM DH', 'SLIM MONO']
    # create a new column and use np.select to assign values to it using our lists as arguments
    df['COMPLEXITY'] = np.select(conditions_region, values_regions)
    return df
Calling the function:
complexity_column(df, "A")
Output:
A              COMPLEXITY
dev DH         DH
dev DHGP       DH
dev SEA        SEA
dev MONO       MONO
dev SLIM DH    DH
dev SLIM MONO  MONO
My desired output is as follows:
A              COMPLEXITY
dev DH         DH
dev DHGP       DHGP
dev SEA        SEA
dev MONO       MONO
dev SLIM DH    SLIM DH
dev SLIM MONO  SLIM MONO
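Answer 0 (score: 1)
From the numpy.select documentation: numpy.select(condlist, choicelist, default=0)
condlist: the list of conditions that determine from which array in choicelist the output elements are taken. When multiple conditions are satisfied, the first one encountered in condlist is used.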
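The first-match rule can be seen on a toy array. This is a minimal sketch, not taken from the original post; the array `x` and the labels are made up for illustration, and `default="none"` is passed explicitly so that the default has the same string type as the choices.

```python
import numpy as np

x = np.array([1, 5, 10])

# Every element is > 0, so the first condition always wins,
# even for 10, which also satisfies x > 7.
first_wins = np.select([x > 0, x > 7], ["positive", "large"], default="none").tolist()

# Putting the narrower condition first lets it take effect.
swapped = np.select([x > 7, x > 0], ["large", "positive"], default="none").tolist()

print(first_wins)  # ['positive', 'positive', 'positive']
print(swapped)     # ['positive', 'positive', 'large']
```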
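Because "DH" is a substring of both "DHGP" and "SLIM DH", the "DH" condition, which is listed first, wins for those rows. You need to reorder the elements of conditions_region so that the more specific conditions come first and the general ones come last, and reorder values_regions the same way so that each value still lines up with its condition.
That is,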
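conditions_region = [
    df[classes].str.contains("SLIM DH"),
    df[classes].str.contains("SLIM MONO"),
    df[classes].str.contains("DHGP"),
    df[classes].str.contains("DH"),
    df[classes].str.contains("SEA"),
    df[classes].str.contains("MONO")
]
# keep the values in the same order as the conditions
values_regions = ['SLIM DH', 'SLIM MONO', 'DHGP', 'DH', 'SEA', 'MONO']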
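Put together, the reordering might look like the sketch below. It rebuilds the sample data from the question; the single `patterns` list (used for both conditions and values), the `regex=False` argument, the `.to_numpy()` call, the copy of the frame, and `default=""` are choices made for this example rather than part of the original answer.

```python
import numpy as np
import pandas as pd

def complexity_column(df, classes):
    # Most specific patterns first, because np.select keeps the first match.
    patterns = ["SLIM DH", "SLIM MONO", "DHGP", "DH", "SEA", "MONO"]
    # regex=False treats each pattern as a literal substring.
    conditions = [
        df[classes].str.contains(p, regex=False).to_numpy() for p in patterns
    ]
    out = df.copy()  # leave the caller's frame untouched
    # The same list serves as the values, so conditions and values cannot drift apart.
    out["COMPLEXITY"] = np.select(conditions, patterns, default="")
    return out

df = pd.DataFrame(
    {"A": ["dev DH", "dev DHGP", "dev SEA", "dev MONO", "dev SLIM DH", "dev SLIM MONO"]}
)
result = complexity_column(df, "A")
print(result)
```

Using one list for both arguments is a small safeguard: reordering the conditions without reordering the values is an easy mistake to make.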
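Answer 1 (score: 0)
Instead of .str.contains, which matches any substring, you can use ==, which compares the whole cell. The compared strings must then be the full values in column A, including the "dev " prefix:
def complexity_column(df, classes):
    conditions_region = [
        (df[classes] == "dev DH"),
        (df[classes] == "dev DHGP"),
        (df[classes] == "dev SEA"),
        (df[classes] == "dev MONO"),
        (df[classes] == "dev SLIM DH"),
        (df[classes] == "dev SLIM MONO")
    ]
    # values_regions and the np.select call stay as in the question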
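The difference between the two comparisons can be checked on a few of the sample values. This is an illustrative sketch only; the three-row frame and the variable names are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": ["dev DH", "dev DHGP", "dev SLIM DH"]})

# == compares the entire cell, so the bare label never matches here.
bare = (df["A"] == "DH").tolist()
# The full cell value matches exactly one row.
full = (df["A"] == "dev DH").tolist()
# A substring test matches every row that contains "DH" anywhere.
contains = df["A"].str.contains("DH", regex=False).tolist()

print(bare)      # [False, False, False]
print(full)      # [True, False, False]
print(contains)  # [True, True, True]
```

With exact comparison the order of the conditions no longer matters, since at most one condition can be true for a given row.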
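Answer 2 (score: 0)
def column_maker(entry_row, list_of_strings):
    # concatenate every candidate string found in the cell, separated by spaces
    output_string = ''
    for i in list_of_strings:
        if i in entry_row:
            output_string = output_string + " " + i
    # drop the leading space left by the concatenation
    return output_string.strip()

# column_name is the source column (e.g. "A"); list_of_strings holds the candidate labels
df['complexity'] = df[column_name].apply(lambda x: column_maker(x, list_of_strings))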
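Applied to the question's data, this approach joins every label that occurs in a cell, so overlapping labels all appear in the result. The sketch below shows that behaviour with a compact rewrite of column_maker, next to a hypothetical variant, longest_match, which is not part of the original answer and keeps only the longest matching label. The `labels` list and the sample frame are assumed from the question.

```python
import pandas as pd

def column_maker(entry_row, list_of_strings):
    # Join every label that occurs in the cell.
    return " ".join(s for s in list_of_strings if s in entry_row)

def longest_match(entry_row, list_of_strings):
    # Hypothetical variant: keep only the longest label found in the cell.
    matches = [s for s in list_of_strings if s in entry_row]
    return max(matches, key=len, default="")

labels = ["DH", "DHGP", "SEA", "MONO", "SLIM DH", "SLIM MONO"]
df = pd.DataFrame(
    {"A": ["dev DH", "dev DHGP", "dev SEA", "dev MONO", "dev SLIM DH", "dev SLIM MONO"]}
)

joined = df["A"].apply(lambda x: column_maker(x, labels)).tolist()
longest = df["A"].apply(lambda x: longest_match(x, labels)).tolist()

print(joined)   # ['DH', 'DH DHGP', 'SEA', 'MONO', 'DH SLIM DH', 'MONO SLIM MONO']
print(longest)  # ['DH', 'DHGP', 'SEA', 'MONO', 'SLIM DH', 'SLIM MONO']
```

Picking the longest label works for this data because each more specific label is longer than the general one it contains; that would need re-checking for other label sets.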