Question

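I'm cleaning some data and want to conditionally split a column whose values are separated by a newline (e.g. 3t10\n5b12). The data lives in either column_a or column_b, and the other one is NaN. (For reference, the columns are qualification_a_group and qualification_b_group. A person, i.e. one row, can only be in one group.)

Besides the qualification columns there are also final and semi_final columns (with the same data type). I was able to split those with the code attached below, but for the qualification columns I need a condition that picks whichever one is not NaN. I tried the second code block below, but it only produces values when column_a is not null.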

'''
# This works
final_split = combined['final'].str.split("\n", n=1, expand=True)
combined['final_tops'] = final_split[0]
combined['final_zones'] = final_split[1]
'''

'''
# This only works for when qualification_a != nan
q1_split = combined['qualification_a'].str.split("\n", n=1, expand=True)
q2_split = combined['qualification_b'].str.split("\n", n=1, expand=True)

combined['qualification_tops'] = q1_split[0].where(q1_split[0] != np.nan, 
other=q2_split[0])
combined['qualification_zones'] = q1_split[1].where(q1_split[0] != 
np.nan, other=q2_split[1])
'''

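I think this happens because the method does not iterate over each row, so unlike final and semi_final, I would need a for loop to parse the qualification columns. Is that correct, or am I doing it wrong from the start? If it is correct, what is the most efficient way to do it? Thanks.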

Answer 1

Figured it out! I used np.where instead of df.where, and it works like a charm. Code below:

'''
combined['qualification_tops'] = np.where(q1_split[0].isnull(), q2_split[0], q1_split[0])
combined['qualification_zones'] = np.where(q1_split[0].isnull(), q2_split[1], q1_split[1])
'''
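
For anyone wondering why the original attempt failed: my understanding is that it has nothing to do with row iteration. NaN never compares equal to anything, itself included, so `q1_split[0] != np.nan` is True for every row and `where` never falls back to `other`. Switching the condition to `isnull()` / `notna()` is what actually fixes it, so `Series.where` should work too. Below is a minimal sketch under that assumption. The sample rows are made up for illustration, and coalescing the two columns with `fillna` before splitting is an alternative I am suggesting, not something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each person has a result in exactly one group.
combined = pd.DataFrame({
    "qualification_a": ["3t10\n5b12", np.nan, "1t2\n2b3"],
    "qualification_b": [np.nan, "4t9\n4b8", np.nan],
})

# NaN != NaN evaluates to True, so this mask is True on every row,
# including the row where qualification_a is missing.
always_true = combined["qualification_a"] != np.nan

# Option 1: keep Series.where, but test for missing values with notna().
q1_split = combined["qualification_a"].str.split("\n", n=1, expand=True)
q2_split = combined["qualification_b"].str.split("\n", n=1, expand=True)
tops_where = q1_split[0].where(q1_split[0].notna(), other=q2_split[0])

# Option 2: coalesce the two columns first, then split only once.
qualification = combined["qualification_a"].fillna(combined["qualification_b"])
q_split = qualification.str.split("\n", n=1, expand=True)
combined["qualification_tops"] = q_split[0]
combined["qualification_zones"] = q_split[1]

print(combined[["qualification_tops", "qualification_zones"]])
```

Both options stay vectorized, so no for loop is needed. Option 2 also avoids splitting each column separately, which may be tidier if only one of the two columns is ever populated per row.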
