Question

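This code works for a single string (inputx), but I can't get it to work when I replace the string with a column of a dataframe. What I want to do is split the string in the DESC column so that the uppercase words (at the start of the string) go into column break2 and the rest of the description goes into column break3. Any help is appreciated. Thanks.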

Example: what I want the output to look like (but with the different DESC from each row)

Code that works with a hardcoded string:

import re

inputx = "STOCK RECORD INQUIRY This is a system that keeps track of the positions, location and ownership of the securities that the broker holds"
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]

But if I substitute the dataframe column, I get the error: TypeError: expected string or bytes-like object

inputx = Before_df['DESC']
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]

Answer 1

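You can use a regular expression with the Series.str.split method. Passing n=1 splits only at the first lowercase letter, and because the pattern is a capture group, that letter is kept as the middle column. Note that the first column then still ends with the capital letter that starts the description:

df[['result', 'result2', 'result3']] = df['yourcol'].str.split("([a-z])", n=1, expand=True)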
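If the goal is exactly the two-column split described in the question, Series.str.extract may be a closer fit than splitting. The following is a minimal sketch, not part of the original answer. It assumes the column is named DESC as in the question and that the heading consists only of uppercase ASCII letters and spaces. The pattern and the second sample row are illustrative.

```python
import pandas as pd

# Sample data; the second row is made up for illustration.
df = pd.DataFrame({"DESC": [
    "STOCK RECORD INQUIRY This is a system that keeps track of positions",
    "TRADE BLOTTER Lists the trades",
]})

# break2: shortest run of uppercase letters and spaces at the start.
# break3: everything from the first word that contains a lowercase letter.
pattern = r"^(?P<break2>[A-Z\s]+?)\s+(?P<break3>[A-Z]?[a-z].*)$"

# str.extract returns one column per named group; rows that do not match get NaN.
parts = df["DESC"].str.extract(pattern)
df["break2"] = parts["break2"]
df["break3"] = parts["break3"]

print(df[["break2", "break3"]])
```

One limitation of this heuristic: a description that begins with a one-letter capitalized word (for example "A list of ...") would have that word counted as part of the uppercase heading.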

If you absolutely must use re.search (this sounds a bit like homework...):

for i in df.index:
    df.at[i, 'columnName'] = re.search("[a-z]", df.at[i, 'inputColumn'][::1]).start()

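The reason for looping rather than using df.apply() is that dataframes do not like being modified while an apply is in progress: a function passed to apply should not mutate the dataframe it is being applied to.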
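An alternative that avoids mutating anything during the apply is to have the applied function return values and assign the results afterwards. This is a hedged sketch rather than the answerer's method: the helper names first_lowercase_pos and split_desc are hypothetical, the second sample row is made up, and the pos - 1 cut point follows the asker's assumption that the description starts with one capital letter followed by a lowercase letter.

```python
import re
import pandas as pd

def first_lowercase_pos(text):
    """Return the index of the first lowercase ASCII letter, or None if absent."""
    match = re.search("[a-z]", text)
    return match.start() if match else None

def split_desc(text):
    """Split text into (uppercase heading, description) one character before
    the first lowercase letter, mirroring the asker's pos - 1 slicing."""
    pos = first_lowercase_pos(text)
    if pos is None:
        return text, ""          # no lowercase letter: treat it all as heading
    cut = max(pos - 1, 0)
    return text[:cut].strip(), text[cut:]

df = pd.DataFrame({"DESC": [
    "STOCK RECORD INQUIRY This is a system",
    "TRADE BLOTTER Lists the trades",
]})

# Each function returns a value per row; the dataframe is only modified
# by the assignments after apply has finished.
df["break1"] = df["DESC"].apply(first_lowercase_pos)
pairs = df["DESC"].apply(split_desc)
df["break2"] = [head for head, _ in pairs]
df["break3"] = [tail for _, tail in pairs]

print(df)
```

This also explains the TypeError in the question: re.search expects a single string, so it has to be called once per row (here via apply) rather than on the whole column.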

How to use re.search on a column in a dataframe
