How to use re.search on a column in a DataFrame

Asked: 2018-08-30 14:51:22

Tags: python regex dataframe

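This code works for a single string (inputx), but when I replace the string with the name of a column in my DataFrame, I can't get it to work. What I want to do is split the string in the DESC column so that the uppercase words at the start of the string go into the break2 column and the rest of the description goes into the break3 column. Any help is appreciated. Thanks.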

Example: what I want the output to look like (but with the different DESC from each row)

Code that works for a hardcoded string:

inputx= "STOCK RECORD INQUIRY This is a system that keeps track of the positions, location and ownership of the securities that the broker holds"
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]

But if I replace it with the DataFrame column, I get this error: TypeError: expected string or bytes-like object

inputx = Before_df['DESC']
pos = re.search("[a-z]", inputx[::1]).start()
Before_df['break1'] = pos
Before_df['break2'] = inputx[:(pos-1)]
Before_df['break3'] = inputx[(pos-1):]

1 answer:

Answer 0 (score: 0)

You can use a regular expression with the column's .str.split method (Series.str.split):

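# n=1 splits only at the first lowercase letter, so three columns come back: text before, the letter, text after
df[['result','result2','result3']] = df['yourcol'].str.split("([a-z])", n=1, expand=True)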
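If the goal is the split described in the question (leading uppercase words in one column, the rest of the description in another), Series.str.extract with two capture groups may be a closer fit than splitting on a letter. Below is a minimal sketch, assuming the uppercase prefix consists of whole words made only of A-Z. The second sample row and the pattern itself are my own illustration, not taken from the question's data.

```python
import pandas as pd

# Hypothetical sample data; only the first row echoes the question's example.
df = pd.DataFrame({
    "DESC": [
        "STOCK RECORD INQUIRY This is a system that keeps track of positions",
        "ACCOUNT LOOKUP Finds a customer account by number",
    ]
})

# Group 1: zero or more all-uppercase words at the start of the string.
# Group 2: everything that follows.
# \b stops a capitalized word such as "This" from counting as uppercase,
# because there is no word boundary between "T" and "h".
pattern = r"^((?:[A-Z]+\b\s*)*)(.*)$"

parts = df["DESC"].str.extract(pattern)  # DataFrame with columns 0 and 1
df["break2"] = parts[0].str.strip()      # drop the trailing space after the last uppercase word
df["break3"] = parts[1]

print(df[["break2", "break3"]])
```

One caveat with this sketch: a description that begins with a one-letter capitalized word (for example "A") would have that word counted as part of the uppercase prefix, so the pattern may need tightening for real data.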

If you absolutely must use re.search (this sounds a bit like homework...):

import re
for i in df.index:
    # re.search needs one string at a time, so look up each row's value individually
    df.at[i, 'columnName'] = re.search("[a-z]", df.at[i, 'inputColumn']).start()

The reason for looping instead of using df.apply() is that DataFrames don't like being modified while an apply is in progress.
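As a hedged aside, applying a function to the column that only returns a value, and assigning the result afterwards, does not change the DataFrame while the apply is running, so it might sidestep that concern. The sketch below assumes a DESC column as in the question. The helper name first_lower_pos and the -1 fallback for strings with no lowercase letter are my own choices, and the second row is made up to show the fallback.

```python
import re

import pandas as pd

# Hypothetical sample data; the second row has no lowercase letter at all.
df = pd.DataFrame({
    "DESC": [
        "STOCK RECORD INQUIRY This is a system",
        "ALL CAPS ONLY",
    ]
})


def first_lower_pos(text):
    """Return the index of the first lowercase letter, or -1 if there is none."""
    match = re.search("[a-z]", text)
    # re.search returns None when nothing matches, so guard before calling .start()
    return match.start() if match else -1


# apply calls the helper once per string; the new column is assigned afterwards.
df["break1"] = df["DESC"].apply(first_lower_pos)

print(df)
```

Without the guard, a row with no lowercase letter would raise AttributeError, because re.search returns None in that case.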