I have a simple column of strings and a list of strings.
strings_col
"the cat is on the table"
"the dog is eating"
list1 = ["cat", "table", "dog"]
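I need to create another column in which each row holds a string from the list that occurs in strings_col. If a row contains two or more strings from the list, I want one output row per match. The result should look like this: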
strings_col                 string
"the cat is on the table"   cat
"the cat is on the table"   table
"the dog is eating"         dog
How can I do this? Thanks.
Answer 0 (score: 3)
Try str.extractall, .groupby.agg(list), and .explode():
pat = '|'.join(list1)
# 'cat|table|dog'
# One row per match, then collect each original row's matches into a list
df['matches'] = (
    df['strings_col']
    .str.extractall(f"({pat})")
    .groupby(level=0).agg(list)
)
df_new = df.explode('matches')
print(df_new)
strings_col matches
0 the cat is on the table cat
0 the cat is on the table table
1 the dog is eating dog
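For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same extractall approach. The DataFrame construction is taken from the question. Wrapping each term in re.escape and selecting column 0 of the extractall result are my own additions, not part of the answer above; they only matter if the list might contain regex metacharacters or if you prefer working with a Series.

```python
import re
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"strings_col": ["the cat is on the table", "the dog is eating"]})
list1 = ["cat", "table", "dog"]

# Assumption: escape each term so regex metacharacters are matched literally
pat = "|".join(re.escape(s) for s in list1)

# extractall returns one row per match, indexed by (original row, match number);
# column 0 holds the text captured by the single group
matches = df["strings_col"].str.extractall(f"({pat})")[0]

# Collect the matches of each original row into a list (aligned on the original index)
df["matches"] = matches.groupby(level=0).agg(list)

# One output row per list element
df_new = df.explode("matches")
print(df_new)
```

Note that the original index is repeated for rows that produced more than one match.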
Answer 1 (score: 3)
You can use str.findall:
>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')
strings_col string
0 "the cat is on the table" cat
0 "the cat is on the table" table
1 "the dog is eating" dog
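One thing to watch for: a plain 'cat|table|dog' pattern also matches inside longer words such as "category". If whole-word matching is what you want (an assumption on my part, the question does not say), a sketch along these lines may help. The third row is a hypothetical example added only to show the difference.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "strings_col": [
        "the cat is on the table",
        "the dog is eating",
        "a category list",  # hypothetical row: contains "cat" only inside "category"
    ]
})
list1 = ["cat", "table", "dog"]

# \b restricts matches to whole words; re.escape keeps the terms literal.
# The group is non-capturing so findall returns the full match text.
pat = r"\b(?:" + "|".join(re.escape(s) for s in list1) + r")\b"

result = df.assign(string=df["strings_col"].str.findall(pat)).explode("string")
print(result)
```

With this pattern the third row finds nothing, so explode leaves a missing value in its string column instead of reporting cat.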
If you like, you can call reset_index afterwards:
>>> df.assign(
string=df.strings_col.str.findall(r'|'.join(list1))
).explode('string').reset_index(drop=True)
strings_col string
0 "the cat is on the table" cat
1 "the cat is on the table" table
2 "the dog is eating" dog
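If some rows contain none of the listed strings, findall returns an empty list for them and explode turns that into a missing value. Assuming you would rather drop those rows (the question does not cover this case), a possible sketch is below. The "bird" row is hypothetical and is there only to trigger the no-match case.

```python
import pandas as pd

df = pd.DataFrame({
    "strings_col": [
        "the cat is on the table",
        "the dog is eating",
        "the bird is singing",  # hypothetical row with no match
    ]
})
list1 = ["cat", "table", "dog"]

out = (
    df.assign(string=df["strings_col"].str.findall("|".join(list1)))
      .explode("string")          # empty match lists become NaN
      .dropna(subset=["string"])  # drop rows that matched nothing
      .reset_index(drop=True)     # renumber the remaining rows from 0
)
print(out)
```

Leave out the dropna step if you want to keep unmatched rows with an empty string column.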