Get the strings from a list that are contained in another string column

Time: 2021-02-24 14:14:35

Tags: python pandas

I have a simple column of strings and a list of strings.

strings_col
"the cat is on the table"
"the dog is eating"

list1 = ["cat", "table", "dog"]

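I need to create another column in which each row holds a string from the list that occurs in strings_col. If a row of strings_col contains two or more strings from the list, I want one output row per match. The result should look like this: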

 strings_col                   string
"the cat is on the table"      cat
"the cat is on the table"      table
"the dog is eating"            dog

How can I do this? Thanks

2 answers:

Answer 0 (score: 3)

Try str.extractall, then groupby(level=0).agg(list), then explode():

pat = '|'.join(list1)
# 'cat|table|dog'


df['matches'] = df['strings_col']\
                 .str.extractall(f"({pat})")\
                 .groupby(level=0).agg(list)

df_new = df.explode('matches')
print(df_new)
    
               strings_col matches
0  the cat is on the table     cat
0  the cat is on the table   table
1       the dog is eating      dog
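
For reference, here is a self-contained sketch of the same approach. The sample DataFrame is rebuilt from the question. Escaping the list items with re.escape is an addition that is not in the answer above; it is assumed here so that items containing regex metacharacters would still be matched literally.

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame(
    {"strings_col": ["the cat is on the table", "the dog is eating"]}
)
list1 = ["cat", "table", "dog"]

# One alternation pattern; re.escape is an assumption added for safety
pat = "|".join(re.escape(s) for s in list1)

# extractall returns one row per match, indexed by (original row, match number);
# grouping on level 0 collects the matches of each original row into a list
matches = (
    df["strings_col"]
    .str.extractall(f"({pat})")[0]
    .groupby(level=0)
    .agg(list)
)

df["matches"] = matches          # aligned on the original index
df_new = df.explode("matches")   # one output row per matched string
print(df_new)
```

Rows with no match at all get NaN in matches, because extractall produces no entries for them and the assignment aligns on the index.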

Answer 1 (score: 3)

You can use str.findall:

>>> df.assign(string=df.strings_col.str.findall(r'|'.join(list1))).explode('string')

                 strings_col string
0  "the cat is on the table"    cat
0  "the cat is on the table"  table
1        "the dog is eating"    dog

If you like, you can call reset_index afterwards:

>>> df.assign(
        string=df.strings_col.str.findall(r'|'.join(list1))
    ).explode('string').reset_index(drop=True)
                 strings_col string
0  "the cat is on the table"    cat
1  "the cat is on the table"  table
2        "the dog is eating"    dog
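
Note that a pattern such as 'cat|table|dog' matches substrings, so "cat" would also be found inside "category". The sketch below assumes that whole-word matches are wanted, which the question does not state, and wraps the alternation in \b word boundaries. The second sample row is hypothetical and was added only to show the difference.

```python
import re

import pandas as pd

# First row is from the question; the second row is a hypothetical example
df = pd.DataFrame(
    {"strings_col": ["the cat is on the table", "a category of dogs"]}
)
list1 = ["cat", "table", "dog"]

plain = "|".join(map(re.escape, list1))   # substring matching
whole = rf"\b(?:{plain})\b"               # whole-word matching (assumption)

loose = df["strings_col"].str.findall(plain).tolist()
strict = df["strings_col"].str.findall(whole).tolist()
print(loose)   # second row matches "cat" and "dog" inside longer words
print(strict)  # second row has no whole-word match

result = (
    df.assign(string=df["strings_col"].str.findall(whole))
    .explode("string")
    .reset_index(drop=True)
)
print(result)
```

A row whose findall result is an empty list is kept by explode with NaN in string; append dropna(subset=["string"]) if such rows should be removed.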