Question

I have a dataframe that looks like this:

id  points
a   [c,v,b,n]
b   []
c   [x,a]
....

and a dictionary (which I also have as a dataframe):

{'a': ['j','c'],
 'b': ['p','r','q'],
 'c': ['n','k','l','x','a'],
 ....}

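For each id, I want to check which of the dataframe's points appear in the dictionary's list for that key, and then remove from the dataframe's points every item that has no match in the dictionary. Expected output: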

id  points
a   [c]
b   []
c   [x,a]

I tried this:

for key,point in my_dict.items():
    if df['points'].str.contains(point).any():
        ...

But I get TypeError: unhashable type: 'list'

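I tried converting the dataframe to a dictionary, but the search takes too long because it needs even more loops. Any suggestions for improving the code or the data structure?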

Edit

Another representation of the data:

id  points
a   [c,v,b,n]
b   []
c   [x,a]
....

and

points
j,c
p,r,q
n,k,l,x,a

Answer 1

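You can call apply, convert the dict value for each row's id to a set, take its intersection with the row's points, and convert the result back to a list: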

In [15]:
d={'a': ['j','c'],
 'b': ['p','r','q'],
 'c': ['n','k','l','x','a']}
d

Out[15]:
{'a': ['j', 'c'], 'b': ['p', 'r', 'q'], 'c': ['n', 'k', 'l', 'x', 'a']}

In [17]:
df['points'] = df.apply(lambda row: list(set(d[row['id']]).intersection(row['points'])), axis=1)
df

Out[17]:
  id  points
0  a     [c]
1  b      []
2  c  [a, x]
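
Note that row c comes back as [a, x] rather than [x, a] above, because a set has no defined order. If the original order matters, a list comprehension can be used instead. The following is only a minimal sketch under some assumptions: the dataframe is rebuilt from the sample data in the question, and the names allowed and keep_matching are hypothetical helpers that do not appear in the original post. An id missing from the dictionary is assumed to yield an empty list.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed shape: lists in 'points').
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "points": [["c", "v", "b", "n"], [], ["x", "a"]],
})
d = {"a": ["j", "c"], "b": ["p", "r", "q"], "c": ["n", "k", "l", "x", "a"]}

# Build one set per key up front so each membership test is a hash lookup.
allowed = {key: set(values) for key, values in d.items()}

def keep_matching(row_id, points):
    """Return the points that also appear in the dict entry for row_id,
    keeping the order they had in the dataframe."""
    valid = allowed.get(row_id, set())  # unknown id: nothing matches
    return [p for p in points if p in valid]

# Iterate over the two columns directly instead of using apply(axis=1).
df["points"] = [keep_matching(i, pts) for i, pts in zip(df["id"], df["points"])]
print(df)
```

Building the sets once, outside the loop, avoids re-creating a set for every row, which may help with the slow search mentioned in the question.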

As for why you get the error: you are calling a .str method on a Series whose elements are lists, not strings.
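
A quick way to confirm this is to inspect what the column actually holds. This is a small sketch that assumes the sample data from the question; the variable names element_types and has_c are illustrative only. Membership in each list can then be tested with the plain Python in operator rather than a string method.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "points": [["c", "v", "b", "n"], [], ["x", "a"]],
})

# A column of lists is stored with the generic object dtype.
print(df["points"].dtype)  # object

# Every element is a Python list, not a string.
element_types = set(df["points"].map(type))
print(element_types)

# Test membership per row with the 'in' operator on each list.
has_c = df["points"].apply(lambda pts: "c" in pts)
print(has_c.tolist())
```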

Searching for values in a dataframe that contains lists
