Question

I am new to Python and would really appreciate help with this problem.

I have two DataFrames:

d1 (items can be listed multiple times)

import numpy as np
import pandas as pd

d1 = pd.DataFrame({
    'city': ['New York', 'Chicago', 'Los Angeles', 'Washington, D.C.', 'San Francisco',
             'New York City', 'Francisco', 'Washington', 'Dallas', 'Miami', 'Boston'],
    'value': [10, 5, 7, 8, 9, 10, 9, 8, 10, 4, 3]})

d2 (the "master" DataFrame: each item is listed only once)

d2 = pd.DataFrame({
    'city': ['New York City', 'Los Angeles', 'Washington', 'San Francisco', 'Dallas'],
    'value': [10, 7, 8, 9, 10]})

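What I want to do:

Take every item from d1
Check whether its "d1.value" exists in d2
- For each hit:
- Compare the city names (not an exact 1:1 match, but a "contains" test), i.e. d1.city in d2.city or d2.city in d1.city
- On a match: set a flag in a new column "MatchWithD2" in d1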
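
To make the "contains" rule from the steps above concrete, here is a minimal sketch of the name comparison on its own. The helper name cities_match is hypothetical (it does not appear in my code), and it assumes a plain, case-sensitive substring test in either direction.

```python
def cities_match(city_a: str, city_b: str) -> bool:
    """Return True if either city name contains the other as a substring."""
    return city_a in city_b or city_b in city_a


# 'New York' is contained in 'New York City', so the pair matches.
print(cities_match("New York", "New York City"))       # True
# The test is symmetric: 'Francisco' is contained in 'San Francisco'.
print(cities_match("San Francisco", "Francisco"))      # True
# 'Washington' is contained in 'Washington, D.C.'.
print(cities_match("Washington, D.C.", "Washington"))  # True
# Neither name contains the other.
print(cities_match("Chicago", "Dallas"))               # False
```

Note that this comparison is only meant to run for pairs whose value is equal.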

I already have a solution that uses apply, but it takes a long time on larger versions of d1 and d2:

def add_flag(x):
    # Collect the d2 city names whose value equals x's value and whose
    # name contains, or is contained in, x's city name.
    dfRet = d2.apply(
        lambda y: y['city']
        if (x['value'] == y['value']
            and (x['city'] in y['city'] or y['city'] in x['city']))
        else None, axis=1)
    dfRet = dfRet.dropna(axis=0, how='all')
    if dfRet.empty:
        dfRet = np.nan
    else:
        dfRet = dfRet.to_string(index=False)
    return dfRet

d1['MatchWithD2'] = d1.apply(add_flag, axis =1)
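
For comparison, one possible direction that avoids the nested apply is sketched below. It is an assumption about what could be faster, not a measured result: merge the two frames on value first, so the substring test only runs on rows that already share a value. The names pairs, is_match, matched and flags are hypothetical, and multiple matches are joined with ", " rather than formatted with to_string.

```python
import pandas as pd

d1 = pd.DataFrame({
    'city': ['New York', 'Chicago', 'Los Angeles', 'Washington, D.C.', 'San Francisco',
             'New York City', 'Francisco', 'Washington', 'Dallas', 'Miami', 'Boston'],
    'value': [10, 5, 7, 8, 9, 10, 9, 8, 10, 4, 3]})
d2 = pd.DataFrame({
    'city': ['New York City', 'Los Angeles', 'Washington', 'San Francisco', 'Dallas'],
    'value': [10, 7, 8, 9, 10]})

# Pair every d1 row with every d2 row that has the same value.
# reset_index() keeps d1's original row label in a column named 'index'.
pairs = d1.reset_index().merge(d2, on='value', suffixes=('_d1', '_d2'))

# Substring test in both directions, evaluated only on the candidate pairs.
is_match = [a in b or b in a
            for a, b in zip(pairs['city_d1'], pairs['city_d2'])]
matched = pairs[is_match]

# Join the matching d2 city names per original d1 row; rows without a
# match are absent from flags and become NaN after reindex().
flags = matched.groupby('index')['city_d2'].agg(', '.join)
d1['MatchWithD2'] = flags.reindex(d1.index)

print(d1)
```

With the sample data, 'Chicago', 'Miami' and 'Boston' end up as NaN and every other row carries the name of its d2 match. If many rows share the same value, the merged frame can grow large, so this trades memory for speed.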

Could you help me speed this up?

Thank you very much!

Best regards!

Optimizing an apply function across multiple DataFrames in Python

0 answers: