Comparing list elements with sublist elements in Pandas

Time: 2020-09-11 20:19:39

Tags: python pandas

df

col1                       col2
['aa', 'bb', 'cc', 'dd']   [['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']]
['ss', 'dd', 'ff', 'gg']   [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]
['ss', 'dd']               [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]

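I want to run a function that concatenates the elements of the list in col1 with the elements of the first sublist in col2 (there are multiple sublists), element by element, and then does the same with the second sublist in col2.

The result would look something like this column: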

results
[['aaee', 'bbff', 'ccgg', 'ddhh'],['aaqq', 'bbww', 'ccee', 'ddrr']]
[['ssmm', 'ddnn', 'ffvv', 'ggcc'],['sszz', 'ddaa', 'ffjj', 'ggkk']]
[['ssmm', 'ddnn'],['sszz', 'ddaa']]

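I think this involves looping over the elements of the list in col1 and somehow matching each one with the corresponding item in every sublist of col2. How would I do that?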


Adapted code:

[[[df1.agg(lambda x: get_top_matches(u,w), axis=1) for u,w in zip(x,v)]\
for v in y] for x,y in zip(df1['parent_org_name_list'], df1['children_org_name_sublists'])]

Result: [image]
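In the adapted code above, df1.agg(..., axis=1) runs inside the innermost comprehension, so every (u, w) pair would produce a whole Series rather than a single value, which is probably not what was intended. A minimal sketch of calling the function directly on each pair follows. The body of get_top_matches here is a hypothetical stand-in (the real function is not shown in the question), and the sample organisation names are made up for illustration.

```python
from difflib import SequenceMatcher

import pandas as pd


def get_top_matches(parent, child):
    # Hypothetical stand-in: the real get_top_matches is not shown in the question.
    # Here it simply returns a similarity ratio between the two strings.
    return round(SequenceMatcher(None, parent, child).ratio(), 2)


# Made-up sample data using the column names from the adapted code.
df1 = pd.DataFrame({
    "parent_org_name_list": [["acme", "globex"]],
    "children_org_name_sublists": [[["acme inc", "globex llc"], ["acme", "initech"]]],
})

# Call the function once per (u, w) pair instead of wrapping it in df1.agg.
scores = [[[get_top_matches(u, w) for u, w in zip(x, v)] for v in y]
          for x, y in zip(df1["parent_org_name_list"],
                          df1["children_org_name_sublists"])]
print(scores)
```

The nesting of the result mirrors the nesting of the sublists column: one list per row, one inner list per sublist, one value per pair.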

3 answers:

Answer 0 (score: 3)

You can use zip here:

[[[u+w for u,w in zip(x,v)] for v in y] for x,y in zip(df['col1'], df['col2'])]

Output:

[[['aaee', 'bbff', 'ccgg', 'ddhh'], ['aaqq', 'bbww', 'ccee', 'ddrr']],
 [['ssmm', 'ddnn', 'ffvv', 'ggcc'], ['sszz', 'ddaa', 'ffjj', 'ggkk']],
 [['ssmm', 'ddnn'], ['sszz', 'ddaa']]]

To assign the result back to your dataframe, you can do the following:

df['results'] = [[[u+w for u,w in zip(x,v)] for v in y] 
            for x,y in zip(df['col1'], df['col2'])]
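For a version that can be pasted and run as-is, the sketch below rebuilds the question's frame and wraps the inner comprehension in a helper. The helper name concat_with_sublists is an assumption made for readability, not part of the answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "col1": [["aa", "bb", "cc", "dd"],
             ["ss", "dd", "ff", "gg"],
             ["ss", "dd"]],
    "col2": [[["ee", "ff", "gg", "hh"], ["qq", "ww", "ee", "rr"]],
             [["mm", "nn", "vv", "cc"], ["zz", "aa", "jj", "kk"]],
             [["mm", "nn", "vv", "cc"], ["zz", "aa", "jj", "kk"]]],
})


def concat_with_sublists(items, sublists):
    """Pair items with each sublist positionally and join the strings."""
    return [[a + b for a, b in zip(items, sub)] for sub in sublists]


# One result per row, built from that row's col1 list and col2 sublists.
df["results"] = [concat_with_sublists(x, y)
                 for x, y in zip(df["col1"], df["col2"])]
print(df["results"].tolist())
```

Naming the helper keeps the row-level logic testable on plain lists, without needing a DataFrame.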

Answer 1 (score: 1)

Max, try this solution. It gives finer control over the transformation, including handling of uneven lengths (see len_limit in the example).

import pandas as pd
df = pd.DataFrame({'c1':[['aa', 'bb', 'cc', 'dd'],['ss', 'dd', 'ff', 'gg']],
                   'c2':[[['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']],
                         [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]],})  

df['c3'] = 'empty'  # fill 'c3' with a string so the column has object dtype
print(df)
                 c1                                    c2     c3
0  [aa, bb, cc, dd]  [[ee, ff, gg, hh], [qq, ww, ee, rr]]  empty
1  [ss, dd, ff, gg]  [[mm, nn, vv, cc], [zz, aa, jj, kk]]  empty

for i, row in df.iterrows():
    c3_list = []
    len_limit = len(row['c1'])  # cap each sublist at the length of c1
    for c2_sublist in row['c2']:
        c3_list.append([j1+j2 for j1, j2 in zip(row['c1'], c2_sublist[:len_limit])])
    df.at[i, 'c3'] = c3_list

print(df['c3'])

0    [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1    [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
Name: c3, dtype: object
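On uneven lengths: zip already stops at its shortest input, so the len_limit slice mainly documents the intent. If the unmatched tail should be kept instead of dropped, itertools.zip_longest is one option. The sketch below compares the two, using the third row of the question's data; the fillvalue="" choice is an assumption about how unmatched elements should be treated.

```python
from itertools import zip_longest

items = ["ss", "dd"]
sublist = ["mm", "nn", "vv", "cc"]

# zip stops at the shorter input, so no explicit slice is required.
truncated = [a + b for a, b in zip(items, sublist)]

# zip_longest pads the shorter input; fillvalue="" leaves unmatched elements unchanged.
padded = [a + b for a, b in zip_longest(items, sublist, fillvalue="")]

print(truncated)  # ['ssmm', 'ddnn']
print(padded)     # ['ssmm', 'ddnn', 'vv', 'cc']
```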

Answer 2 (score: 1)

Try:

df["results"] = df[["col1", "col2"]].apply(lambda x: [list(map(''.join, zip(x["col1"], el))) for el in x["col2"]], axis=1)

Output:

>>> df["results"]

0    [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1    [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
2                         [[ssmm, ddnn], [sszz, ddaa]]