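Multiprocessing my loop/iteration (try...except)

Time: 2019-06-19 08:13:05

Tags: python pandas python-multiprocessing

I'm converting one chemical notation into another. My list has more than 6k different names to convert, and it takes a long time. How can I use multiprocessing? I tried to implement it myself, but I'm a newbie. Other code optimizations are welcome too!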

    import cirpy
    import pandas as pd

    def resolve(str_input, representation):
        return cirpy.resolve(str_input, representation)

    compound_list = []
    smiles_list = []

    # One lookup per row, executed sequentially
    for index, row in df_Verteilung.iterrows():

        try:
            actual_smiles = resolve(row['Compound'], 'smiles')

        except Exception:
            actual_smiles = 'Error'

        print('\r', row['Compound'], actual_smiles, end='')

        compound_list.append(row['Compound'])
        smiles_list.append(actual_smiles)

    df_new = pd.DataFrame({'Compound' : compound_list, 'SmilesCode' : smiles_list})
    df_new.to_csv(index=False)

1 answer:

Answer 0 (score: 0)

Try using a Pool from multiprocessing:

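    from multiprocessing import Pool

    import cirpy
    import pandas as pd

    def resolve(str_input, representation):
        # Resolve one name; return (name, result), with "Error" on failure
        try:
            res = cirpy.resolve(str_input, representation)
        except Exception:
            res = "Error"

        print('\r', str_input, res, end='')
        return (str_input, res)

    n = 5  # number of worker processes

    # The guard is needed where workers are spawned rather than forked (e.g. Windows)
    if __name__ == '__main__':
        # df_Verteilung is the DataFrame from the question
        tasks = [(name, 'smiles') for name in df_Verteilung['Compound']]

        with Pool(processes=n) as pool:
            compounds_smiles_list = pool.starmap(resolve, tasks)

        compound_list = [elem[0] for elem in compounds_smiles_list]
        smiles_list = [elem[1] for elem in compounds_smiles_list]

        df_new = pd.DataFrame({'Compound' : compound_list, 'SmilesCode' : smiles_list})
        # Without a path, to_csv returns the CSV as a string; pass a path to write a file
        df_new.to_csv(index=False)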

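The variable n controls the size of the pool. Alternatively, you can call the Pool constructor with no arguments, in which case the number of worker processes defaults to the number of CPUs reported for your system.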
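To make the two sizing options concrete, here is a minimal sketch. The names `fake_resolve` and `resolve_all` are hypothetical and not part of the original answer; `fake_resolve` stands in for the real `resolve` worker so the example runs without `cirpy` or a network connection.

```python
import os
from multiprocessing import Pool


def fake_resolve(name, representation):
    """Hypothetical stand-in for resolve(): no network call, just echoes its input."""
    return (name, f"{representation}:{name}")


def resolve_all(names, workers=None):
    """Run fake_resolve over all names.

    workers=None leaves the pool size to Pool's default;
    an integer sets the number of worker processes explicitly.
    """
    tasks = [(name, "smiles") for name in names]
    with Pool(processes=workers) as pool:
        return pool.starmap(fake_resolve, tasks)


if __name__ == "__main__":
    names = ["water", "ethanol", "benzene"]
    print("CPUs reported:", os.cpu_count())
    print(resolve_all(names))             # default pool size
    print(resolve_all(names, workers=2))  # explicit pool size
```

If each lookup mostly waits on a remote service, which may well be the case for `cirpy.resolve`, a pool larger than the CPU count might still help. That is worth measuring rather than assuming.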

Some explanations:

Pool

starmap
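As a rough illustration of what `starmap` does differently from `map`, the sketch below runs both on the same task list. It deliberately uses `ThreadPool` from `multiprocessing.pool`, which exposes the same `map` and `starmap` methods as `Pool` but runs workers as threads, so the example can also use a lambda. With a process `Pool`, the worker would need to be a top-level function instead. The `label` function is a hypothetical two-argument worker standing in for `resolve`.

```python
from multiprocessing.pool import ThreadPool


def label(name, representation):
    """Hypothetical two-argument worker, standing in for resolve()."""
    return f"{name} -> {representation}"


tasks = [("water", "smiles"), ("ethanol", "inchi")]

with ThreadPool(processes=2) as pool:
    # starmap unpacks each tuple into positional arguments: label(*task)
    via_starmap = pool.starmap(label, tasks)
    # map passes each item as ONE argument, so the tuple is unpacked by hand
    via_map = pool.map(lambda task: label(*task), tasks)

print(via_starmap)
print(via_map)
```

Both calls return results in the order of the input list, which is why the answer can rebuild the Compound and SmilesCode columns directly from the returned tuples.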