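Pandas iterate over rows: add rows based on a list of items in a column

Time: 2019-02-08 14:27:53

Tags: python pandas bioinformatics

Good people of the internet! This is my first post here, so please be kind.

I have a DataFrame with a column of semicolon-separated allele lists: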

         Epitope                            MHC alleles  
16         GASPAVSSL      HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57     
285  IREFMEKECPFIKPE                HLA-A2;HLA-A28;HLA-B8;HLA-B44   
286        VRNIMSPVM                HLA-A2;HLA-A28;HLA-B8;HLA-B44   
287        TVWFVPSIK  HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01   

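I want to iterate over its rows and repeat each row once per element in its list. Then, in each newly created row, I want to replace the list in the "MHC alleles" column with a single item from it.

So far I have tried: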

import pandas as pd

temp_DF = temp_DF[["Description", "MHC alleles"]]

new_rowsDF = pd.DataFrame(columns=temp_DF.columns)
for index, row in temp_DF.iterrows():
    if ";" in row["MHC alleles"]:  # find rows with multiple alleles
        alleles = row["MHC alleles"].split(";")

        # make a list with only valid alleles (containing *)
        single_allele = [hla for hla in alleles if "*" in hla]
        if not single_allele:  # if the list is empty, skip the row
            continue

        for alle in single_allele:
            row["MHC alleles"] = alle
            new_rowsDF.loc[index] = row
    else:
        # leave rows that already hold a single allele unchanged
        new_rowsDF.loc[index] = row

display(new_rowsDF)

I feel like I am on the right track, but I cannot keep the rows created inside the loop. This is the output I want:

         Epitope    MHC alleles  
16         GASPAVSSL  HLA-A*02:01
16         GASPAVSSL  HLA-A*24:02 
287        TVWFVPSIK  HLA-B*46:01
287        TVWFVPSIK  HLA-A*02:01
287        TVWFVPSIK  HLA-A*01:01

Thanks in advance!
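For reference, a likely reason the rows do not persist is that `new_rowsDF.loc[index] = row` writes to the same index label for every allele of a given row, so each assignment overwrites the previous one. Below is a minimal sketch of one way to get the desired output without an explicit loop. It assumes pandas 0.25 or later (for `DataFrame.explode`), uses the `Epitope` column name from the table rather than `Description`, and keeps only alleles containing `*`, which matches the desired output but also drops any single-allele rows without `*`. The names `df`, `exploded`, and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    {
        "Epitope": ["GASPAVSSL", "IREFMEKECPFIKPE", "VRNIMSPVM", "TVWFVPSIK"],
        "MHC alleles": [
            "HLA-A*02:01;HLA-A*24:02;HLA-B40;HLA-B57",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A2;HLA-A28;HLA-B8;HLA-B44",
            "HLA-A*01:01;HLA-A*02:01;HLA-B57;HLA-B*46:01",
        ],
    },
    index=[16, 285, 286, 287],
)

# Turn each semicolon-separated string into a list of alleles,
# then emit one row per list element (the index label is repeated).
exploded = df.assign(
    **{"MHC alleles": df["MHC alleles"].str.split(";")}
).explode("MHC alleles")

# Assumption: only alleles containing "*" count as valid.
# regex=False makes "*" a literal character rather than a regex quantifier.
keep = exploded["MHC alleles"].str.contains("*", regex=False)
result = exploded[keep.to_numpy()]

print(result)
```

The alleles come out in the order they appear in each original string, so the row order for index 287 differs from the hand-written table above.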

0 answers:

No answers yet