Is there a way to create a DataFrame from a dictionary whose values are lists?

Date: 2017-07-19 20:27:13

Tags: python dictionary dataframe

I have a dictionary myDict, and I want to build a DataFrame df from it that looks like this:

myDict = {
1: [''],
2: ['07/19/2017', ' 10/18/2007', '12/20/2002','12/20/2002' ],
3: ['07/19/2017', ' 10/18/2007'],
4: ['12/13/1993'],
5: [''],
6: ['08/01/2007'],
7: ['04/23/2007'],
8: ['02/06/2007'],
9: ['02/06/2007'],
10: ['11/08/2001'],
11: [''],
12: [''],
13: ['12/20/2002']
}

df
ID    Col1         Col2          Col3         Col4
1     
2     07/19/2017   10/18/2007    12/20/2002   12/20/2002 
3     07/19/2017   10/18/2007 
4     12/13/1993   
5     
6     08/01/2007
7     04/23/2007
8     02/06/2007
9     02/06/2007
10    11/08/2001
11    
12    
13    12/20/2002

How can I do this? Thanks.

Putting everything into a function doesn't work...

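import pandas as pd
from datetime import datetime

def split_Date(df):
    # Map each IDX to its comma-separated 'Date' string
    Dates1 = df.set_index('IDX')['Date'].to_dict()
    # Split each string into a list of date strings
    dates = {k: v.split(',') for k, v in Dates1.items()}
    # Sort each list chronologically; blank entries get datetime.min so str and datetime are never compared
    dates = {k: sorted(v, key=lambda x: datetime.strptime(x.strip(), "%m/%d/%Y") if x.strip() else datetime.min) for k, v in dates.items()}
    # One row per IDX, with columns Date0, Date1, ...
    df_dates = pd.DataFrame.from_dict(dates, orient="index").fillna('').rename_axis("IDX").rename(columns="Date{}".format).reset_index()
    df = pd.merge(df, df_dates, on='IDX', how='inner', suffixes=('_chem', '_df'))
    return df  # Adding this doesn't make any difference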

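Running the same code outside a function works perfectly. However, every time I have a new myData, I have to change the data variable on every line. That is not as efficient as having a function: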
import pandas as pd
from datetime import datetime

Dates1 = myData.set_index('IDX')['Date'].to_dict()
dates = {k: v.split(',') for k, v in Dates1.items()}
dates = {k: sorted(v, key=lambda x: datetime.strptime(x.strip(), "%m/%d/%Y") if x.strip() else datetime.min) for k, v in dates.items()}
df_dates = pd.DataFrame.from_dict(dates, orient="index").fillna('').rename_axis("IDX").rename(columns="Date{}".format).reset_index()
myData = pd.merge(myData, df_dates, on='IDX', how='inner', suffixes=('_chem', '_df'))
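One likely reason the function version seems not to work: reassigning df inside split_Date only rebinds the local name, and pd.merge returns a new DataFrame instead of modifying the caller's object. Calling split_Date(myData) without assigning the result therefore leaves myData unchanged. The sketch below is a self-contained version of the function (blank entries sorted first via datetime.min) together with a hypothetical three-row myData that has the 'IDX' and 'Date' columns the code assumes; the only change on the calling side is myData = split_Date(myData).

```python
from datetime import datetime

import pandas as pd


def split_Date(df):
    """Split the comma-separated 'Date' column into Date0, Date1, ... columns."""
    dates = {
        k: v.split(",")
        for k, v in df.set_index("IDX")["Date"].to_dict().items()
    }
    # Sort chronologically; blank entries get datetime.min so they sort first
    dates = {
        k: sorted(
            v,
            key=lambda x: datetime.strptime(x.strip(), "%m/%d/%Y")
            if x.strip()
            else datetime.min,
        )
        for k, v in dates.items()
    }
    df_dates = (
        pd.DataFrame.from_dict(dates, orient="index")
        .fillna("")
        .rename_axis("IDX")
        .rename(columns="Date{}".format)
        .reset_index()
    )
    # pd.merge returns a NEW DataFrame; the caller's object is not modified
    return pd.merge(df, df_dates, on="IDX", how="inner", suffixes=("_chem", "_df"))


# Hypothetical sample input with the 'IDX' and 'Date' columns the code expects
myData = pd.DataFrame({
    "IDX": [1, 2, 3],
    "Date": ["", "12/20/2002, 07/19/2017", "10/18/2007"],
})

# Rebind the name to the returned frame; split_Date(myData) alone discards it
myData = split_Date(myData)
print(myData)
```

With the function in place, processing a new frame is a single line (newData = split_Date(newData)), so nothing else has to be edited.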

1 answer:

Answer 0 (score: 4)

You can read it with pd.DataFrame.from_dict and pass orient="index" so that the dictionary keys become the index:

pd.DataFrame.from_dict(myDict, orient="index").fillna('')

#            0           1           2           3
#1              
#2  07/19/2017  10/18/2007  12/20/2002  12/20/2002
#3  07/19/2017  10/18/2007      
#4  12/13/1993          
#5              
#6  08/01/2007          
# ...
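Why orient="index" and fillna('') are needed: with orient="index" each list becomes one row, and rows shorter than the longest list are padded with missing values, which fillna('') then replaces with empty strings. The default pd.DataFrame(dict) constructor instead treats each list as a column and raises ValueError when the lengths differ. The dictionary below is a trimmed-down, hypothetical subset of myDict used only for illustration.

```python
import pandas as pd

# A trimmed-down version of myDict (three keys) for illustration
small = {1: [""], 2: ["07/19/2017", "10/18/2007"], 3: ["12/13/1993"]}

# Keys become the index; each list becomes one row, padded to the longest length
raw = pd.DataFrame.from_dict(small, orient="index")
print(raw.isna())  # the padded cells show up as missing

filled = raw.fillna("")
print(filled)

# The default constructor treats each list as a column, so ragged lengths fail
try:
    pd.DataFrame(small)
except ValueError as err:
    print("ValueError:", err)
```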

To turn the keys into a separate column, you can use reset_index:

(pd.DataFrame.from_dict(myDict, orient="index")
 .fillna('')
 .rename_axis("ID")
 .rename(columns="Col{}".format)
 .reset_index())

#  ID         Col0        Col1        Col2        Col3
#0  1               
#1  2   07/19/2017  10/18/2007  12/20/2002  12/20/2002
#2  3   07/19/2017  10/18/2007      
#3  4   12/13/1993          
#4  5
# ...
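The answer's output numbers the columns from Col0, while the question asks for Col1 to Col4, and some values in myDict carry a stray leading space (" 10/18/2007"). A minimal variation, assuming those two details matter, strips each string first and shifts the 0-based column labels by one with a callable passed to rename. Only the first four keys of myDict are used here to keep the sketch short.

```python
import pandas as pd

# Subset of the question's myDict; note the leading space in " 10/18/2007"
myDict = {
    1: [""],
    2: ["07/19/2017", " 10/18/2007", "12/20/2002", "12/20/2002"],
    3: ["07/19/2017", " 10/18/2007"],
    4: ["12/13/1993"],
}

# Strip whitespace from every date string before building the frame
cleaned = {k: [d.strip() for d in v] for k, v in myDict.items()}

df = (
    pd.DataFrame.from_dict(cleaned, orient="index")
    .fillna("")
    .rename_axis("ID")
    # Shift the 0-based column labels so they read Col1, Col2, ...
    .rename(columns=lambda c: f"Col{c + 1}")
    .reset_index()
)
print(df)
```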