What is the most efficient way to split dates stored as strings in a Pandas column?

Time: 2017-12-22 09:01:09

Tags: python pandas

My Pandas DataFrame has a column named start_date that holds strings in this format:

start_date

'20120212'

'20120514'

'20121124'

'20120604'

To extract the month, year and day into separate columns, this is what I am currently doing. Is there a better way to do the same thing?

df['start_month']=df['start_date'].apply(lambda x:str(x)[4:6])

df['start_year']=df['start_date'].apply(lambda x:str(x)[0:4])

df['start_day']=df['start_date'].apply(lambda x:str(x)[6:8])

1 answer:

Answer 0: (score: 3)

Use to_datetime, then extract the year, month and day:


a = pd.to_datetime(df['start_date'], format='%Y%m%d')
df['start_month'] = a.dt.month
df['start_year'] = a.dt.year
df['start_day'] = a.dt.day

Alternatively, slice with str[] and cast to int:

df['start_date'] = df['start_date'].astype(str)
df['start_month'] = df['start_date'].str[4:6].astype(int)
df['start_year'] = df['start_date'].str[:4].astype(int)
df['start_day'] = df['start_date'].str[6:8].astype(int)
print(df)
  start_date  start_month  start_year  start_day
0   20120212            2        2012         12
1   20120514            5        2012         14
2   20121124           11        2012         24
3   20120604            6        2012          4

Comparing the solutions:

# Repeat the 4 sample rows 10,000 times to build a larger test frame
df = pd.concat([df]*10000).reset_index(drop=True)
# [40000 rows x 1 columns]

def orig(df):
    df['start_month']=df['start_date'].apply(lambda x:str(x)[4:6]).astype(int)
    df['start_year']=df['start_date'].apply(lambda x:str(x)[0:4]).astype(int)
    df['start_day']=df['start_date'].apply(lambda x:str(x)[6:8]).astype(int)
    return df

def a(df):
    a = pd.to_datetime(df['start_date'], format='%Y%m%d')
    df['start_month'] = a.dt.month
    df['start_year'] = a.dt.year
    df['start_day'] = a.dt.day
    return df

def b(df):
    df['start_month'] = df['start_date'].str[4:6].astype(int)
    df['start_year']=df['start_date'].str[:4].astype(int)
    df['start_day']=df['start_date'].str[6:8].astype(int)
    return df
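
The timings for these three functions are not shown above. One way to reproduce the comparison is sketched below, using the standard-library timeit module. The function names with_apply, with_to_datetime and with_str_slice are hypothetical stand-ins for orig, a and b. Each works on a copy so the runs do not affect one another. No timing figures are given here, because the result depends on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Sample data from the question, repeated to 40,000 rows as in the answer.
df = pd.DataFrame({'start_date': ['20120212', '20120514', '20121124', '20120604']})
df = pd.concat([df] * 10000).reset_index(drop=True)


def with_apply(df):
    # The question's approach: slice each value in a Python-level lambda.
    out = df.copy()
    out['start_month'] = out['start_date'].apply(lambda x: str(x)[4:6]).astype(int)
    out['start_year'] = out['start_date'].apply(lambda x: str(x)[0:4]).astype(int)
    out['start_day'] = out['start_date'].apply(lambda x: str(x)[6:8]).astype(int)
    return out


def with_to_datetime(df):
    # Parse once with an explicit format, then read the .dt components.
    out = df.copy()
    parsed = pd.to_datetime(out['start_date'], format='%Y%m%d')
    out['start_month'] = parsed.dt.month
    out['start_year'] = parsed.dt.year
    out['start_day'] = parsed.dt.day
    return out


def with_str_slice(df):
    # Slice through the .str accessor and cast to int.
    out = df.copy()
    out['start_month'] = out['start_date'].str[4:6].astype(int)
    out['start_year'] = out['start_date'].str[:4].astype(int)
    out['start_day'] = out['start_date'].str[6:8].astype(int)
    return out


funcs = (with_apply, with_to_datetime, with_str_slice)
cols = ['start_month', 'start_year', 'start_day']

# Sanity check: all three approaches should yield the same values.
# Integer widths can differ between approaches, so normalise to int64 first.
results = [f(df)[cols].astype('int64') for f in funcs]
assert results[0].equals(results[1]) and results[1].equals(results[2])

# Best of 3 repeats, 3 calls each, reported as seconds per call.
timings = {}
for func in funcs:
    best = min(timeit.repeat(lambda f=func: f(df), number=3, repeat=3)) / 3
    timings[func.__name__] = best
    print(f'{func.__name__}: {best:.4f} s per call')
```

Note that to_datetime also validates the input: a string that is not a real date raises an error, whereas plain slicing accepts it without complaint.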