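Unable to convert a column to datetime

Time: 2019-05-04 17:19:11

Tags: python pandas datetime-format

I have tried many of the suggestions here, but none of them solved the problem. I have two columns with observations like this: 15:08:19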

If I run

df.time_entry.describe() 

it returns:

count       814262
unique       56765
top       15:03:00
freq           103
Name: time_entry, dtype: object

I have already run the following code:

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S', errors='ignore' ).dt.time

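But re-running describe() still returns dtype: object

2 answers:

Answer 0: (score: 1)

What is the purpose of dt.time?

Just remove dt.time and the conversion from object to datetime will work fine. The .dt.time accessor returns plain Python datetime.time objects, which pandas can only store in a column of dtype object.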

df['time_entry'] = pd.to_datetime(df['time_entry'],format= '%H:%M:%S')
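
To make the dtype change visible, here is a minimal sketch on a small sample (the three values are made up for illustration and stand in for df.time_entry). It compares the result of pd.to_datetime with and without .dt.time.

```python
import datetime

import pandas as pd

# Hypothetical sample standing in for df.time_entry
s = pd.Series(['15:08:19', '15:03:00', '09:30:45'])

# Parsing alone yields a datetime64 column
parsed = pd.to_datetime(s, format='%H:%M:%S')
print(parsed.dtype)

# .dt.time extracts datetime.time objects, so the dtype falls back to object
times_only = parsed.dt.time
print(times_only.dtype)
print(type(times_only.iloc[0]))
```

So the values were being parsed correctly in the question; it was the trailing .dt.time that turned the column back into object.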

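Answer 1: (score: 0)

The problem is that you are using the datetime accessor (.dt) with the time attribute, which leaves you with plain time objects, and you then cannot subtract the two columns from each other. So just omit .dt.time and it should work.

Here is some data with two columns of strings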

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['time_entry'] = ['12:01:00', '15:03:00', '16:43:00', '14:11:00']
df['time_entry2'] = ['13:03:00', '14:04:00', '19:23:00', '18:12:00']

print(df)
  time_entry time_entry2
0   12:01:00    13:03:00
1   15:03:00    14:04:00
2   16:43:00    19:23:00
3   14:11:00    18:12:00

Convert both columns to datetime dtype

df['time_entry'] = pd.to_datetime(df['time_entry'], format='%H:%M:%S')
df['time_entry2'] = pd.to_datetime(df['time_entry2'], format='%H:%M:%S')

print(df)
           time_entry         time_entry2
0 1900-01-01 12:01:00 1900-01-01 13:03:00
1 1900-01-01 15:03:00 1900-01-01 14:04:00
2 1900-01-01 16:43:00 1900-01-01 19:23:00
3 1900-01-01 14:11:00 1900-01-01 18:12:00

print(df.dtypes)
time_entry     datetime64[ns]
time_entry2    datetime64[ns]
dtype: object
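
A caveat about errors='ignore', as used in the question: when any value fails to parse, it returns the input unchanged, so the column silently stays object, and pandas 2.2 deprecates the option. The sketch below assumes you would rather turn unparseable values into NaT and inspect them; the sample data with one malformed entry is hypothetical.

```python
import pandas as pd

# Hypothetical column containing one value that is not a valid time
raw = pd.Series(['12:01:00', 'not a time', '16:43:00'])

# errors='coerce' converts unparseable entries to NaT instead of
# returning the original strings untouched
converted = pd.to_datetime(raw, format='%H:%M:%S', errors='coerce')
print(converted.dtype)

# Show the original strings that could not be parsed
bad_rows = raw[converted.isna()]
print(bad_rows)
```

If bad_rows is empty, every value matched the format and the column is safe to use in arithmetic.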

(Optional) Specify the timezone

df['time_entry'] = df['time_entry'].dt.tz_localize('US/Central')
df['time_entry2'] = df['time_entry2'].dt.tz_localize('US/Central')

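Now subtract one column from the other and express the time difference in days (as a float). The three expressions below are equivalent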

df['Diff_days1'] = (df['time_entry'] - df['time_entry2']).dt.total_seconds()/60/60/24
df['Diff_days2'] = (df['time_entry'] - df['time_entry2']) / np.timedelta64(1, 'D')
df['Diff_days3'] = (df['time_entry'].sub(df['time_entry2'])).dt.total_seconds()/60/60/24

print(df)
           time_entry         time_entry2  Diff_days1  Diff_days2  Diff_days3
0 1900-01-01 12:01:00 1900-01-01 13:03:00   -0.043056   -0.043056   -0.043056
1 1900-01-01 15:03:00 1900-01-01 14:04:00    0.040972    0.040972    0.040972
2 1900-01-01 16:43:00 1900-01-01 19:23:00   -0.111111   -0.111111   -0.111111
3 1900-01-01 14:11:00 1900-01-01 18:12:00   -0.167361   -0.167361   -0.167361
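
If only the elapsed time matters and the 1900-01-01 placeholder date is unwanted, one alternative (not used in this answer) is pd.to_timedelta, which reads 'HH:MM:SS' strings as durations. This is a sketch under that assumption, reusing the sample data from above; the column name diff_days is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'time_entry':  ['12:01:00', '15:03:00', '16:43:00', '14:11:00'],
    'time_entry2': ['13:03:00', '14:04:00', '19:23:00', '18:12:00'],
})

# Parse 'HH:MM:SS' strings as durations since midnight (timedelta64 dtype)
start = pd.to_timedelta(df['time_entry'])
end = pd.to_timedelta(df['time_entry2'])

# Difference in days as a float, the same quantity as Diff_days1
df['diff_days'] = (start - end).dt.total_seconds() / 86400
print(df)
```

The trade-off is that timedelta columns have no .dt.time or .dt.weekday, so this only fits when you need differences rather than calendar attributes.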

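Edit

If you are trying to access datetime attributes, you can do so directly on the time_entry columns (rather than on the time-difference columns). Here is an example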

df['day1'] = df['time_entry'].dt.day
df['time1'] = df['time_entry'].dt.time
df['minute1'] = df['time_entry'].dt.minute
df['dayofweek1'] = df['time_entry'].dt.weekday
df['day2'] = df['time_entry2'].dt.day
df['time2'] = df['time_entry2'].dt.time
df['minute2'] = df['time_entry2'].dt.minute
df['dayofweek2'] = df['time_entry2'].dt.weekday

print(df[['day1', 'time1', 'minute1', 'dayofweek1',
        'day2', 'time2', 'minute2', 'dayofweek2']])
   day1     time1  minute1  dayofweek1  day2     time2  minute2  dayofweek2
0     1  12:01:00        1           0     1  13:03:00        3           0
1     1  15:03:00        3           0     1  14:04:00        4           0
2     1  16:43:00       43           0     1  19:23:00       23           0
3     1  14:11:00       11           0     1  18:12:00       12           0