How to format a datetime column into a single consistent format

Asked: 2020-02-13 11:21:28

Tags: python-3.x pandas datetime

I have a DataFrame whose date column mixes two formats, %Y-%m-%d %H:%M:%S and %Y-%m-%d:

...
87986    1979-06-18 00:00:00
87987    1979-06-18 00:00:00
87988             1987-03-18
87989             1983-11-01
...

I want to format them all the same way. I tried:

df['birthdate']=pd.to_datetime(df['birthdate'].astype(str), format='%Y-%m-%d')

But it returned:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1860         try:
-> 1861             values, tz_parsed = conversion.datetime_to_datetime64(data)
   1862             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-345-d5c1036edc26> in <module>
     10         return born
     11 
---> 12 df['birthdate']=pd.to_datetime(df['birthdate'].astype(str), format='%Y-%m-%d')
     13 df["age"] = df["birthdate"].apply(calculate_age)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
    590         else:
    591             from pandas import Series
--> 592             values = convert_listlike(arg._values, True, format)
    593             result = Series(values, index=arg.index, name=arg.name)
    594     elif isinstance(arg, (ABCDataFrame, compat.MutableMapping)):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    300             arg, dayfirst=dayfirst, yearfirst=yearfirst,
    301             utc=utc, errors=errors, require_iso8601=require_iso8601,
--> 302             allow_object=True)
    303 
    304     if tz_parsed is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1864             return values.view('i8'), tz_parsed
   1865         except (ValueError, TypeError):
-> 1866             raise e
   1867 
   1868     if tz_parsed is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1855             dayfirst=dayfirst,
   1856             yearfirst=yearfirst,
-> 1857             require_iso8601=require_iso8601
   1858         )
   1859     except ValueError as e:

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

ValueError: time data None doesn't match format specified

However, none of my rows are None.

Then I tried:

df['birthdate']=pd.to_datetime(df['birthdate']).dt.strftime('%Y-%m-%d')

But I got:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1860         try:
-> 1861             values, tz_parsed = conversion.datetime_to_datetime64(data)
   1862             # If tzaware, these values represent unix timestamps, so we

pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()

TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

OutOfBoundsDatetime                       Traceback (most recent call last)
<ipython-input-350-b23ba7455ab9> in <module>
----> 1 df['birthdate']=pd.to_datetime(df['birthdate']).dt.strftime('%Y-%m-%d')

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
    590         else:
    591             from pandas import Series
--> 592             values = convert_listlike(arg._values, True, format)
    593             result = Series(values, index=arg.index, name=arg.name)
    594     elif isinstance(arg, (ABCDataFrame, compat.MutableMapping)):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
    300             arg, dayfirst=dayfirst, yearfirst=yearfirst,
    301             utc=utc, errors=errors, require_iso8601=require_iso8601,
--> 302             allow_object=True)
    303 
    304     if tz_parsed is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1864             return values.view('i8'), tz_parsed
   1865         except (ValueError, TypeError):
-> 1866             raise e
   1867 
   1868     if tz_parsed is not None:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\arrays\datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
   1855             dayfirst=dayfirst,
   1856             yearfirst=yearfirst,
-> 1857             require_iso8601=require_iso8601
   1858         )
   1859     except ValueError as e:

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()

pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 7974-04-23 00:00:00

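Appendix: the calculate_age function

The whole reason for this formatting is to be able to compute the age, that is, the time between the birth date and now. The function I wrote is: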

from datetime import datetime, date

def calculate_age(born):
    today = date.today()
    days_in_year = 365.2425
    if born not in [None, 'NaT']:
        age = int((today - born.date()).days / days_in_year)
        return age
    else:
        return born

df['birthdate']=pd.to_datetime(df['birthdate'], errors='coerce')
df["age"] = df["birthdate"].apply(calculate_age)

1 Answer:

Answer 0 (score: 1):

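I don't think converting to strings is necessary, and a single format cannot be specified when the column contains two different formats. However, errors='coerce' is required so that unparseable datetimes such as 7974-04-23 00:00:00 are converted to the missing value NaT: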

print (df)
                 birthdate
87986  1979-06-18 00:00:00
87987  1979-06-18 00:00:00
87988           1987-03-18
87989           1983-11-01
87990  7974-04-23 00:00:00 <- added row

df['birthdate']=pd.to_datetime(df['birthdate'], errors='coerce')
print (df)
       birthdate
87986 1979-06-18
87987 1979-06-18
87988 1987-03-18
87989 1983-11-01
87990        NaT
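
The question notes that none of the rows looked like None, so it can help to see exactly which raw values errors='coerce' turns into NaT. The following is only a minimal sketch, assuming the column holds strings. The sample values and the names raw, parsed and bad are made up for illustration, and slicing to the first ten characters is an extra step (not part of the answer above) so that one explicit format fits both layouts:

```python
import pandas as pd

# Hypothetical sample mirroring the question's column, plus one unparseable value.
raw = pd.Series(
    ["1979-06-18 00:00:00", "1987-03-18", "not a date"],
    index=[87986, 87988, 87990],
    name="birthdate",
)

# Keep only the leading YYYY-MM-DD part so a single format matches both layouts.
parsed = pd.to_datetime(raw.str[:10], format="%Y-%m-%d", errors="coerce")

# Rows that had a value before parsing but are NaT afterwards are the offenders.
bad = raw[raw.notna() & parsed.isna()]
print(bad)
```

On the pandas version shown in the question, an out-of-range value such as 7974-04-23 00:00:00 would be expected to show up in bad in the same way, since it is coerced to NaT.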

Then the simplest option is to drop the missing values before applying the function:

df["age"] = df["birthdate"].dropna().apply(calculate_age)
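
Dropping rows before apply does not shorten the DataFrame, because assignment aligns on the index and the dropped rows simply receive NaN. A small sketch of that behaviour, using a placeholder lambda instead of calculate_age and a hypothetical birth_year column so the result does not depend on today's date:

```python
import pandas as pd

# Two rows: one valid date and one missing value (NaT).
df = pd.DataFrame(
    {"birthdate": pd.to_datetime(["1979-06-18", None])},
    index=[87986, 87990],
)

# Placeholder per-row function for illustration: just extracts the year.
years = df["birthdate"].dropna().apply(lambda ts: ts.year)

# Assignment aligns on the index, so the dropped row is filled with NaN.
df["birth_year"] = years
print(df)
```

Because NaN is a float, the new column is stored as floats, which is why integer results can be displayed with a trailing .0.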

Alternatively, change the function to check for missing values with pd.notna:

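from datetime import date
import pandas as pd

def calculate_age(born):
    # Approximate age in whole years from the number of elapsed days
    today = date.today()
    days_in_year = 365.2425
    # pd.notna is False for both None and NaT
    if pd.notna(born):
        age = int((today - born.date()).days / days_in_year)
        return age
    else:
        return born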

df["age"] = df["birthdate"].apply(calculate_age)
print (df)
       birthdate   age
87986 1979-06-18  40.0
87987 1979-06-18  40.0
87988 1987-03-18  32.0
87989 1983-11-01  36.0
87990        NaT   NaN
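
Dividing a day count by 365.2425 can be off by one on or near a birthday: from 2000-03-01 to 2001-03-01 is 365 days, which truncates to 0 years. One possible alternative, sketched here with only the standard library, compares (month, day) tuples instead. The name age_in_years and the today parameter are assumptions introduced for illustration and are not part of the original answer:

```python
from datetime import date

def age_in_years(born, today=None):
    """Whole years elapsed between `born` and `today` (defaults to the current date)."""
    if today is None:
        today = date.today()
    # Subtract one year if this year's birthday has not happened yet.
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - before_birthday

print(age_in_years(date(1979, 6, 18), today=date(2020, 2, 13)))  # 40
```

Since pandas Timestamp values expose year, month and day, this could presumably be applied to the parsed column with df["birthdate"].dropna().apply(age_in_years).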