Why doesn't read_csv() parse my dates?

Time: 2014-01-12 21:34:37

Tags: python pandas

I'm running pandas 0.7.0. I have a file that contains lines like the following:

2014-01-12T00:00:00+00:00, 0.210079

When I read it with
data = pd.read_csv('xx', names=["t", "p"], parse_dates=[0])

I end up with the first column as strings. Why aren't they parsed as datetimes?

print data.head()
                           t         p
0  2014-01-12T00:00:00+00:00  0.210079
1  2014-01-12T00:00:00+00:00  0.078217
2  2014-01-12T00:00:00+00:00  0.342977
3  2014-01-12T00:00:00+00:00  0.346713
4  2014-01-12T00:00:00+00:00  0.224601

1 answer:

Answer 0 (score: 0)

It is parsed correctly:

>>> import pandas as pd
>>> data = pd.read_csv("data.csv", names=["t", "p"], parse_dates=[0])
>>> data.to_dict()
{'p': {0: 0.21007899999999999}, 't': {0: Timestamp('2014-01-12 00:00:00', tz=None)}}
>>>

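You'll notice that the date from your data (I used the sample from your question) comes back as a Timestamp object, which appears to represent the parsed value.

Update:

After investigating a bit more in the Python shell with dir(), I found more evidence that the value is correctly parsed into a date/time object (a Timestamp in this case):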

>>> data.to_dict()["t"][0]
Timestamp('2014-01-12 00:00:00', tz=None)
>>> dir(data.to_dict()["t"][0])
['__add__', '__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__pyx_vtable__', '__qualname__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rsub__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__weakref__', '_get_field', '_repr_base', 'asm8', 'astimezone', 'combine', 'ctime', 'date', 'day', 'dayofweek', 'dayofyear', 'dst', 'freq', 'freqstr', 'fromordinal', 'fromtimestamp', 'hour', 'isocalendar', 'isoformat', 'isoweekday', 'max', 'microsecond', 'min', 'minute', 'month', 'nanosecond', 'now', 'offset', 'quarter', 'replace', 'resolution', 'second', 'strftime', 'strptime', 'time', 'timetuple', 'timetz', 'to_datetime', 'to_period', 'to_pydatetime', 'today', 'toordinal', 'tz', 'tz_convert', 'tz_localize', 'tzinfo', 'tzname', 'utcfromtimestamp', 'utcnow', 'utcoffset', 'utctimetuple', 'value', 'week', 'weekday', 'weekofyear', 'year']
>>> data.to_dict()["t"][0].timetuple()
time.struct_time(tm_year=2014, tm_mon=1, tm_mday=12, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=6, tm_yday=12, tm_isdst=-1)
>>>

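By the way, when you print data.head(), what gets printed to the screen is a representation of the object that data.head() returns. That may be the source of your confusion, but it is only a representation and does not tell you the type of the values in the column. See: http://docs.python.org/2/library/repr.html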
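
If you'd rather not rely on the printed output, here is a minimal sketch of how you could check what is actually stored in the column. It assumes a recent pandas on Python 3 (not the 0.7.0 from the question), uses io.StringIO in place of your file, and the names CSV_TEXT, as_text and as_dates are made up for the example. If the column does turn out to hold plain strings, pd.to_datetime is one way to convert it explicitly.

```python
import io
import pandas as pd

# Two rows copied from the question; StringIO stands in for the file on disk.
CSV_TEXT = (
    "2014-01-12T00:00:00+00:00, 0.210079\n"
    "2014-01-12T00:00:00+00:00, 0.078217\n"
)

# Read once without date parsing, then convert a copy explicitly.
as_text = pd.read_csv(io.StringIO(CSV_TEXT), names=["t", "p"])
as_dates = as_text.copy()
as_dates["t"] = pd.to_datetime(as_dates["t"], utc=True)

# The dtype shows what is stored, independent of how the frame prints.
text_is_datetime = pd.api.types.is_datetime64_any_dtype(as_text["t"])
dates_is_datetime = pd.api.types.is_datetime64_any_dtype(as_dates["t"])
print(as_text["t"].dtype, text_is_datetime)
print(as_dates["t"].dtype, dates_is_datetime)

# A single element can be checked the same way.
first = as_dates["t"].iloc[0]
print(type(first).__name__, isinstance(first, pd.Timestamp))
```

Checking the dtype (or isinstance on one element) answers the question directly, whereas comparing printed frames can mislead because a string column and a datetime column may look very similar on screen.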