Ignoring quotes in a CSV with pandas

Date: 2017-05-23 18:17:21

Tags: python pandas

I have a CSV like this:

A  B  C  D                     E   F  G
-- -- -- --------------------- --- -- --
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N1
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N2
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N3
G2 M2 C1 "2015-01-01 00:00:00" SR1 E1 N1
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N2
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N3

I need to read it into a pandas DataFrame and ignore the quotes in column D so that I can parse it as a datetime column. I tried the following:

df = pd.read_csv(
        infile,
        sep=r"\s*(?![0-9][0-9]:)",
        skiprows=[1],
        header=0,
        quoting=csv.QUOTE_NONE
    )

But the resulting df still has the quotes:

>>> df
    A   B   C                      D    E   F   G
0  G1  M1  C1  "2015-01-01 00:00:00"  SR1  E1  N1
1  G1  M1  C1  "2015-01-01 00:00:00"  SR1  E1  N2
2  G1  M1  C1  "2015-01-01 00:00:00"  SR1  E1  N3
3  G2  M2  C1  "2015-01-01 00:00:00"  SR1  E1  N1
4  G2  M2  C1    "1/1/2015 00:00:00"  SR1  E1  N2
5  G2  M2  C1    "1/1/2015 00:00:00"  SR1  E1  N3

If I try to parse column D directly as a datetime column, pandas fails:

>>> pd.to_datetime(df.D)
...
ValueError: Unknown string format

How can I format column D so that pandas can parse it as a date column?

pandas version: 0.19.2

1 Answer:

Answer 0 (score: 3)

Demo:

In [44]: df = pd.read_csv(r'D:\download\1.csv', delim_whitespace=True, skiprows=[1], 
                          parse_dates=['D'])

In [45]: df
Out[45]:
    A   B   C          D    E   F   G
0  G1  M1  C1 2015-01-01  SR1  E1  N1
1  G1  M1  C1 2015-01-01  SR1  E1  N2
2  G1  M1  C1 2015-01-01  SR1  E1  N3
3  G2  M2  C1 2015-01-01  SR1  E1  N1
4  G2  M2  C1 2015-01-01  SR1  E1  N2
5  G2  M2  C1 2015-01-01  SR1  E1  N3

In [46]: df.dtypes
Out[46]:
A            object
B            object
C            object
D    datetime64[ns]
E            object
F            object
G            object
dtype: object

where D:\download\1.csv contains:

A  B  C  D                     E   F  G
-- -- -- --------------------- --- -- --
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N1
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N2
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N3
G2 M2 C1 "2015-01-01 00:00:00" SR1 E1 N1
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N2
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N3
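
The demo above can also be reproduced without a file on disk. The following is a minimal sketch, not the answerer's exact code: it assumes the sample data is inlined through io.StringIO, uses sep=r"\s+" (which the pandas docs describe as equivalent to delim_whitespace=True), and leaves quoting at its default so the double quotes are consumed as field quoting rather than kept in the values. Because column D mixes two date formats, the sketch parses each value individually instead of relying on parse_dates, whose handling of mixed formats may differ between pandas versions. The names RAW, read_quoted and strip_quotes_and_parse are illustrative only.

```python
import io

import pandas as pd

# Sample data copied from the question; the second line is the dashed ruler.
RAW = '''A  B  C  D                     E   F  G
-- -- -- --------------------- --- -- --
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N1
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N2
G1 M1 C1 "2015-01-01 00:00:00" SR1 E1 N3
G2 M2 C1 "2015-01-01 00:00:00" SR1 E1 N1
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N2
G2 M2 C1 "1/1/2015 00:00:00" SR1 E1 N3
'''


def read_quoted(buf):
    """Read whitespace-delimited data and parse column D as datetimes."""
    # Default quoting keeps '"2015-01-01 00:00:00"' together as one field
    # and drops the surrounding quotes; skiprows=[1] skips the dashed ruler.
    df = pd.read_csv(buf, sep=r"\s+", skiprows=[1])
    # Parse value by value so the two date formats do not interfere.
    df["D"] = df["D"].apply(pd.to_datetime)
    return df


def strip_quotes_and_parse(col):
    """Fallback for a column that was already loaded with literal quotes."""
    return col.str.strip('"').apply(pd.to_datetime)


df = read_quoted(io.StringIO(RAW))
print(df.dtypes)
```

If the frame has already been loaded with quoting=csv.QUOTE_NONE, as in the question, something like df["D"] = strip_quotes_and_parse(df["D"]) removes the quotes before parsing.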