Pandas 0.24 writes an extra carriage return in gzip CSV files

Date: 2019-02-15 23:59:11

Tags: python pandas

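On Windows, the standard EOL (end-of-line) terminator is a carriage return followed by a line feed, and that is what I get when I call the to_csv method on a DataFrame. However, when I use to_csv to write a gzip-compressed file, every line in that file ends with two carriage returns (\r\r\n).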
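One way a doubled carriage return can arise, shown with the standard library only: the csv module already ends each row with '\r\n', and if that text then passes through a text-mode stream that translates '\n' into '\r\n' (which is what Windows text mode does), each line ends in '\r\r\n'. The sketch below simulates the Windows translation on any OS by passing newline='\r\n' to gzip.open. Whether pandas 0.24's compressed code path follows exactly this route is an assumption, not something verified against the pandas source; the helper name gzip_csv_bytes is hypothetical.

```python
import csv
import gzip
import io


def gzip_csv_bytes(rows, newline):
    """Write rows with csv.writer into an in-memory gzip stream opened in
    text mode, then return the decompressed bytes."""
    buf = io.BytesIO()
    # newline='\r\n' mimics Windows text mode: every '\n' written is
    # translated to '\r\n'. newline='' disables translation entirely.
    with gzip.open(buf, "wt", encoding="utf-8", newline=newline) as f:
        writer = csv.writer(f)  # default lineterminator is '\r\n'
        writer.writerows(rows)
    # Closing the gzip wrapper does not close buf, so it is still readable.
    return gzip.decompress(buf.getvalue())


rows = [["c0", "c1"], ["a", "c"], ["b", "d"]]
print(gzip_csv_bytes(rows, newline="\r\n"))  # b'c0,c1\r\r\na,c\r\r\nb,d\r\r\n'
print(gzip_csv_bytes(rows, newline=""))      # b'c0,c1\r\na,c\r\nb,d\r\n'
```

Opening the text stream with newline='' avoids the second translation, which matches the csv module's documented advice to open files with newline='' in the uncompressed case.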


The following script reproduces the behavior and prints the output:

import pandas as pd, sys, gzip, zlib
print("python:", sys.version)
print("pandas:", pd.__version__)
print("zlib  :", zlib.ZLIB_RUNTIME_VERSION)
df = pd.DataFrame(data={'c0': ['a', 'b'], 'c1': ['c', 'd']})
print(df)
# Under Windows the EOL marker is \r\n, so this works as expected
df.to_csv('df.csv', index=False)
with open('df.csv', 'rb') as f:
    print("df.csv, default terminator   :", f.read())
# With gzip compression it writes \r\r\n as EOL, which looks like a bug
df.to_csv('df.csv.gz', index=False)
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, default terminator:", f.read())
# Specifying a single '\n' terminator writes exactly that
# (line_terminator is the pandas 0.24 name; pandas 1.5 renamed it lineterminator)
df.to_csv('df.csv', index=False, line_terminator='\n')
with open('df.csv', 'rb') as f:
    print("df.csv, '\\n' terminator      :", f.read())
# With gzip and a single '\n' terminator, the file gets \r\n as EOL, as desired
df.to_csv('df.csv.gz', index=False, line_terminator='\n')
with gzip.open('df.csv.gz', 'rb') as f:
    print("df.csv.gz, '\\n' terminator   :", f.read())

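This is evidently related to the previously discussed question "CSV in Python adding an extra carriage return, on Windows". My problem is that the compressed file behaves differently from the uncompressed one. Is this a known issue?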
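A possible workaround, sketched under the assumption that building the CSV in memory is acceptable: let pandas produce the CSV as a string (to_csv with no path returns a str) and compress it yourself in binary mode, where no newline translation takes place. The helper write_gzip_csv is hypothetical, and the sample string below stands in for real to_csv output.

```python
import gzip
import io


def write_gzip_csv(text, target, encoding="utf-8"):
    """Compress an already formatted CSV string in binary mode.

    Binary mode performs no newline translation, so whatever line
    terminators `text` contains (e.g. CRLF) reach the archive unchanged.
    `target` may be a path or a writable binary file object.
    """
    with gzip.open(target, "wb") as gz:
        gz.write(text.encode(encoding))


# Hypothetical stand-in for df.to_csv(index=False) output on Windows.
csv_text = "c0,c1\r\na,c\r\nb,d\r\n"
buf = io.BytesIO()
write_gzip_csv(csv_text, buf)
print(gzip.decompress(buf.getvalue()))  # b'c0,c1\r\na,c\r\nb,d\r\n'
```

With a DataFrame this would be called as write_gzip_csv(df.to_csv(index=False), 'df.csv.gz'); because gzip.open accepts either a path or a binary file object, the same helper works with the io.BytesIO buffer used above.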
