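How do I write an output file in CSV format in Python?

Time: 2016-09-12 10:01:57

Tags: python csv

I am trying to write my output as a CSV file, but I either get an error or not the result I expect. I am using both Python 3.5.2 and 2.7.

Error in Python 3.5: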

wr.writerow(var)
TypeError: a bytes-like object is required, not 'str'

In Python 2.7, all the column values end up in a single column.

Expected result:
The output file should have the same format as the input file.

Code:

import csv

f1 = open("input_1.csv", "r") 

resultFile = open("out.csv", "wb")
wr = csv.writer(resultFile, quotechar=',') 

def sort_duplicates(f1):
  for i in range(0, len(f1)):
      f1.insert(f1.index(f1[i])+1, f1[i])
      f1.pop(i+1)

for var in f1:
      #print (var)
      wr.writerow([var]) 

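If I use resultFile = open("out.csv", "w"), I get one extra row in the output file.

If I use the code above, I get one extra row and one extra column.

4 Answers:

Answer 0: (score: 3)

On Python 3, csv requires the file to be opened in text mode, not binary mode. Drop the b from the file mode. You should also use newline='':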

resultFile = open("out.csv", "w", newline='')

Better still, use the file objects as context managers so that they are closed automatically:

with open("input_1.csv", "r") as f1, \
     open("out.csv", "w", newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    for var in f1:
        wr.writerow([var.rstrip('\n')])

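I also strip each line read from f1 (just to remove the newline) and put the line in a list; csv.writer.writerow wants a sequence of columns, not a single string.

Quoting the csv.writer() documentation:

If csvfile is a file object, it should be opened with newline='' [1]. [...] All other non-string data are stringified with str() before being written.

[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.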
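The effect of newline='' can be checked by reading the written file back as raw bytes. The following is only a minimal sketch: the helper names write_rows and read_raw, the temporary directory, and the sample rows are assumptions made for illustration and are not part of the question.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    # Text mode plus newline='' leaves line endings entirely to the csv module.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def read_raw(path):
    # Binary mode shows the bytes exactly as they were stored on disk.
    with open(path, "rb") as handle:
        return handle.read()


# Hypothetical scratch location and sample data.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "newline_demo.csv")
write_rows(demo_path, [["a", "b"], ["1", "2"]])
raw = read_raw(demo_path)
print(raw)
```

With the default dialect, each row ends in a single \r\n. Without newline='', a platform that translates \n to \r\n on write would turn that into \r\r\n, which is the extra \r the footnote describes.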

Answer 1: (score: 2)

Others have already answered that you should open the output file in text mode when using Python 3, i.e.

with open('out.csv', 'w', newline='') as resultFile:
    ...

But you also need to parse the incoming CSV data. As it stands, your code reads each line of the input CSV file as a single string. Then, without splitting that line into its constituent fields, it passes the string to the CSV writer. As a result, csv.writer treats the string as a sequence and outputs each character (including any terminating newline character) as a separate field. For example, if your input CSV file contains:

1,2,3,4

your output file will be written like this:

1,",",2,",",3,",",4,"
"
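This behaviour can be reproduced in memory without touching any files. The sketch below uses io.StringIO in place of a real output file; the helper name write_one is hypothetical and exists only for this illustration.

```python
import csv
import io


def write_one(value):
    # Write a single row to an in-memory buffer and return the resulting text.
    buf = io.StringIO()
    csv.writer(buf).writerow(value)
    return buf.getvalue()


# A raw line is a string, so the writer iterates over its characters.
as_string = write_one("1,2,3,4\n")

# A list of fields is what writerow() actually expects.
as_list = write_one(["1", "2", "3", "4"])

print(repr(as_string))
print(repr(as_list))
```

In the first call, each comma and the trailing newline become quoted fields of their own. The second call yields the plain 1,2,3,4 row.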

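You should change the for loop to this:

for row in csv.reader(f1):
    # process the row
    wr.writerow(row)

Now the input CSV file is parsed into fields, and row contains a list of strings, one for each field. For the previous example, row would be:

for row in csv.reader(f1):
    print(row)

['1', '2', '3', '4']

When that list is passed to csv.writer, the output written to the file is: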

1,2,3,4

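Putting all of that together: open both files in a single with statement (the output file in text mode with newline=''), wrap f1 in csv.reader, and pass each parsed row to wr.writerow.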
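One way the combined code might look is sketched below. The function name copy_csv, the temporary directory, and the sample input are assumptions added so that the example runs on its own; the file names input_1.csv and out.csv and the variable names f1 and wr follow the question. Opening the input with newline='' follows the csv module's recommendation for file objects passed to csv.reader.

```python
import csv
import os
import tempfile


def copy_csv(src_path, dst_path):
    # Parse each input line into fields, then write those fields back out.
    with open(src_path, newline="") as f1, \
         open(dst_path, "w", newline="") as result_file:
        wr = csv.writer(result_file, dialect="excel")
        for row in csv.reader(f1):
            wr.writerow(row)


# Hypothetical sample input so the sketch is self-contained.
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "input_1.csv")
dst = os.path.join(work_dir, "out.csv")
with open(src, "w", newline="") as handle:
    handle.write("1,2,3,4\r\n5,6,7,8\r\n")

copy_csv(src, dst)

with open(dst, newline="") as handle:
    copied = handle.read()
print(repr(copied))
```

For this sample, the output file has the same rows and columns as the input, with no extra row or column.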

Answer 2: (score: 0)

Open the file without the b mode.

The b mode opens the file as a binary file.

You can open the file like this:

open_file = open("filename.csv", "w")

Answer 3: (score: 0)

You are opening the input file in normal read mode, but the output file in binary mode. The correct call is:

resultFile = open("out.csv", "w")

As shown above, if you replace "wb" with "w", it will work.