Delete or keep specific columns in a CSV file

Posted: 2014-12-07 12:28:59

Tags: python python-2.7 csv

I have a simple script that either removes the last n columns from a CSV file or keeps only the first n columns:

from sys import argv
import csv

if len(argv) == 4:
  script, inputFile, outputFile, n = argv
  n = int(n)  # e.g. -2 drops the last two columns, 3 keeps the first three
else:
  script, inputFile, outputFile = argv
  n = 1  # default: keep only the first column

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      writer.writerow(row[:n])  # slice off (or keep) columns according to n

Example usage (remove the last two columns): removeKeepColumns.py sample.txt out.txt -2
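The script relies on Python slice semantics: a negative `n` in `row[:n]` drops that many columns from the end, while a positive `n` keeps the first `n`. A minimal illustration on a made-up row:

```python
# A row as csv.reader yields it: a list of strings (sample values are made up).
row = ["a", "b", "c", "d", "e"]

dropped = row[:-2]  # n = -2: remove the last two columns
kept = row[:2]      # n = 2: keep only the first two columns

print(dropped)  # ['a', 'b', 'c']
print(kept)     # ['a', 'b']
```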

How can I extend it to keep or remove a specific set of columns, for example:

  • remove columns 3, 4, 5
  • keep only columns 1, 4, 6

I can split the comma-separated input argument into a list, but I don't know how to pass that list to writerow(row[]).
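A plain Python list cannot be indexed with another list (`row[[0, 3]]` raises `TypeError`), so the indices have to be applied one at a time. One standard-library option, not taken from the answers below, is `operator.itemgetter`. The helper name `pick` is hypothetical, and it assumes at least one 0-based index is given:

```python
from operator import itemgetter

def pick(row, cols):
    """Return the values of row at the 0-based indices in cols, as a list."""
    values = itemgetter(*cols)(row)
    # With exactly one index, itemgetter returns the bare value, not a tuple.
    return list(values) if len(cols) > 1 else [values]

row = ["a", "b", "c", "d", "e", "f"]
print(pick(row, [0, 3, 5]))  # columns 1, 4, 6 -> ['a', 'd', 'f']
print(pick(row, [2]))        # ['c']
```

The result is a list, so it can be passed straight to `writer.writerow(...)`.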

Link to the script I used as the basis for this example:

2 answers:

Answer 0 (score: 4)

There is already an accepted answer, but here is my solution:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="your_file.csv")
>>> sheet.column.select([1,4,5]) # the column indices to keep
>>> sheet.save_as("your_filtered_file.csv")
>>> exit()

Here are more details on filtering.
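For comparison, roughly the same selection can be done with only the standard `csv` module. This sketch assumes Python 3 and 0-based indices as in the pyexcel snippet, and uses in-memory `io.StringIO` objects in place of real files; `select_columns` is a hypothetical helper name:

```python
import csv
import io

def select_columns(fin, fout, indices):
    """Copy CSV rows from fin to fout, keeping only the given 0-based columns."""
    writer = csv.writer(fout)
    for row in csv.reader(fin):
        writer.writerow([row[i] for i in indices])

src = io.StringIO("c0,c1,c2,c3,c4,c5\n0,1,2,3,4,5\n")
dst = io.StringIO()
select_columns(src, dst, [1, 4, 5])
print(dst.getvalue())  # csv.writer ends each row with "\r\n" by default
```

With real files on Python 3, open both with `newline=""`, as the `csv` module documentation recommends.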

Answer 1 (score: 1)

Elaborating on my comment (Picking out items from a python list which have specific indexes):

from sys import argv, exit
import csv

if len(argv) != 4:
  exit("usage: %s input.csv output.csv 0,3,5" % argv[0])
script, inputFile, outputFile, cols_str = argv
cols = [int(i) for i in cols_str.split(",")]  # 0-based column indices to keep

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      sublist = [row[x] for x in cols]  # the listed columns, in the given order
      writer.writerow(sublist)

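This should (untested) keep every column listed, comma-separated, in the 3rd argument. To remove the given columns instead,

sublist = [x for i, x in enumerate(row) if i not in cols]

should do it.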
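Both cases from the question can be folded into one helper. This is a sketch, not the answerer's code: the `keep` flag, the helper name `filter_row`, and the conversion from the 1-based column numbers used in the question ("remove columns 3, 4, 5", "keep only columns 1, 4, 6") to Python's 0-based indices are assumptions for illustration.

```python
def filter_row(row, columns, keep=True):
    """Keep or remove the given 1-based column numbers of one CSV row.

    keep=True  -> return only those columns, in the order requested
    keep=False -> return every other column, in original order
    """
    if keep:
        return [row[c - 1] for c in columns]  # 1-based -> 0-based
    drop = {c - 1 for c in columns}  # a set makes each membership test O(1)
    return [value for i, value in enumerate(row) if i not in drop]

row = ["c1", "c2", "c3", "c4", "c5", "c6"]
print(filter_row(row, [1, 4, 6]))              # ['c1', 'c4', 'c6']
print(filter_row(row, [3, 4, 5], keep=False))  # ['c1', 'c2', 'c6']
```

Inside the script's loop this would replace the `sublist = ...` line, e.g. `writer.writerow(filter_row(row, cols, keep=False))`.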