Writing CSV rows (from a for loop) to a CSV file without using the Python csv module

Asked: 2013-10-28 02:19:10

Tags: python csv file-io

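**My goal is to avoid importing the csv module**

I'm working on a script that runs through a very large csv file and selectively writes rows to a new csv file.

I have the following two lines: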

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
    for row in ifile: 

Then this, inside some nested if statements:

line = list(ifile)[row]
ofile.write(line)

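I know this isn't right. I took a stab at it, hoping someone could shed some light on how to do this properly. The essence of the question is how to reference the row I'm currently on, so that I can write it to the new csv file using 'ofile'. If further clarification is needed, please let me know. Thanks!

Edit: the full code is in this pastebin link - http://pastebin.com/a0jx85xR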

2 answers:

Answer 0: (score: 0)

You're close. This is all you have to do:

import sys

with open(sys.argv[1]) as ifile, open(sys.argv[2], mode='w') as ofile:
    for row in ifile:
        # ...
        # row is already the current line (a string), so it can be written as-is.
        # Replace the condition below with your own.
        # E.g.: the number of non-empty entries in the row is greater than 5:
        if len([term for term in row.split(',') if term.strip() != '']) > 5:
            ofile.write(row)
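
To see why writing row directly works: iterating over a file object yields each line as a string with its trailing newline still attached, so no index lookup such as list(ifile)[row] is needed. Below is a minimal sketch of that idea. It uses in-memory io.StringIO objects as stand-ins for the real files, and the copy_matching_rows name, the min_fields parameter, and the sample data are all made up for illustration.

```python
import io

def copy_matching_rows(ifile, ofile, min_fields=5):
    """Copy lines that have more than `min_fields` non-empty comma-separated fields."""
    written = 0
    for row in ifile:  # row is already the full line, e.g. "a,b,c\n"
        fields = [term for term in row.split(',') if term.strip() != '']
        if len(fields) > min_fields:
            ofile.write(row)  # the newline is still attached, so none is added
            written += 1
    return written

# Hypothetical sample data standing in for the input file
source = io.StringIO("a,b,c\n1,2,3,4,5,6\nx,y,z,w,v,u,t\n")
target = io.StringIO()
count = copy_matching_rows(source, target)
print(count)
print(target.getvalue())
```

The same function should work unchanged with the two objects returned by open(), since it only relies on iteration and write().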

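Update:

In response to the OP's question about splitting lines:

You split a line in Python by supplying a delimiter. Since this is a CSV file, you can split the line on a comma (,). For example:

If this is a line (a string):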

0,1,2,3,4,5

and you apply:

line.split(',')

you will get the list:

['0', '1', '2', '3', '4', '5']
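
One detail worth checking when splitting this way: split(',') keeps any whitespace around the separators, and the last field keeps the line's trailing newline. A short sketch (the sample line is made up) that strips each field after splitting:

```python
# A line as it would come out of a file, with spaces after the commas
line = "0, 1, 2, 3, 4, 5\n"

raw_fields = line.split(',')
# strip() removes the leading spaces and the trailing newline from each field
clean_fields = [field.strip() for field in raw_fields]

print(raw_fields)
print(clean_fields)
```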

Update 2:

import sys

if __name__ == '__main__':
    ticker = sys.argv[3]
    allTypes = bool(int(sys.argv[4])) #argv[4] is a string, you have to convert it to an int, then to a bool

    with open(sys.argv[1]) as ifile, open(sys.argv[2], mode = 'w') as ofile:
        all_timestamps = [] #this is an empty list
        n_rows = 0
        for row in ifile:
            #This splits the line into constituent terms as described earlier
            #SAMPLE LINE:
            #A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,
            #After applying this bit of code, the line should be split into this:
            #['A', '1', '12884902522', 'B', 'B', '4900', 'AAIR', '0.1046', '28800', '390', 'B', 'AARCA']
            #NOW, you can make comparisons against those terms. :)

            terms = [term for term in row.split(',') if term.strip() != '']
            current_timestamp = int(terms[2])

            #compare the current against the previous
            #starting from row 2: (index 1)
            if n_rows >= 1:
                #Python supports negative indices: -1 means the value at the last index,
                #that is, the previous timestamp. Now perform the comparison and do something if that criterion is met:
                if current_timestamp - all_timestamps[-1] >= 0:
                    pass #the pass keyword means to do nothing. You'll have to replace it with whatever code you want

            #increment n_rows every time:
            n_rows += 1

            #always append the current timestamp to all the time_stamps
            all_timestamps.append(current_timestamp)


            if (terms[6] == ticker):
                # add something to make sure chronological order hasn't been broken
                if (allTypes == 1):
                    ofile.write(row)
            #I don't know if this was a bad indent or not, but you should know
            #where this goes
            elif (terms[0] == "A" or terms[0] == "M" or terms[0] == "D"):
                print(row)
                ofile.write(row)

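My initial suspicion was correct. You weren't splitting the rows into their CSV components, so your comparisons against the row never gave the right results, and that is why you got no output. This should work now (with slight modifications depending on your goals). :)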
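
As a possible way to fill in the "make sure chronological order hasn't been broken" step, here is a rough sketch that remembers only the previous timestamp instead of keeping every timestamp in a list. It assumes, based on the sample line above, that field 2 is an integer timestamp and field 6 is the ticker. The select_rows name, the choice to raise ValueError on an out-of-order row, and the second and third sample rows are my own assumptions, not part of the OP's code.

```python
def select_rows(lines, ticker):
    """Yield lines whose ticker field matches, checking timestamp order.

    Assumes field 2 is an integer timestamp and field 6 is the ticker.
    """
    previous_timestamp = None
    for line_number, line in enumerate(lines, start=1):
        terms = line.rstrip("\n").split(",")
        current_timestamp = int(terms[2])
        # Compare against the previous row only, so no full list is kept
        if previous_timestamp is not None and current_timestamp < previous_timestamp:
            raise ValueError("timestamp went backwards at line %d" % line_number)
        previous_timestamp = current_timestamp
        if terms[6] == ticker:
            yield line

# First row is the sample line from the answer; the other two are made up
sample = [
    "A,1,12884902522,B,B,4900,AAIR,0.1046,28800,390,B,AARCA,\n",
    "A,2,12884902530,B,S,100,ZZZZ,1.5,28800,391,B,AARCA,\n",
    "M,3,12884902541,B,B,200,AAIR,0.1050,28800,392,B,AARCA,\n",
]
selected = list(select_rows(sample, "AAIR"))
print(len(selected))
```

Because it is a generator, it can be fed an open file directly, for example ofile.writelines(select_rows(ifile, ticker)), without loading the whole file into memory.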

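Answer 1: (score: 0)

Just adding to jrd1's answer. I rarely use the csv module; I just use the split and join methods on strings. I usually end up with something like this (if there is only one input and one output, I usually just use stdin and stdout).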

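import sys

def new_date_format(value):
  #Placeholder: replace with your own date conversion
  return value

for row in sys.stdin:
  #Strip the trailing newline so it doesn't end up in the last field
  fields = row.rstrip("\n").split(",") #Could be "\t" or whatever; with no argument, split() splits on whitespace

  #process fields in some way (0 based indexing)
  fields[0] = str(int(fields[0]) + 55)
  fields[7] = new_date_format(fields[7])
  some_condition_is_met = True #Placeholder: replace with your own test
  if some_condition_is_met:
    print(",".join(fields))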

Of course, if your csv file starts getting funky entries with quotes and embedded commas and so on, this approach won't be nearly as much fun.
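
To make that caveat concrete, here is a sketch of where a plain split(",") goes wrong, alongside a small hand-written splitter that honours double quotes. The split_csv_line function is hypothetical and deliberately simplified: it handles embedded commas and doubled quotes inside quoted fields, but not fields that span several lines, so treat it as an illustration rather than a full replacement for the csv module.

```python
def split_csv_line(line, delimiter=",", quote='"'):
    """Split one CSV line, honouring double-quoted fields."""
    fields = []
    current = []
    in_quotes = False
    text = line.rstrip("\r\n")
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(text) and text[i + 1] == quote:
                    current.append(quote)  # "" inside quotes is a literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                current.append(ch)
        elif ch == quote:
            in_quotes = True  # opening quote
        elif ch == delimiter:
            fields.append("".join(current))  # field boundary outside quotes
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields

# Made-up line with an embedded comma and doubled quotes
line = '7,"Smith, John","said ""hi""",2013-10-28\n'
naive = line.rstrip("\n").split(",")
parsed = split_csv_line(line)
print(len(naive))
print(parsed)
```

The naive split cuts the quoted name in two, while the quote-aware version keeps it as one field.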
