Question

I need to get the length of the csv files in '/dir/', not counting empty lines. I tried this:

import os, csv, itertools, glob

#To filter the empty lines
def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

#To read each file in '/dir/', compute the length and write the output 'count.csv'
with open('count.csv', 'w') as out:
     file_list = glob.glob('/dir/*')
     for file_name in file_list:
         with open(file_name, 'r') as f:
              filt_f1 = filterfalse(lambda line: line.startswith('\n'), f)
              count = sum(1 for line in f if (filt_f1))
              out.write('{c} {f}\n'.format(c = count, f = file_name))

I get the output I want, but unfortunately the length reported for each file (in '/dir/') includes the empty lines.

To see where the empty lines come from, I opened file.csv as file.txt, and it looks like this:

*text,favorited,favoriteCount,...
"Retweeted user (@user):...
'empty row'
Do Operators...*

Answer 1

I suggest using pandas.


Documentation: http://pandas.pydata.org/pandas-docs/version/0.18.0/

Answer 2

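Your filterfalse() function works correctly. It is exactly the same as the function named ifilterfalse in the standard library's itertools module, so it is unclear why you wrote your own instead of using that one. A major advantage of the library version is that it has already been tested and debugged. (Built-in functions are usually faster as well, because many of them are written in C.)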
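As a quick check of that equivalence, here is a minimal sketch. It assumes Python 3, where the standard-library function is spelled itertools.filterfalse (ifilterfalse is the Python 2 spelling). The name my_filterfalse is only a rename of the question's function so the two can be compared side by side.

```python
from itertools import filterfalse  # Python 3 name; Python 2 calls it ifilterfalse


def my_filterfalse(predicate, iterable):
    # Hand-written version from the question, renamed for comparison.
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x


# Same predicate and input as the comment in the question's code.
custom = list(my_filterfalse(lambda x: x % 2, range(10)))
stdlib = list(filterfalse(lambda x: x % 2, range(10)))

print(custom)            # [0, 2, 4, 6, 8]
print(custom == stdlib)  # True
```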

The problem is that you are not using the generator function correctly.

Because it returns a generator object, the values it yields have to be iterated over with code such as for line in filt_f1.
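To make the difference concrete, the sketch below runs both forms on a small in-memory list that stands in for an open file (the sample lines are made up for illustration). It uses itertools.filterfalse under its Python 3 name in place of the hand-written function. In the question's form the generator object is only tested for truthiness, and a generator object is always truthy, so every line gets counted.

```python
from itertools import filterfalse  # Python 3 name

lines = ["text,favorited\n", "\n", "some tweet\n"]

# As in the question: filt_f1 is never iterated, only tested for truthiness.
f = iter(lines)
filt_f1 = filterfalse(lambda line: line.startswith("\n"), f)
count_as_written = sum(1 for line in f if (filt_f1))

# Iterating over the generator itself is what applies the predicate.
filt_f1 = filterfalse(lambda line: line.startswith("\n"), iter(lines))
count_fixed = sum(1 for line in filt_f1)

print(count_as_written)  # 3
print(count_fixed)       # 2
```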

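The predicate function you supply also does not handle lines that begin with other whitespace characters, such as spaces and tabs, so the lambda you pass needs to be changed to cover those cases.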
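A small comparison of the two predicates, using made-up sample lines, shows which blank lines each one catches. The whitespace-aware test relies on str.strip() returning an empty, falsy string for a line that holds nothing but whitespace.

```python
lines = ["\n", "   \n", "\t\n", "text,favorited\n"]

# Predicate from the question: only catches a line that starts with a newline.
by_startswith = [line.startswith("\n") for line in lines]

# Whitespace-aware predicate: strip() leaves an empty string for blank lines.
by_strip = [not line.strip() for line in lines]

print(by_startswith)  # [True, False, False, False]
print(by_strip)       # [True, True, True, False]
```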

The following code makes both of those changes.

import os, csv, itertools, glob

#To filter the empty lines
def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

#To read each file in '/dir/', compute the length and write the output 'count.csv'
with open('count.csv', 'w') as out:
    file_list = glob.glob('/dir/*')
    for file_name in file_list:
        with open(file_name, 'r') as f:
            filt_f1 = filterfalse(lambda line: not line.strip(), f)  # CHANGED
            count = sum(1 for line in filt_f1)  # CHANGED
            out.write('{c} {f}\n'.format(c=count, f=file_name))

Counting lines in multiple csv files, skipping blank lines
