From a file

Time: 2015-07-31 09:09:30

Tags: python

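I want to read all the integers in a file into a list. The numbers are separated by spaces (one or more) or end-of-line characters (one or more). What is the most efficient and/or most elegant way to do this? I have two solutions, but I do not know whether they are any good.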

  1. Checking for digits:

    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    
  2. Handling exceptions:

    for line in open("foo.txt", "r"):
        for i in line.split():  # split into tokens; iterating the line itself yields single characters
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    
  3. Sample data:

    1   2     3
     4 56
        789         
    9          91 56   
    
     10 
    11 
    

8 answers:

Answer 0 (score: 6)

An efficient way to do this is your first approach with one small change, using the with statement to open the file. Example -

with open("foo.txt", "r") as f:
    for line in f:
        for i in line.split():
            if i.isdigit():
                my_list.append(int(i))

Timing tests comparing it with the other approaches -

Functions -

import csv  # needed by func3()

def func1():
    my_list = []
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                my_list.append(int(i))
    return my_list

def func1_1():
    return [int(i) for line in open("foo.txt", "r") for i in line.strip().split(' ') if i.isdigit()]

def func1_3():
    my_list = []
    with open("foo.txt", "r") as f:
        for line in f:
            for i in line.split():
                if i.isdigit():
                    my_list.append(int(i))
    return my_list

def func2():            
    my_list = []            
    for line in open("foo.txt", "r"):
        for i in line.split():
            try:
                my_list.append(int(i))
            except ValueError:
                pass
    return my_list

def func3():
    my_list = []
    with open("foo.txt","r") as f:
        cf = csv.reader(f, delimiter=' ')
        for row in cf:
            my_list.extend([int(i) for i in row if i.isdigit()])
    return my_list

Results of the timing tests -

In [25]: timeit func1()
The slowest run took 4.70 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 204 µs per loop

In [26]: timeit func1_1()
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 207 µs per loop

In [27]: timeit func1_3()
The slowest run took 5.46 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 191 µs per loop

In [28]: timeit func2()
The slowest run took 4.09 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 212 µs per loop

In [34]: timeit func3()
The slowest run took 4.38 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 202 µs per loop

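Among the approaches that store the data in a list, I believe func1_3() above is the fastest (as the timings show).

That said, if you are really dealing with very large files, you would be better off using a generator rather than storing the complete list in memory.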
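A minimal sketch of the generator idea, assuming the same token rules as func1_3(). The name iter_ints and the temporary file contents are made up for illustration; they are not part of the question.

```python
import os
import tempfile


def iter_ints(path):
    """Yield the integers in the file one at a time, skipping other tokens."""
    with open(path, "r") as f:
        for line in f:
            for token in line.split():
                if token.isdigit():
                    yield int(token)


# Demo on a small temporary file (hypothetical contents, for illustration only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1   2     3\n 4 56\nabc 789\n")
    demo_path = tmp.name

# sum() pulls values from the generator one by one; no list is ever built.
total = sum(iter_ints(demo_path))
os.remove(demo_path)
print(total)
```

If a list is needed after all, list(iter_ints(path)) gives the same result as func1_3(); the generator only pays off when the values are consumed as they are read.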

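Update: since it was claimed in the comments that func2() is faster than func1_3() (although on my system it was not faster than func1_3() on the integers-only file), I updated foo.txt to contain things other than numbers and ran the timing tests again -

foo.txt -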

1 2 10 11
asd dd
 dds asda
22 44 32 11   23
dd dsa dds
21 12
12
33
45
dds
asdas
dasdasd dasd das d asda sda

Tests -

In [13]: %timeit func1_3()
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 210 µs per loop

In [14]: %timeit func2()
1000 loops, best of 3: 279 µs per loop

In [15]: %timeit func1_3()
1000 loops, best of 3: 213 µs per loop

In [16]: %timeit func2()
1000 loops, best of 3: 273 µs per loop
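For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The function names with_isdigit and with_try and the file contents are placeholders I made up; the numbers will differ from the ones above depending on machine and file.

```python
import os
import tempfile
import timeit


def with_isdigit(path):
    # Same idea as func1_3(): filter tokens with str.isdigit().
    with open(path, "r") as f:
        return [int(tok) for line in f for tok in line.split() if tok.isdigit()]


def with_try(path):
    # Same idea as func2(): attempt the conversion and ignore failures.
    result = []
    with open(path, "r") as f:
        for line in f:
            for tok in line.split():
                try:
                    result.append(int(tok))
                except ValueError:
                    pass
    return result


# Hypothetical mixed content, only so the script is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 2 10 11\nasd dd\n22 44 32 11   23\ndds\n")
    path = tmp.name

# Both functions should agree on this input before timing them.
results_match = with_isdigit(path) == with_try(path)

calls = 200
for fn in (with_isdigit, with_try):
    best = min(timeit.repeat(lambda: fn(path), number=calls, repeat=3))
    print(f"{fn.__name__}: {best / calls * 1e6:.1f} us per call")

os.remove(path)
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports.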

Answer 1 (score: 5)

If you can read the whole file in as a string, it is quite simple (i.e., as long as the file is not too large).

fileStr = open('foo.txt').read().split() 
integers = [int(x) for x in fileStr if x.isdigit()]

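read() turns the file into one long string, and split() breaks it on whitespace (i.e., spaces and newlines) into a list of strings. You can then combine that with a list comprehension that converts the strings to integers if they are digits.

As Bakuriu pointed out, if the file is guaranteed to contain only whitespace and numbers, you do not need to check isdigit(). In that case, list(map(int, open('foo.txt').read().split())) is sufficient. If anything is not a valid integer, that approach raises an error, whereas the other approach skips anything that is not a recognizable number.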
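The two checks do not accept exactly the same tokens, which may matter depending on what the file can contain. A small sketch with made-up tokens (the list below is only an illustration):

```python
# "\u00b2" is the superscript-two character.
tokens = ["42", "-5", "+7", "3.0", "abc", "\u00b2"]

# Tokens that pass the isdigit() filter.
accepted_by_isdigit = [t for t in tokens if t.isdigit()]

# Tokens that int() converts without raising ValueError.
accepted_by_int = []
for t in tokens:
    try:
        int(t)
    except ValueError:
        continue
    accepted_by_int.append(t)

print(accepted_by_isdigit)
print(accepted_by_int)
```

So isdigit() silently drops signed numbers such as "-5", while it lets through a few Unicode digit characters that int() rejects. If negative integers can appear in the file, the try/except version is probably the safer choice.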

Answer 2 (score: 4)

Thank you all. I combined some of the solutions you posted. This seems to work very well for me:

with open("foo.txt","r") as f:
    my_list = [int(i)  for line in f for i in line.split() if i.isdigit()]

Answer 3 (score: 3)

You can do this with a list comprehension
my_list = [int(i)  for j in open("1.txt","r") for i in j.strip().split(" ") if i.isdigit()]

Using with open():

with open("1.txt","r") as f:
    my_list = [int(i)  for j in f for i in j.strip().split(" ") if i.isdigit()]

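Process:

1. First, you iterate over the file line by line

2. Then you iterate over the words in each line and check whether they are digits; if so, they are added to the list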

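Edit:

You need to call strip() on each line, because every line (except the last) ends with a newline character ("\n"), and isdigit() returns False for a token that still carries it.

i.e.)

"number\n".isdigit() returns False (where number stands for any run of digits)

EDIT2:

Input: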

>>> "1\n".isdigit()
False

File data as read:

1
qw 2
23 we 32

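You can see the newline characters ("\n") that will affect the process:

a=open("1.txt","r")
repr(a.read())
"'1\\nqw 2\\n23 we 32'"

When I run the function without strip(), it does not treat 1 and 2 as digits, because each is followed by a newline character:

my_list = [int(i) for j in open("1.txt","r") for i in j.split(" ") if i.isdigit()]
my_list
[23, 32]

From the output it is clear that 1 and 2 are missing. This can be avoided by using strip().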
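The newline problem comes from splitting on a single space. A short sketch comparing the three variants used in this thread, on a made-up line:

```python
line = " 4  56\n"

# split(" ") cuts at every single space and keeps the trailing newline.
by_space = line.split(" ")

# strip() first removes the newline, but repeated spaces still give empty strings.
stripped_by_space = line.strip().split(" ")

# split() with no argument treats any run of whitespace, newline included, as one separator.
by_whitespace = line.split()

print(by_space)
print(stripped_by_space)
print(by_whitespace)
```

The empty strings are harmless here because "".isdigit() is False, but split() with no argument avoids both issues without needing strip().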

Answer 4 (score: 3)

Why not use the yield keyword? The code would look like...

def readInt():
    for line in open("foo.txt", "r"):
        for i in line.strip().split(' '):
            if i.isdigit():
                yield int(i)

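Then you can read the numbers with

    my_list = []  # a list instance; calling append on the built-in list type itself fails
    for num in readInt():
        my_list.append(num)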

Answer 5 (score: 3)

my_list = []
with open('foo.txt') as f:
    for line in f:
        for s in line.split():
            try:
                my_list.append(int(s))
            except ValueError:
                pass

Answer 6 (score: 3)

Try this:

with open('file.txt') as f:
    nums = []
    for l in f:
        l = l.strip()
        nums.extend([int(i) for i in l.split() if i.isdigit() and l])
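
l.strip() is there in case a trailing newline ('\n') is present, because '6\n'.isdigit() returns False. (str.split() with no arguments already drops that newline, so the call is a safeguard rather than a requirement here.)

list.extend comes in handy here.

The trailing and l ensures that any empty list results are discarded.

By default, str.split splits on whitespace, and the with block automatically closes the file once its body has run. I also used a list comprehension.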

Answer 7 (score: 0)

This is the fastest way I have found:

import re
regex = re.compile(r"\D+")

with open("foo.txt", "r") as f:
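    # filter(None, ...) drops the empty strings that split() yields when the file
    # starts or ends with a non-digit (e.g. a trailing newline)
    my_list = list(map(int, filter(None, regex.split(f.read()))))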

The result may depend on the size of the file, though.