Question

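I am trying to read the following text file into a dictionary, but I want any part starting with '#', as well as empty lines, to be ignored.

My text file looks like this: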

# This is my header info followed by an empty line

Apples          1                # I want to ignore this comment
Oranges         3                # I want to ignore this comment

#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*

Bananas         5                # I want to ignore this comment too!

My desired output is:

myVariables = {'Apples': 1, 'Oranges': 3, 'Bananas': 5}

My Python code is as follows:

filename = "myFile.txt"
myVariables = {}

with open(filename) as f:
    for line in f:
        if line.startswith('#') or not line:
            next(f)

        key, val = line.split()
        myVariables[key] = val
        print "key: " + str(key) + " and value: " + str(val)

The error I get:

Traceback (most recent call last):
  File "C:/Python27/test_1.py", line 11, in <module>
    key, val = line.split()
ValueError: need more than 1 value to unpack

I understand the error, but I don't see what is wrong with the code.

Thanks in advance!

Answer 1

Given your text:

text = """
# This is my header info followed by an empty line

Apples          1                # I want to ignore this comment
Oranges         3                # I want to ignore this comment

#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*

Bananas         5                # I want to ignore this comment too!
"""

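We can do this in two ways: using regex or using Python generators. In this case I would choose the latter (described below), since regex is not particularly fast.

To open the file: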

with open('file_name.xyz', 'r') as file:
    # Everything else below goes inside this block. Just substitute
    # `for line in lines` with `for line in file`.

To simulate that here, we split the text into lines and build a list:

lines = text.split('\n')  # as if read from a file using `open`.

Here is how we do the whole thing:

# Discard all comments and empty values.
comment_less = filter(None, (line.split('#')[0].strip() for line in lines))

# Separate items and totals. 
separated = {item.split()[0]: int(item.split()[1]) for item in comment_less}

Let's test it:

>>> print(separated)
{'Apples': 1, 'Oranges': 3, 'Bananas': 5}

Hope this helps.
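As a follow-up to the snippet above, here is a minimal sketch of the same generator approach applied to a file-like object, splitting each line only once instead of twice. `io.StringIO` and the sample contents are stand-ins chosen for illustration; with a real file you would iterate over the object returned by `open()`. The sketch assumes every remaining line has exactly two fields.

```python
import io

# io.StringIO stands in for a real file here; with a file on disk you
# would use `with open(path) as f:` instead.
fake_file = io.StringIO(
    "# header\n"
    "\n"
    "Apples   1   # comment\n"
    "Oranges  3   # comment\n"
)

# Strip the comment part of every line lazily, one line at a time.
without_comments = (line.split('#')[0].strip() for line in fake_file)

# Drop the lines that ended up empty, then split each one only once.
pairs = (item.split() for item in without_comments if item)

# Assumes exactly two fields per line; anything else raises ValueError.
separated = {name: int(count) for name, count in pairs}
```

Because every stage is a generator, only one line is held in memory at a time until the final dict is built.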

Answer 2

This doesn't exactly reproduce your error, but there is a problem with your code:

>>> x = "Apples\t1\t# This is a comment"
>>> x.split()
['Apples', '1', '#', 'This', 'is', 'a', 'comment']
>>> key, val = x.split()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

Instead, try:

key = line.split()[0]
val = line.split()[1]

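Edit: I think your "need more than 1 value to unpack" comes from the blank lines. Also, I'm not familiar with using next() like this. I think I would do something like: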

if line.startswith('#') or line == "\n":
    pass
else:
    key = line.split()[0]
    val = line.split()[1]
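To tie this back to the question's loop, here is a minimal sketch of the same check written with `continue`. In the question's code, `next(f)` advances the file iterator but the current `line` still falls through to `line.split()`, and `not line` is never true for a blank line because the line still holds "\n". The `parse_lines` helper and the sample data are assumptions made up for this illustration.

```python
def parse_lines(lines):
    """Build a dict from 'key value  # comment' lines (hypothetical helper)."""
    result = {}
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and full-line comments; `continue` jumps to the
        # next iteration without running the rest of the loop body.
        if not stripped or stripped.startswith('#'):
            continue
        # Use only the first two fields so trailing comment words are ignored.
        # Assumes each data line has at least two fields.
        fields = stripped.split()
        result[fields[0]] = int(fields[1])
    return result


sample = [
    "# header\n",
    "\n",
    "Apples          1                # comment\n",
    "Bananas         5                # comment too!\n",
]
my_variables = parse_lines(sample)
```

Passing an open file object instead of `sample` should work the same way, since both yield one line per iteration.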

Answer 3

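You need to ignore empty lines and lines starting with #, and cut off the comment from the remaining lines, either by splitting on # or by slicing with rfind. An empty line still contains a newline, so you need `and line.strip()` to check for it. You cannot just split on whitespace and unpack, because after splitting you have more than two elements, including the words in the comment: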

with open("in.txt") as f:
    d = dict(line[:line.rfind("#")].split() for line in f
              if not line.startswith("#") and line.strip())
    print(d)

Output:

{'Apples': '1', 'Oranges': '3', 'Bananas': '5'}

Another option is to split at most twice and slice:

with open("in.txt") as f:
    d = dict(line.split(None,2)[:2] for line in f
              if not line.startswith("#") and line.strip())
    print(d)

Or split at most twice and unpack in an explicit loop:

with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            k, v, _ = line.split(None, 2)
            d[k] = v

You can also use itertools.groupby to group the lines you want.

from itertools import groupby
with open("in.txt") as f:
    grouped = groupby(f, lambda x: not x.startswith("#") and x.strip())
    d = dict(next(v).split(None, 2)[:2] for k, v in grouped if k)
    print(d)

To handle values made of multiple words inside single quotes, we can use shlex to split:

import shlex
with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            data = shlex.split(line)
            d[data[0]] = data[1]

print(d)

So after changing the Bananas line to:

 Bananas          'north-side disabled'                # I want to ignore this comment too!

We get:

{'Apples': '1', 'Oranges': '3', 'Bananas': 'north-side disabled'}

The same works with slicing:

with open("in.txt") as f:
    d = dict(shlex.split(line)[:2] for line in f
              if not line.startswith("#") and line.strip())
    print(d)
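As a possible variation on the shlex examples above, `shlex.split` accepts a `comments=True` argument that drops everything from `#` to the end of the line, so blank lines and comment-only lines simply produce an empty list. This is a sketch under the assumption that `#` never appears inside a value; the sample lines are invented for illustration.

```python
import shlex

sample = [
    "# header\n",
    "\n",
    "Apples   1   # comment\n",
    "Bananas  'north-side disabled'   # don't trip on this apostrophe\n",
]

d = {}
for line in sample:
    # comments=True makes shlex discard everything from '#' to end of line.
    data = shlex.split(line, comments=True)
    if data:  # empty list for blank and comment-only lines
        d[data[0]] = data[1]
```

A side effect is that an apostrophe inside a comment no longer reaches the tokenizer, whereas without `comments=True` an unbalanced quote in the comment would raise `ValueError`.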

Answer 4

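To remove comments, you can use str.partition(), which works whether or not the comment sign is present in the line:

for line in file:
    line, _, comment = line.partition('#')
    if line.strip():  # non-blank line
        key, value = line.split()

line.split() may also raise an exception in this code. This happens if a non-blank line does not contain exactly two whitespace-separated words. What to do in that case depends on your application (ignore such lines, print a warning, etc.).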
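One way to handle that exception case is sketched below: lines that do not split into exactly two words are recorded by line number and skipped. The `skipped` list, the sample contents, and the use of `io.StringIO` in place of an opened file are all assumptions for illustration, not part of the answer.

```python
import io

# io.StringIO stands in for the opened file in this sketch.
file = io.StringIO(
    "# header\n"
    "Apples   1   # comment\n"
    "Broken line with too many words\n"
    "Oranges  3\n"
)

variables = {}
skipped = []  # hypothetical list of line numbers that could not be parsed
for number, line in enumerate(file, start=1):
    content, _, _comment = line.partition('#')
    if not content.strip():
        continue  # blank or comment-only line
    try:
        key, value = content.split()
    except ValueError:
        # Not exactly two whitespace-separated words: record it and move on.
        skipped.append(number)
        continue
    variables[key] = value
```

Whether to skip, warn, or abort on such lines is an application decision; collecting the line numbers just keeps the choice visible.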

Answer 5

If the format of the file is well defined, you can try a solution based on regular expressions. This is just an idea:

import re

fruits = {}
with open('fruits_list.txt', mode='r') as f:
    for line in f:
        # Raw string so the backslash in \s is passed to the regex engine as-is.
        match = re.match(r"([a-zA-Z0-9]+)[\s]+([0-9]+).*", line)
        if match:
            fruit_name, fruit_amount = match.groups()
            fruits[fruit_name] = fruit_amount


print(fruits)

Updated: I changed how the file is read, with large files in mind. Now I read it line by line instead of all at once, which improves memory usage.
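To show how the pattern behaves on comment and blank lines, here is a small self-contained sketch. The compiled `LINE_RE` name and the sample lines are assumptions for illustration, and the counts are converted to `int` to match the output the question asks for.

```python
import re

# Compile once; a raw string keeps the backslashes literal.
LINE_RE = re.compile(r"([A-Za-z0-9]+)\s+([0-9]+)")

sample = [
    "# header\n",
    "\n",
    "Apples   1   # comment\n",
    "#~*~*~ Another comment ~*~*~\n",
    "Bananas  5   # comment too!\n",
]

fruits = {}
for line in sample:
    # match() anchors at the start of the line, so lines beginning with
    # '#' or a newline never match and are skipped implicitly.
    match = LINE_RE.match(line)
    if match:
        name, amount = match.groups()
        fruits[name] = int(amount)
```

Note that a comment-only line is skipped here only because `#` is not an alphanumeric character; no explicit comment check is involved.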

Python: read a text file into a dict and ignore comments
