Question

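I'm working with a text file that contains several pieces of information. I converted it into a list in Python, and now I'm trying to separate the different data into different lists. The data looks like this:

Code / description / unit / value1 / value2 / value3 / value4, and then it repeats. An example: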

P03133 Auxiliar helper un 203.02 417.54 437.22 675.80

What I have done so far:

Created lists to store each piece of information:

codes = []
description = []
unity = []
cost = []

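Looped over the list to find the codes, based on the structure of a code, and used the code's index as the base for finding the remaining values.

Finding the code is easy, because its format is unique among the other data. For the remaining values, I loop forward to find the first numeric value after the code. That lets me work out the other indexes:

The unit is at the code's index + the offset to the first isnumeric token - 1, so it is the item just before the first numeric value in each line.
The cost is at the code's index + the offset to the first isnumeric token + 2; this third value is the only one I need to store.
The description is harder, because the number of elements that make it up varies across the list. So I used a slice that starts at the code's index + 1 and ends at the offset to the first isnumeric token - 2.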

for i, carc in enumerate(txtl):
    if carc[0] == "P" and carc[1].isnumeric():
        codes.append(carc)
        j = 0
        while not txtl[i+j].isnumeric():
            j = j + 1
        description.append(" ".join(txtl[i+1:i+j-2]))
        unity.append(txtl[i+j-1])
        cost.append(txtl[i+j])

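I run into a problem with this approach: I get the error below, even though there are always more elements in the list after the code where it is raised: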

  while not txtl[i+j].isnumeric():
    txtl[i+j] list index out of range.

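I'm open to any fix for my code, or even to a completely new approach to the problem.

Note: I also have to do this for a very similar data source in which the code is just a sequence of 7 digits, so it is harder to find among the other data. Any solution that covers this case as well is also appreciated!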

Answer 1

A small addition to your code fixes this:

        while i+j < len(txtl) and not txtl[i+j].isnumeric():
            j += 1

When i+j goes out of range, the first condition is false, so the second condition is never evaluated.
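The short-circuit behavior can be sketched in isolation. The helper name `find_first_numeric` and the sample tokens are made up for illustration. Note that in this sample the guarded loop reaches the end of the list without a match, because `"203.02".isnumeric()` is False; this may also be why the original loop ran past the end.

```python
def find_first_numeric(tokens, start):
    """Return the index of the first str.isnumeric() token at or after start, or None."""
    j = start
    # The bounds check comes first: once j == len(tokens), `and` short-circuits
    # and tokens[j] is never evaluated, so no IndexError is raised.
    while j < len(tokens) and not tokens[j].isnumeric():
        j += 1
    return j if j < len(tokens) else None


tokens = ["P03133", "Auxiliar", "helper", "un", "203.02"]  # made-up sample
print(find_first_numeric(tokens, 0))           # no token passes isnumeric()
print(find_first_numeric(["a", "12", "b"], 0))  # index of "12"
```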

Also, use a list of dicts instead of 4 separate lists, e.g.:

thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})

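This way the values that belong together always stay together in the list, and you don't need indexes or other black magic to keep track of where you are...

Edit after discussion in the comments: OK, in that case you split the line you are processing on whitespace and then handle the words in that line.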
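As a rough illustration of the list-of-dicts idea with the question's field names: the second record and the name `costs_by_code` are invented for the example, and the cost is taken to be the third value of the sample line.

```python
records = []
records.append({"code": "P03133", "description": "Auxiliar helper",
                "unity": "un", "cost": "437.22"})
# Made-up second record, only to show iteration over several entries.
records.append({"code": "P03134", "description": "Another item",
                "unity": "kg", "cost": "12.50"})

# Each record keeps its fields together, so no parallel indexing is needed.
for record in records:
    print(record["code"], record["unity"], record["cost"])

# If lookups by code are needed, a dict keyed on the code can be derived.
costs_by_code = {record["code"]: record["cost"] for record in records}
```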

from pprint import pprint  # just for pretty printing


textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []

def handle_line(textl: str):
    # str.split() splits on whitespace by default;
    # the first item is always the code
    code, *words = textl.split()
    text_words = []
    values = []
    for word in words:
        # str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
        if word.replace(',', '').replace('.', '').isnumeric():
            values.append(word)
        else:
            text_words.append(word)
    # the last non-numeric word is the unit; everything before it is the description
    unity = text_words.pop() if text_words else None
    return {'code': code, 'description': ' '.join(text_words), 'unity': unity, 'values': values}

the_list.append(handle_line(textl))

pprint(the_list)
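If the input is one flat list of words covering many records, as in the question, it first has to be cut into per-record chunks. The sketch below assumes a code is either "P" followed by digits or exactly 7 digits; the names `CODE_PATTERN` and `split_records` and the second sample record are made up. A caveat: a 7-digit integer value without a decimal point would be mistaken for a code.

```python
import re

# Assumed code formats: "P" followed by digits, or exactly 7 digits.
CODE_PATTERN = re.compile(r"P\d+|\d{7}")


def split_records(tokens):
    """Group a flat token list into sublists, each starting at a code token."""
    records = []
    for token in tokens:
        if CODE_PATTERN.fullmatch(token):
            records.append([token])      # a code starts a new record
        elif records:
            records[-1].append(token)    # other tokens belong to the current record
    return records


tokens = ("P03133 Auxiliar helper un 203.02 417.54 437.22 675.80 "
          "1234567 Other item kg 1.00 2.00 3.00 4.00").split()
for record in split_records(tokens):
    print(" ".join(record))
```

Each joined record is then a single line of the kind a per-line parser such as handle_line expects.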

str.isnumeric() does not work for floats, only for integers. See https://stackoverflow.com/a/23639915/9267296
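One common alternative to stripping separators before calling isnumeric() is to try the conversion with float(). This is a sketch, not part of the original answer, and `is_number` is a made-up helper name. Be aware that float() also accepts strings such as "nan", "inf" and "1e5", and rejects thousands separators such as "1,234.5".

```python
def is_number(word):
    """Return True if word can be parsed by float(), e.g. '203.02' or '417'."""
    try:
        float(word)
    except ValueError:
        return False
    return True


print("203.02".isnumeric())  # False: "." is not a numeric character
print(is_number("203.02"))   # True
print(is_number("un"))       # False
```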

Extracting multiple pieces of data from a single list
