Question

I have an accounting tree that is stored with indentation (spaces) in the source:

Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses

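There is a fixed number of levels, so I'd like to flatten the hierarchy into 3 fields (the actual data has 6 levels; this example is simplified):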

L1       L2            L3
Income
Income   Revenue
Income   Revenue       IAP
Income   Revenue       Ads
Income   Other-Income
Expenses Developers    In-house
 ... etc

I can do this by checking the number of spaces before the account name:

for rownum in range(6,ws.max_row+1):
   accountName = str(ws.cell(row=rownum,column=1).value)
   indent = len(accountName) - len(accountName.lstrip(' '))
   if indent == 0:
      l1 = accountName
      l2 = ''
      l3 = ''
   elif indent == 3:
      l2 = accountName
      l3 = ''
   else:
      l3 = accountName

   w.writerow([l1,l2,l3])

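Is there a more flexible way to achieve this, based on the indentation of the current row compared to the previous row, rather than assuming every level is always 3 spaces? L1 will always have no indent, and we can trust that lower levels will be indented further than their parent, but perhaps not always by 3 spaces per level.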

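Update: I ended up with this as the core of the logic. Since I ultimately wanted the list of accounts along with the content, using the indent to decide whether to reset, append to, or pop the list seemed easiest: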

        if indent == 0:
            accountList = []
            accountList.append((indent,accountName))
        elif indent > prev_indent:
            accountList.append((indent,accountName))
        elif indent <= prev_indent:
            max_indent = int(max(accountList,key=itemgetter(0))[0])
            while max_indent >= indent:
                accountList.pop()
                max_indent = int(max(accountList,key=itemgetter(0))[0])
            accountList.append((indent,accountName))

So, at each row of output, accountList is complete.

Answer 1

You can mimic the way Python itself parses indentation. First, create a stack that will contain the indentation levels. Then, at each line:

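If the indentation is greater than the top of the stack, push it and increase the depth level.
If it is the same, stay at the same level.
If it is lower, pop the top of the stack while it is higher than the new indentation. If you find a lower indentation level before finding exactly the same one, there is an indentation error.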

indentation = []
indentation.append(0)
depth = 0

f = open("test.txt", 'r')

for line in f:
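    line = line.rstrip("\n")  # drop the newline; safe even if the last line has none

    content = line.strip()
    indent = len(line) - len(line.lstrip())  # count leading whitespace only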
    if indent > indentation[-1]:
        depth += 1
        indentation.append(indent)

    elif indent < indentation[-1]:
        while indent < indentation[-1]:
            depth -= 1
            indentation.pop()

        if indent != indentation[-1]:
            raise RuntimeError("Bad formatting")

    print(f"{content} (depth: {depth})")

With a "test.txt" file whose content is the same as what you provided:

Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses

Here is the output:

Income (depth: 0)
Revenue (depth: 1)
IAP (depth: 2)
Ads (depth: 2)
Other-Income (depth: 1)
Expenses (depth: 0)
Developers (depth: 1)
In-house (depth: 2)
Contractors (depth: 2)
Advertising (depth: 1)
Other Expenses (depth: 1)
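
To get the flattened L1/L2/L3 rows asked for in the question, the depth computed above can be used as an index into a fixed-length row. The following is only a minimal sketch, not part of the original answer; the function name `flatten` and the `levels` parameter are assumptions made for illustration.

```python
def flatten(lines, levels=3):
    """Return one fixed-width row per non-blank line, e.g. ['Income', 'Revenue', '']."""
    indentation = [0]      # stack of indentation widths on the current path
    row = [""] * levels    # current values for L1..Ln (hypothetical layout)
    rows = []
    for line in lines:
        content = line.strip()
        if not content:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        if indent > indentation[-1]:
            indentation.append(indent)
        else:
            while indent < indentation[-1]:
                indentation.pop()
            if indent != indentation[-1]:
                raise ValueError(f"Bad formatting: {line!r}")
        depth = len(indentation) - 1
        if depth >= levels:
            raise ValueError(f"More than {levels} levels: {line!r}")
        row[depth] = content
        row[depth + 1:] = [""] * (levels - depth - 1)  # clear deeper columns
        rows.append(list(row))
    return rows
```

Each returned row could then be passed to `w.writerow(row)` as in the question.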

So, what can you do with this? Suppose you want to build nested lists. First, create a data stack.

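When you find an indentation, append a new list to the end of the data stack.
When you find an unindentation, pop the top list and append it to the new top.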

In either case, for each line, append the content to the list at the top of the data stack.

Here is the corresponding implementation:

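# Same setup as before, plus a data stack that starts with one empty list.
indentation = [0]
depth = 0
data = [[]]

f = open("test.txt", 'r')

for line in f:
    line = line.rstrip("\n")  # drop the newline; safe even if the last line has none

    content = line.strip()
    indent = len(line) - len(line.lstrip())  # count leading whitespace only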
    if indent > indentation[-1]:
        depth += 1
        indentation.append(indent)
        data.append([])

    elif indent < indentation[-1]:
        while indent < indentation[-1]:
            depth -= 1
            indentation.pop()
            top = data.pop()
            data[-1].append(top)

        if indent != indentation[-1]:
            raise RuntimeError("Bad formatting")

    data[-1].append(content)

while len(data) > 1:
    top = data.pop()
    data[-1].append(top)

Your nested list is at the top of the data stack. The output for the same file is:

['Income',
    ['Revenue',
        ['IAP',
         'Ads'
        ],
     'Other-Income'
    ],
 'Expenses',
    ['Developers',
        ['In-house',
         'Contractors'
        ],
     'Advertising',
     'Other Expenses'
    ]
 ]

This is fairly easy to manipulate, although it is quite deeply nested. You can access the data by chaining item accesses:

>>> l = data[0]
>>> l
['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income'], 'Expenses', ['Developers', ['In-house', 'Contractors'], 'Advertising', 'Other Expenses']]
>>> l[1]
['Revenue', ['IAP', 'Ads'], 'Other-Income']
>>> l[1][1]
['IAP', 'Ads']
>>> l[1][1][0]
'IAP'
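
If full paths are needed rather than nested access, the nested list can be walked recursively. This is a sketch under the assumption that every sub-list immediately follows the string naming its parent, which is how the loop above builds it; `walk` is a hypothetical helper, not part of the original answer.

```python
def walk(nested, prefix=()):
    """Yield the full path (a tuple of names) for every name in the nested list."""
    parent = None
    for item in nested:
        if isinstance(item, list):
            # A sub-list holds the children of the string just before it.
            yield from walk(item, prefix + (parent,))
        else:
            parent = item
            yield prefix + (item,)


nested = ['Income', ['Revenue', ['IAP', 'Ads'], 'Other-Income']]
for path in walk(nested):
    print("\t".join(path))
```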

Answer 2

If the indentation is a fixed number of spaces (3 spaces here), you can simplify the calculation of the indentation level.

Note: I use a StringIO to simulate a file.

import io
import itertools

content = u"""\
Income
   Revenue
      IAP
      Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
   Advertising
   Other Expenses
"""

stack = []
for line in io.StringIO(content):
    text = line.rstrip()  # drop \n (named `text` to avoid shadowing `content`)
    row = text.split("   ")
    stack[:] = stack[:len(row) - 1] + [row[-1]]
    print("\t".join(stack))

You get:

Income
Income  Revenue
Income  Revenue IAP
Income  Revenue Ads
Income  Other-Income
Expenses
Expenses    Developers
Expenses    Developers  In-house
Expenses    Developers  Contractors
Expenses    Advertising
Expenses    Other Expenses
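
Splitting on three spaces would also split a name that happens to contain three consecutive spaces. A possible alternative, sketched here under the same fixed-width assumption, is to count the leading spaces and divide by the width, which also makes it possible to reject irregular indents. The names `WIDTH` and `level_of` are hypothetical.

```python
WIDTH = 3  # assumed number of spaces per level


def level_of(line, width=WIDTH):
    """Return the nesting level of a line, or raise if the indent is irregular."""
    indent = len(line) - len(line.lstrip(" "))
    level, remainder = divmod(indent, width)
    if remainder:
        raise ValueError(f"Indent of {indent} is not a multiple of {width}: {line!r}")
    return level


stack = []
for line in ["Income", "   Revenue", "      IAP", "   Other-Income"]:
    level = level_of(line)
    stack[level:] = [line.strip()]  # truncate to the parent, then add this name
    print("\t".join(stack))
```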

EDIT: the indentation is not fixed

If the indentation is not fixed (you don't always have 3 spaces), as in the example below:

content = u"""\
Income
   Revenue
    IAP
    Ads
   Other-Income
Expenses
   Developers
      In-house
      Contractors
  Advertising
  Other Expenses
"""

You need to estimate the shift at each new line:

stack = []
last_indent = u""
for line in io.StringIO(content):
    indent = "".join(itertools.takewhile(lambda c: c == " ", line))
    shift = 0 if indent == last_indent else (-1 if len(indent) < len(last_indent) else 1)
    index = len(stack) + shift
    stack[:] = stack[:index - 1] + [line.strip()]
    last_indent = indent
    print("\t".join(stack))
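
Note that a shift of -1 only moves up one level, so a line that dedents past several levels at once (for example `Ads` followed directly by `Expenses`) would be placed one level too deep. One way to handle that case, offered here as a sketch rather than as the answer's own method, is to remember the indent width of each stack entry and pop until the parent is found. The name `paths` is hypothetical.

```python
def paths(lines):
    """Yield the list of ancestors plus the name for each non-blank line."""
    stack = []  # entries are (indent_width, name); widths strictly increase
    for line in lines:
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip(" "))
        # Pop siblings and deeper entries until the top of the stack is the parent.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        yield [n for _, n in stack]


for path in paths(["Income", "   Revenue", "    Ads", "Expenses", "  Developers"]):
    print("\t".join(path))
```

Unlike the stack in Answer 1, this version does not raise on an inconsistent dedent; it simply attaches the line to the nearest ancestor with a smaller indent.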

How to parse a hierarchy based on indentation with Python
