Indentation-based syntax -> AST

Time: 2014-05-13 20:13:46

Tags: javascript python parsing indentation abstract-syntax-tree

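Suppose I want to reinvent CoffeeScript :) Or Python. Or Stylus, or YAML :) I need a tool that transforms my indentation-based syntax into an abstract syntax tree. Unfortunately, Google knows nothing about [indentation-based syntax to AST]. Do you know of any tool like this? More specifically, here is what I have: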

===source===
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore

...and here is what I need:

===result===
[
    {
        directive: "Lorem ipsum",
        content: [
            {
                directive: "dolor sit amet",
                content: [
                    {directive: "consectetuer adipiscing elit", content: []}
                ]
            },
            {directive: "sed diam nonummy", content: []}
        ]
    }, {
        directive: "nibh euismod tincidunt",
        content: [
            {directive: "ut laoreet dolore", content: []}
        ]
    }
]

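It would be great if you could recommend such a tool. It would be fantastic if the tool were written in Python or JavaScript and produced its result as JSON. Advice on how to build this dream tool myself would also be cool :) Thanks!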

1 answer:

Answer 0: (score: 1)

Writing this yourself with recursion is straightforward. Here is a version that builds a list; I'll leave the dict version as an exercise.

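import json
import re
import sys

def DentArthurDent(fp, dents=0, nextline=None):
    '''Read from fp until EOF or a dedent.
       Return (tree, next line), where tree is a nested list.'''

    tree = []
    while True:
        # Use the line handed back by a deeper call, if any, before reading a new one.
        line, nextline = nextline or fp.readline(), None
        if not line:
            return tree, ''  # EOF
        # Split the line into its leading spaces and the remaining text.
        parts = re.match(r'(^ *)(.*)', line).group(1, 2)
        dent = len(parts[0])
        if dent == dents:
            tree.append(parts[1])
        elif dent > dents:
            # Deeper indentation: recurse, passing this line along as the first child.
            child_tree, nextline = DentArthurDent(fp, dent, line)
            tree.append(child_tree)
        else:
            # Shallower indentation: hand the line back to the caller.
            return tree, line


tree, _ = DentArthurDent(sys.stdin)
print(json.dumps(tree, indent=4))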

This input:

line 1
line 2
  line 3
    line 4
    line 5
  line 6

produces this output:

[
    "line 1",
    "line 2",
    [
        "line 3",
        [
            "line 4",
            "line 5"
        ],
        "line 6"
    ]
]
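
For the dict version left as an exercise, here is one possible sketch that produces the directive/content shape from the question. It uses an explicit stack rather than recursion, which is a swapped-in technique and not the answer's method. The names `parse_indented` and `SAMPLE` are made up for illustration, and the sketch assumes that indentation uses spaces only, that blank lines can be ignored, and that a trailing colon is not part of the directive text.

```python
import json


def parse_indented(text):
    """Turn indentation-structured text into nested {directive, content} dicts."""
    root = []
    # Stack of (indent, children) pairs; the -1 sentinel is shallower than any line.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no structure
        indent = len(raw) - len(raw.lstrip(" "))  # assumption: spaces only, no tabs
        # Pop until the top of the stack is strictly shallower than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        # Assumption: a trailing ":" only marks a block and is dropped from the text.
        node = {"directive": raw.strip().rstrip(":"), "content": []}
        stack[-1][1].append(node)
        # Deeper lines that follow become children of this node.
        stack.append((indent, node["content"]))
    return root


SAMPLE = """\
Lorem ipsum:
    dolor sit amet:
        consectetuer adipiscing elit
    sed diam nonummy
nibh euismod tincidunt:
    ut laoreet dolore
"""

print(json.dumps(parse_indented(SAMPLE), indent=4))
```

Because each node's parent is simply the nearest preceding line with smaller indentation, this sketch does not require a fixed indent width. It also does not reject inconsistent indentation; a real parser would probably want to report that as an error.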