Question

I am parsing a language that has a statement 'code' followed by '{', then a bunch of code that I have no interest in parsing, followed by '}'. Ideally I'd like to have a rule like:

skip_code: 'code' '{' ~['}']* '}'

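..which would simply skip ahead to the closing curly brace. The problem is that the code being skipped could itself contain pairs of curly braces. So what I essentially need to do is run a counter, increment it on each '{' and decrement it on each '}', and end the parse rule when the counter is back to 0.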

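What is the best way of doing this in ANTLR4? Should I jump to a custom function when 'code' is detected, swallow the tokens there and run my counter, or is there some elegant way to express this in the grammar itself?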

EDIT: Some sample code, as requested:

class foo;
  int m_bar;
  function foo_bar;
     print("hello world");
  endfunction
  code {
     // This is some C code
     void my_c_func() {
        printf("I have curly braces {} in a string!");
     }
  }
  function back_to_parsed_code;
  endfunction
endclass

Answer 1

I'd use something like:

skip_code: CODE_SYM block;
block: OPEN_CURLY (~(OPEN_CURLY | CLOSE_CURLY) | block)* CLOSE_CURLY;

CODE_SYM: 'code';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
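
The recursive block rule expresses the same nesting logic as the counter described in the question. For comparison, here is a minimal plain-Java sketch of that counter. BraceCounter and findMatchingBrace are hypothetical names for illustration and are not part of ANTLR. Like the rule above, it counts every brace it sees, so braces inside strings or comments are counted too.

```java
// Hypothetical helper, not part of ANTLR: the counter idea from the question.
final class BraceCounter {

    /**
     * Returns the index of the '}' that closes the '{' at position open,
     * or -1 if the braces are unbalanced.
     */
    static int findMatchingBrace(String text, int open) {
        if (open < 0 || open >= text.length() || text.charAt(open) != '{') {
            throw new IllegalArgumentException("no '{' at index " + open);
        }
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                // entering a (possibly nested) block
            } else if (c == '}') {
                depth--;                // leaving a block
                if (depth == 0) {
                    return i;           // counter is back to 0: outer block closed
                }
            }
        }
        return -1;                      // ran out of input before the block closed
    }

    public static void main(String[] args) {
        String input = "code { void f() { g(); } } function x;";
        int open = input.indexOf('{');
        int close = findMatchingBrace(input, open);
        // Prints the whole skipped block, including the nested braces.
        System.out.println(input.substring(open, close + 1));
    }
}
```

In the grammar, each recursive call to block plays the role of depth++, and each return from it plays the role of depth--.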

Answer 2

I'd handle these code blocks in the lexer. A quick demo:

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.Token;

public class Main {

    public static void main(String[] args) {

        String source = "class foo;\n" +
                "  int m_bar;\n" +
                "  function foo_bar;\n" +
                "     print(\"hello world\");\n" +
                "  endfunction\n" +
                "  code {\n" +
                "     // This is some C code }}} \n" +
                "     void my_c_func() {\n" +
                "        printf(\"I have curly braces {} in a string!\");\n" +
                "     }\n" +
                "  }\n" +
                "  function back_to_parsed_code;\n" +
                "  endfunction\n" +
                "endclass";

        System.out.printf("Tokenizing:\n\n%s\n\n", source);

        DemoLexer lexer = new DemoLexer(new ANTLRInputStream(source));

        for (Token t : lexer.getAllTokens()){
            System.out.printf("%-20s '%s'\n",
                    DemoLexer.VOCABULARY.getSymbolicName(t.getType()),
                    t.getText().replaceAll("[\r\n]", "\\\\n")
            );
        }
    }
}

If you run the class above, the following will be printed:

Tokenizing:

class foo;
  int m_bar;
  function foo_bar;
     print("hello world");
  endfunction
  code {
     // This is some C code }}} 
     void my_c_func() {
        printf("I have curly braces {} in a string!");
     }
  }
  function back_to_parsed_code;
  endfunction
endclass

ID                   'class'
ID                   'foo'
ANY                  ';'
ID                   'int'
ID                   'm_bar'
ANY                  ';'
ID                   'function'
ID                   'foo_bar'
ANY                  ';'
ID                   'print'
ANY                  '('
STRING               '"hello world"'
ANY                  ')'
ANY                  ';'
ID                   'endfunction'
ID                   'code'
BLOCK                '{\n     // This is some C code }}} \n     void my_c_func() {\n        printf("I have curly braces {} in a string!");\n     }\n  }'
ID                   'function'
ID                   'back_to_parsed_code'
ANY                  ';'
ID                   'endfunction'
ID                   'endclass'
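
Note how the BLOCK token is not cut short by the `}}}` inside the comment or the `{}` inside the string. The DemoLexer grammar itself is not shown here, so as a rough illustration only, this is a hand-written Java sketch of the kind of scanning such a lexer rule has to perform. It assumes C-style `//` and `/* */` comments and double-quoted strings with backslash escapes; BlockScanner and its methods are hypothetical names, not generated by ANTLR.

```java
// Hypothetical sketch, not the DemoLexer used above.
final class BlockScanner {

    /** Returns the index just past the '}' matching the '{' at start, or -1 if unterminated. */
    static int scanBlock(String s, int start) {
        if (start < 0 || start >= s.length() || s.charAt(start) != '{') {
            throw new IllegalArgumentException("no '{' at index " + start);
        }
        int n = s.length();
        int depth = 0;
        int i = start;
        while (i < n) {
            char c = s.charAt(i);
            if (c == '"') {
                i = skipString(s, i);               // braces inside strings do not count
            } else if (s.startsWith("//", i)) {
                int eol = s.indexOf('\n', i);
                i = (eol < 0) ? n : eol + 1;        // skip line comment
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;        // skip block comment
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    return i + 1;                   // outermost block closed
                }
                i++;
            }
        }
        return -1;
    }

    /** i points at the opening quote; returns the index just past the closing quote. */
    private static int skipString(String s, int i) {
        i++;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2;                             // skip the escaped character
            } else if (c == '"') {
                return i + 1;
            } else {
                i++;
            }
        }
        return s.length();                          // unterminated string
    }

    public static void main(String[] args) {
        String src = "code {\n  // stray }}}\n  void f() { printf(\"{ }\"); }\n} function g;";
        int end = scanBlock(src, src.indexOf('{'));
        System.out.println(src.substring(end).trim());
    }
}
```

This sketch does not handle C character literals such as '{'; a real lexer rule would need an alternative for those as well.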

Answer 3

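You can use modes for this. Note that there are two modes for the CODE section. With only one mode, you could not tell which '}' properly closes the CODE section.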

Lexer

lexer grammar Question_41355044Lexer;

CODE: 'code';
LCURLY: '{' -> pushMode(CODE_0);
WS:    [ \t\r\n] -> skip;

mode CODE_0;

CODE_0_LCURLY: '{' -> type(OTHER), pushMode(CODE_N);
RCURLY: '}' -> popMode;     // Close for LCURLY
CODE_0_OTHER: ~[{}]+ -> type(OTHER);

mode CODE_N;

CODE_N_LCURLY: '{' -> type(OTHER), pushMode(CODE_N);
CODE_N_RCURLY: '}' -> type(OTHER), popMode;
OTHER: ~[{}]+;

Parser

parser grammar Question_41355044Parser;

options { tokenVocab = Question_41355044Lexer; }

skip_code: 'code' LCURLY OTHER* RCURLY;

Input

code {
   // This is some C code
   void my_c_func() {
      printf("I have curly braces {} in a string!");
   }
}

Output tokens

CODE LCURLY({) OTHER(   // Th...) OTHER({) OTHER(      pr...) 
OTHER({) OTHER(}) OTHER( in a st...) OTHER(}) OTHER() RCURLY(}) EOF

The same approach is used for parsing the ANTLR grammar itself: https://github.com/antlr/grammars-v4/tree/master/antlr4

There, however, the runtime code in LexerAdaptor.py is used instead of two-level modes.
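
To make the role of the two modes more concrete, here is a small plain-Java simulation of the mode stack in the lexer grammar above. It is only a sketch: ModeStackSketch and tokenize are hypothetical names, it emits token type names only, and it throws on input the default mode does not cover (a real ANTLR lexer would report a token recognition error instead).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation of the lexer modes above; not generated by ANTLR.
final class ModeStackSketch {

    enum Mode { DEFAULT, CODE_0, CODE_N }

    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        Deque<Mode> stack = new ArrayDeque<>();     // saved modes, as with pushMode/popMode
        Mode mode = Mode.DEFAULT;
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (mode == Mode.DEFAULT) {
                if (" \t\r\n".indexOf(c) >= 0) {
                    i++;                            // WS -> skip
                } else if (in.startsWith("code", i)) {
                    out.add("CODE");
                    i += 4;
                } else if (c == '{') {
                    out.add("LCURLY");              // LCURLY -> pushMode(CODE_0)
                    stack.push(mode);
                    mode = Mode.CODE_0;
                    i++;
                } else {
                    throw new IllegalStateException("unexpected '" + c + "' at " + i);
                }
            } else if (c == '{') {
                out.add("OTHER");                   // nested '{' -> type(OTHER), pushMode(CODE_N)
                stack.push(mode);
                mode = Mode.CODE_N;
                i++;
            } else if (c == '}') {
                // Only the '}' seen in CODE_0 closes the section; deeper ones are OTHER.
                out.add(mode == Mode.CODE_0 ? "RCURLY" : "OTHER");
                mode = stack.pop();                 // popMode
                i++;
            } else {
                int j = i;                          // ~[{}]+ -> OTHER
                while (j < in.length() && in.charAt(j) != '{' && in.charAt(j) != '}') {
                    j++;
                }
                out.add("OTHER");
                i = j;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("code { a { b } c }"));
    }
}
```

Because the closing brace is typed as RCURLY only in CODE_0, the parser rule can stay as simple as LCURLY OTHER* RCURLY, and the depth of the mode stack does the counting.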

ANTLR4 parse rule to match open/close braces
