How do I write a lexer rule for quoted characters?

Time: 2016-04-27 17:05:31

Tags: antlr antlr4

I want to create a lexer rule that can read a string literal which defines its own delimiter (specifically, an Oracle quote-delimited string):

q'!My string which can contain 'single quotes'!'

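Here ! serves as the delimiter, but in theory it could be any character.

Is it possible to do this with a lexer rule, without introducing a dependency on a given language target?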

1 answer:

Answer 0 (score: 3)

  

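"Is it possible to do this with a lexer rule, without introducing a dependency on a given language target?"

No, something like this requires target-dependent code.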
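The reason is that the closing delimiter is not known until the character after q' has been read, and a plain lexer rule has no way to remember that character. As a rough illustration only, the logic that a target-language predicate has to supply can be sketched in plain Java, independent of ANTLR. The class name QQuoteScanner and its methods are hypothetical and not part of any library:

```java
public class QQuoteScanner {

    // Maps an opening delimiter to the character that must close the literal.
    static char closingFor(char open) {
        switch (open) {
            case '[': return ']';
            case '{': return '}';
            case '<': return '>';
            case '(': return ')';
            default:  return open; // any other character closes itself
        }
    }

    // Returns the index just past a q-quoted literal that starts at `start`,
    // or -1 if no well-formed literal starts there.
    static int scan(String input, int start) {
        if (start + 2 >= input.length()) {
            return -1;
        }
        char q = input.charAt(start);
        if ((q != 'q' && q != 'Q') || input.charAt(start + 1) != '\'') {
            return -1;
        }
        char close = closingFor(input.charAt(start + 2));
        // The literal ends at the first closing delimiter followed by a quote.
        for (int i = start + 3; i + 1 < input.length(); i++) {
            if (input.charAt(i) == close && input.charAt(i + 1) == '\'') {
                return i + 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "foo q'!My string which can contain 'single quotes'!' bar";
        int start = input.indexOf("q'");
        int end = scan(input, start);
        System.out.println(input.substring(start, end));
    }
}
```

The single comparison against `close` inside the loop is the part that depends on what was read earlier, and that is what the predicate in a grammar has to express in the target language.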

In case you, or someone else reading this Q&A, are wondering how this can be done with target code, here is a quick demo:

lexer grammar TLexer;

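@members {
  // Returns true when the next characters of the input are exactly the given text.
  boolean ahead(String text) {
    for (int i = 0; i < text.length(); i++) {
      // LA(1) is the next unconsumed character, hence the i + 1.
      if (_input.LA(i + 1) != text.charAt(i)) {
        return false;
      }
    }
    return true;
  }
}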

TEXT
 : [nN]? ( ['] ( [']['] | ~['] )* [']
         | [qQ] ['] QUOTED_TEXT [']
         )
 ;

// Skip everything other than TEXT tokens
OTHER
 : . -> skip
 ;

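// In the last alternative, getText() is everything matched since the start of
// the token (optional n, then q, the quote, the delimiter, ...), so the custom
// delimiter is the character right after the first single quote, not index 0.
fragment QUOTED_TEXT
 : '[' ( {!ahead("]'")}?                      . )* ']'
 | '{' ( {!ahead("}'")}?                      . )* '}'
 | '<' ( {!ahead(">'")}?                      . )* '>'
 | '(' ( {!ahead(")'")}?                      . )* ')'
 |  .  ( {!ahead(getText().charAt(getText().indexOf("'") + 1) + "'")}? . )*  .
 ;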

This can be tested with the following class:

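import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Token;

public class Main {

    static void test(String input) {
        // ANTLRInputStream is deprecated as of ANTLR 4.7, where
        // CharStreams.fromString(input) takes its place.
        TLexer lexer = new TLexer(new ANTLRInputStream(input));
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        // Tokenize the whole input up front.
        tokenStream.fill();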

        System.out.printf("input: `%s`\n", input);

        for (Token token : tokenStream.getTokens()) {
            if (token.getType() != TLexer.EOF) {
                System.out.printf("  token: -> %s\n", token.getText());
            }
        }

        System.out.println();
    }

    public static void main(String[] args) throws Exception {
        test("foo q'!My string which can contain 'single quotes'!' bar");
        test("foo q'(My string which can contain 'single quotes')' bar");
        test("foo 'My string which can contain ''single quotes' bar");
    }
}

which will print:

input: `foo q'!My string which can contain 'single quotes'!' bar`
  token: -> q'!My string which can contain 'single quotes'!'

input: `foo q'(My string which can contain 'single quotes')' bar`
  token: -> q'(My string which can contain 'single quotes')'

input: `foo 'My string which can contain ''single quotes' bar`
  token: -> 'My string which can contain ''single quotes'

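The . (dot) that matches the delimiters in the last alternative of QUOTED_TEXT (the one that handles non-bracket delimiters) may be a bit too permissive, but that can be tuned by replacing it with a negated or a regular character set.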
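As one possible way to tighten this, the check below rejects whitespace delimiters and verifies that a matched token closes with the delimiter it opened with. It is a sketch under assumptions: the class DelimiterCheck and its methods are hypothetical names, and the whitespace restriction reflects my reading of Oracle's description of quote delimiters (any character except space, tab, and return; newline is excluded here as well), so verify it against your Oracle version. In the grammar itself, the rough equivalent would be swapping the dot for a negated set such as ~[ \t\r\n].

```java
public class DelimiterCheck {

    // Assumption: whitespace characters cannot act as a quote delimiter.
    static boolean isAllowedDelimiter(int c) {
        return c != ' ' && c != '\t' && c != '\r' && c != '\n';
    }

    // Checks that a token such as q'!text!' or nq'[text]' uses an allowed
    // opening delimiter and ends with the matching closing delimiter.
    static boolean hasMatchingDelimiters(String token) {
        int quote = token.indexOf('\'');
        // Need at least: opening quote, two delimiters, closing quote.
        if (quote < 0 || token.length() < quote + 4 || !token.endsWith("'")) {
            return false;
        }
        char open = token.charAt(quote + 1);
        char close = token.charAt(token.length() - 2);
        char expected;
        switch (open) {
            case '[': expected = ']'; break;
            case '{': expected = '}'; break;
            case '<': expected = '>'; break;
            case '(': expected = ')'; break;
            default:  expected = open;
        }
        return isAllowedDelimiter(open) && close == expected;
    }

    public static void main(String[] args) {
        System.out.println(hasMatchingDelimiters("q'!abc!'"));
        System.out.println(hasMatchingDelimiters("q'!abc?'"));
    }
}
```

A check like this could run on the token text after lexing, or the isAllowedDelimiter test could be called from a predicate in the same way ahead is used in the demo.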