Unexpected ANTLR4 optimization with INT and FLOAT?

Asked: 2014-05-01 15:58:12

Tags: parsing antlr antlr4

I am writing a parser for the output of Clasp using ANTLR 4. Typical output looks like this:

clasp version 3.0.3
Reading from stdin
Solving...
Answer: 1
bird(a) bird(b) bird(c) penguin(d) bird(d)
Optimization: 7 0
Answer: 2
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(b) flies(b)
Optimization: 6 5
Answer: 3
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(c) flies(c)
Optimization: 2 5
Answer: 4
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(c) flies(a) flies(c)
Optimization: 1 10
Answer: 5
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(b) flies_abd(c) flies(a) flies(b) flies(c)
Optimization: 0 15
OPTIMUM FOUND

Models       : 5     
  Optimum    : yes
Optimization : 0 15
Calls        : 1
Time         : 0.002s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 0.000s

I have to check that clasp is version 3, so I am writing the following grammar:

/**
 * Define a grammar for Clasp 3's output.
 */
grammar Output;

@header {package ac.bristol.clasp.parser;}

output:
    version source solving answer* result separation statistics NEWLINE* EOF;

version: 'clasp version 3.' INT '.' INT NEWLINE;

source: 'Reading from stdin' NEWLINE # sourceSTDIN
    | 'Reading from ' path NEWLINE # sourceFile;

path:
    DRIVE? folder ( BSLASH folder )* filename # pathWindows
    | FSLASH? folder ( FSLASH folder )* filename # pathNIX;

folder:
    LETTER+ # genericFolder
    | DOTDOT # parentFolder
    | DOT # currentFolder;

solving: 'Solving...' NEWLINE;

filename:
    LETTER+ extension?;

extension:
    DOT LETTER*;

answer: 'Answer: ' INT NEWLINE // 
    model? NEWLINE // 
    'Optimization: ' INT ( SPACE INT )* NEWLINE;

model:
    fact ( SPACE fact )*;

fact:
    groundPredicate;

groundTermList:
    groundTerm ( COMMA groundTerm )*;

groundTerm:
    groundCompound | STRING | number | atom; // literal?

groundCompound:
    groundPredicate
    | groundExpression;

groundPredicate:
    IDENTIFIER ( LROUND groundTermList RROUND )?;

groundExpression:
    groundBits AND groundBits
    | groundBits OR groundBits
    | groundBits XOR groundBits;

groundBits:
    groundCompare GT groundCompare
    | groundCompare GE groundCompare
    | groundCompare LT groundCompare
    | groundCompare LE groundCompare;

groundCompare:
    groundItem EQ groundItem
    | groundItem NE groundItem;

groundItem:
    groundFactor PLUS groundFactor
    | groundFactor MINUS groundFactor;

groundFactor:
    groundUnary TIMES groundUnary
    | groundUnary DIVIDE groundUnary
    | groundUnary MOD groundUnary;

groundUnary:
    TILDE groundTerm
    | MINUS groundTerm;

atom:
    IDENTIFIER
    | QUOTED;

number:
    INT
    | FLOAT;

//------------------------------------------------------------------------------

result: 'OPTIMUM FOUND' NEWLINE
    | 'SATISFIABLE' NEWLINE
    | 'UNKNOWN' NEWLINE;

separation:
    NEWLINE;

statistics:
    models optimum? optimization calls time cputime;

models: 'Models       : ' INT SPACE* NEWLINE;

optimum: '  Optimum    : yes' NEWLINE
    | '  Optimum    : no' NEWLINE;

optimization: 'Optimization : ' INT ( SPACE INT )* NEWLINE;
calls: 'Calls        : ' INT NEWLINE;
time: 'Time         : ' FLOAT 's (Solving: ' FLOAT 's 1st Model: ' FLOAT 's Unsat: ' FLOAT 's)' NEWLINE;
cputime: 'CPU Time     : ' FLOAT 's';

//------------------------------------------------------------------------------

AND:       '&';
BSLASH:    '\\';
COLON:     ':';
COMMA:     ',';
DIVIDE:    '/';
DOT:       '.';
DOTDOT:    '..';
EQ:        '==';
FSLASH:    '/';
GE:        '>=';
GT:        '>';
LE:        '<=';
LROUND:    '(';
LT:        '<';
MINUS:     '-';
MOD:       '%';
NE:        '!=';
OR:        '?';
PLUS:      '+';
RROUND:    ')';
SEMICOLON: ';';
SPACE:     ' ';
TILDE:     '~';
TIMES:     '*';
XOR:       '^';

DRIVE:      ( LOWER | UPPER ) COLON BSLASH?;
IDENTIFIER: LOWER FOLLOW*;
INT:        DIGIT+;
FLOAT:      DIGIT+ DOT DIGIT+;
NEWLINE:    '\r'? '\n';
QUOTED:     '\'' ( ~[\'\\] | ESCAPE )+? '\'';
STRING:     '"' ( ~["\\] | ESCAPE )+? '"';

fragment DIGIT:      [0] | NONZERO;
fragment ESCAPE:     '\\' [btnr"\\] | '\\' [0-3]? [0-7]? [0-7] | '\\' 'u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F];
fragment FOLLOW:     LOWER | UPPER | DIGIT | UNDERSCORE;
fragment LETTER:     LOWER | UPPER | DIGIT | SPACE;
fragment LOWER:      [a-z];
fragment NONZERO:    [1-9];
fragment UNDERSCORE: [_];
fragment UPPER:      [A-Z];

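Note that there is no rule to skip parts of the input stream, because I want to check every character. Note also that I defined the terminal rule for INT before the one for FLOAT. The definitions used by clasp are the same as in Prolog.

The rule that parses the first line of the example above is:

version: 'clasp version 3.' INT '.' INT NEWLINE;

I wrote it this way because I have to check that the major version number of the clasp being used is 3; the rest of the line is then read as the minor version number, a dot, the build number and a newline (with no spaces or anything else in between). Unfortunately, I get the following warning message, which makes me think ANTLR is recognizing the minor version number, the dot and the build number as a FLOAT:

line 1:16 mismatched input '0.3' expecting INT

Could you please explain what is going on? Am I doing something I should not be doing, or is ANTLR applying an unwanted optimization?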

1 Answer:

Answer 0: (score: 0)

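ANTLR first breaks your input into tokens, and only afterwards parses those tokens. By using 'clasp version 3.' in a parser rule, you implicitly defined an anonymous token that matches that text string. The text following that token starts with 0.3, which matches FLOAT. The lexer does not know that the parser will be in the version rule at that point; it simply picks the longest token starting at the current position, and 0.3 as a FLOAT is longer than 0 as an INT. I recommend the following: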

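  1. Split the grammar into parser grammar OutputParser; and lexer grammar OutputLexer;. In the parser grammar, use the tokenVocab option to indicate which lexer defines your tokens. This separation will force you to define real tokens for everything your grammar uses.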

    options {
      tokenVocab = OutputLexer;
    }
    
  2. Use FLOAT instead of INT '.' INT, or create a new token to represent the version:

    VERSION
      : DIGIT+ DOT DIGIT+ DOT DIGIT+
      ;
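
To make the longest-match behaviour described above concrete, here is a minimal Python sketch that imitates it with regular expressions. It is not ANTLR and does not use the ANTLR runtime; the `tokenize` helper and both rule tables are hypothetical names written only for this illustration, and the tables contain just the few tokens needed for the version line. The second table also assumes the literal is shortened to 'clasp version ' so that the whole 3.0.3 can reach the VERSION token, which the answer does not spell out.

```python
import re


def tokenize(text, rules):
    """Toy lexer: at each position pick the longest match.

    Ties go to the rule listed first. This only imitates the
    longest-match rule discussed in the answer; it is not ANTLR.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer wins, so earlier rules keep ties.
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


# Tokens as in the question: the literal from the parser rule,
# plus INT, FLOAT and NEWLINE (INT listed before FLOAT).
ORIGINAL_RULES = [
    ("'clasp version 3.'", r"clasp version 3\."),
    ("DOT", r"\."),
    ("INT", r"[0-9]+"),
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("NEWLINE", r"\r?\n"),
]

# Assumed variant with a dedicated VERSION token and a shorter literal.
VERSION_RULES = [
    ("'clasp version '", r"clasp version "),
    ("VERSION", r"[0-9]+\.[0-9]+\.[0-9]+"),
    ("INT", r"[0-9]+"),
    ("FLOAT", r"[0-9]+\.[0-9]+"),
    ("NEWLINE", r"\r?\n"),
]

line = "clasp version 3.0.3\n"
original_tokens = tokenize(line, ORIGINAL_RULES)
version_tokens = tokenize(line, VERSION_RULES)

print(original_tokens)  # second token is ('FLOAT', '0.3'), not INT
print(version_tokens)   # second token is ('VERSION', '3.0.3')
```

With the original tokens, the text after the literal is consumed as one FLOAT even though INT is listed first, which is why the parser reports '0.3' where it expected INT. With a VERSION token, the check that the major version is 3 would have to move out of the grammar and into the code that inspects the token text.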