Can I use regular expressions to define strings in ISO EBNF?

Posted: 2015-04-11 12:52:57

Tags: computer-science dsl bnf ebnf

I am using the standardized version of EBNF (ISO/IEC 14977:1996(E)) to define my grammar. The standardized version is a meta-metalanguage (it can define itself).

I have defined letter as:

letter =  'A' | 'B' | 'C' | 'D' | 'E' | 'H' | 'I' | 'J' | 'K' | 'L' |
'O' | 'P' | 'Q' | 'R' | 'S' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b'
| 'c' | 'd' | 'e' | 'h' | 'i' | 'j' | 'k' | 'l' | 'o' | 'p' | 'q' |
'r' | 's' | 'v' | 'w' | 'x' | 'y' | 'z' | 'F' | 'G' | 'M' | 'N' | 'T' |
'U' | 'f' | 'g' | 'm' | 'n' | 't' | 'u';

I would prefer to write letter = [a..z]|[A..Z];
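As a quick illustration (in Python, which is not part of the question), the 52 alternatives in the rule above and the two ranges in the proposed form describe exactly the same set of characters. The names ENUMERATED and char_range are hypothetical and exist only for this check. Keep in mind that [a..z] is not ISO EBNF syntax. It is the shorthand being asked about.

```python
import string

# The terminals of the `letter` rule above, in the order they are listed:
# 20 upper-case, 20 lower-case, then F G M N T U and f g m n t u.
ENUMERATED = (
    "ABCDEHIJKLOPQRSVWXYZ"
    "abcdehijklopqrsvwxyz"
    "FGMNTUfgmntu"
)

def char_range(lo, hi):
    """All characters from lo to hi inclusive, like a [lo..hi] range."""
    return {chr(code) for code in range(ord(lo), ord(hi) + 1)}

enumerated = set(ENUMERATED)
ranged = char_range("a", "z") | char_range("A", "Z")

print(len(ENUMERATED), len(enumerated))       # 52 52 (no duplicates)
print(enumerated == ranged)                   # True
print(ranged == set(string.ascii_letters))    # True
```

The two forms differ only in how they are written. The question is whether the shorter form is still valid ISO EBNF.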

My question is: would defining letter this way (with a regular expression) break the self-defining property of EBNF?

1 Answer:

Answer 0 (score: 1)

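Use a special sequence for this. In ISO EBNF a special sequence is enclosed in ? symbols. You could therefore write letter = ? [a-zA-Z] ? ; and leave the meaning of the text between the question marks to your own tools. The standard describes special sequences like this:

A special-sequence consists of a special-sequence-symbol followed by a (possibly empty) sequence of special-sequence-characters followed by a special-sequence-symbol.

The sequence of symbols represented by a special-sequence is beyond the scope of this International Standard. Only the format of special-sequences is defined in this International Standard. Special-sequences provide a notation for extensions which a user may require.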
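This Python sketch shows how the work is divided. It assumes the ISO 14977 special-sequence symbol ? and ignores any ? that appears inside quoted terminals or comments. Extracting a special sequence depends only on its format, which is the part the standard defines. What the body means is decided by interpret_special, a hypothetical user-supplied function that here reads the body as a regular expression. This is arguably why a special sequence leaves the self-definition intact: the metalanguage only has to recognize ? ... ?, not interpret it.

```python
import re

# ISO/IEC 14977 uses '?' as the special-sequence symbol; the body is any
# run of characters other than '?', possibly empty.
# Simplification: '?' inside quoted terminals or comments is not handled.
SPECIAL_SEQUENCE = re.compile(r"\?([^?]*)\?")

def special_sequences(rule):
    """Return the body of every special sequence in `rule`, trimmed."""
    return [body.strip() for body in SPECIAL_SEQUENCE.findall(rule)]

def interpret_special(body):
    """Hypothetical user extension: read the body as a Python regex.
    The standard leaves this meaning open; any other convention works."""
    return re.compile(body)

rule = "letter = ? [a-zA-Z] ? ;"
bodies = special_sequences(rule)
letter = interpret_special(bodies[0])

print(bodies)                                    # ['[a-zA-Z]']
print(special_sequences("empty = ?? ;"))         # ['']
print(bool(letter.fullmatch("Q")), bool(letter.fullmatch("7")))  # True False
```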

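The W3C takes a similar route. Its specifications extend EBNF with regex-style character classes so that every character does not have to be listed. For example, the XML specification defines its notation as follows: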

The formal grammar of XML is given in this specification using a simple Extended Backus-Naur Form (EBNF) notation. Each rule in the grammar defines one symbol, in the form

symbol ::= expression

Symbols are written with an initial capital letter if they are the start symbol of a regular language, otherwise with an initial lowercase letter. Literal strings are quoted.

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:

#xN

    where N is a hexadecimal integer, the expression matches the character whose number (code point) in ISO/IEC 10646 is N. The number of leading zeros in the #xN form is insignificant.

[a-zA-Z], [#xN-#xN]

    matches any Char with a value in the range(s) indicated (inclusive).

[abc], [#xN#xN#xN]

    matches any Char with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.

[^a-z], [^#xN-#xN]

    matches any Char with a value outside the range indicated.

[^abc], [^#xN#xN#xN]

    matches any Char with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.

"string"

    matches a literal string matching that given inside the double quotes.

'string'

    matches a literal string matching that given inside the single quotes.
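The bracket and #xN forms above are very close to ordinary regex character classes. The following rough Python sketch makes the correspondence concrete. It assumes the input is a single bracketed class or a bare #xN, not a full production. The helper w3c_class_to_regex is hypothetical and is not part of any W3C tooling. The class [^#x3C#x26] is the one XML uses in its CharData production ([^<&]).

```python
import re

# A W3C hexadecimal character reference: '#x' followed by hex digits.
HEX_CHAR = re.compile(r"#x([0-9A-Fa-f]+)")

def w3c_class_to_regex(expr):
    """Hypothetical helper: replace each #xN in a W3C character class with
    the escaped character U+N, giving a Python regex character class."""
    return HEX_CHAR.sub(lambda m: re.escape(chr(int(m.group(1), 16))), expr)

upper = re.compile(w3c_class_to_regex("[#x41-#x5A]"))
not_markup = re.compile(w3c_class_to_regex("[^#x3C#x26]"))

print(w3c_class_to_regex("[#x41-#x5A]"))   # [A-Z]
print(bool(upper.fullmatch("Q")))          # True
print(bool(not_markup.fullmatch("<")))     # False
print(bool(not_markup.fullmatch("a")))     # True
```

Escaping each decoded character means a code point such as #x2D ('-') or #x5D (']') is treated as a literal and does not break the class.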

These symbols may be combined to match more complex patterns as follows, where A and B represent simple expressions:

(expression)

    expression is treated as a unit and may be combined as described in this list.

A?

    matches A or nothing; optional A.

A B

    matches A followed by B. This operator has higher precedence than alternation; thus A B | C D is identical to (A B) | (C D).

A | B

    matches A or B.

A - B

    matches any string that matches A but does not match B.

A+

    matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+ | B+ is identical to (A+) | (B+).

A*

    matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A* | B* is identical to (A*) | (B*).
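Most of these operators carry over directly to regex syntax: juxtaposition, |, ?, + and *. A - B is the exception, because Python's re module has no difference operator. When matching whole strings, one common workaround is a negative lookahead anchored at the end of the input. The sketch below applies this to XML's PITarget rule, Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')). The NAME pattern is a simplified, ASCII-only stand-in, which is an assumption here. The real Name production admits many more Unicode ranges.

```python
import re

# Assumption: simplified ASCII-only stand-in for XML's Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"

# B in "A - B": the word "xml" in any letter case.
XML_WORD = r"[Xx][Mm][Ll]"

# A - B for whole-string matching: first reject input that B matches
# completely (lookahead up to end of string, \Z), then require A.
PI_TARGET = re.compile(rf"(?!(?:{XML_WORD})\Z){NAME}")

for s in ["xml-stylesheet", "XmL", "php"]:
    print(s, bool(PI_TARGET.fullmatch(s)))
# xml-stylesheet True
# XmL False
# php True
```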

Other notations used in the productions are:

/* ... */

    comment.

[ wfc: ... ]

    well-formedness constraint; this identifies by name a constraint on well-formed documents associated with a production.

[ vc: ... ]

    validity constraint; this identifies by name a constraint on valid documents associated with a production.

References