How do I create a parser (lex / yacc)?

Time: 2011-03-12 12:08:05

Tags: c# yacc lex parser-generator

I have the following file, which I need to parse:

--TestFile
Start ASDF123
Name "John"
Address "#6,US" 
end ASDF123

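Lines that begin with -- are treated as comment lines. The file begins with Start and ends with end. The string after Start is the UserID, and the name and address are enclosed in double quotes.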

I need to parse the file and write the parsed data to an XML file.

So the resulting file will look like this:

<ASDF123>
  <Name Value="John" />
  <Address Value="#6,US" />
</ASDF123>

Currently I am using pattern matching (regular expressions) to parse the file above. Here is my sample code.

    /// <summary>
    /// To Store the row data from the file
    /// </summary>
    List<String> MyList = new List<String>();

    String strName = "";
    String strAddress = "";
    String strInfo = "";

Method: ReadFile

    /// <summary>
    /// To read the file into a List
    /// </summary>
    private void ReadFile()
    {
        StreamReader Reader = new StreamReader(Application.StartupPath + "\\TestFile.txt");
        while (!Reader.EndOfStream)
        {
            MyList.Add(Reader.ReadLine());
        }
        Reader.Close();
    }

Method: FormateRowData

    /// <summary>
    /// To remove comments 
    /// </summary>
    private void FormateRowData()
    {
        MyList = MyList.Where(X => X != "").Where(X => X.StartsWith("--")==false ).ToList();
    }

Method: ParseData

    /// <summary>
    /// To Parse the data from the List
    /// </summary>
    private void ParseData()
    {
        Match l_mMatch;
        Regex RegData = new Regex("start[ \t\r\n]*(?<Data>[a-z0-9]*)", RegexOptions.IgnoreCase);
        Regex RegName = new Regex("name [ \t\r\n]*\"(?<Name>[a-z]*)\"", RegexOptions.IgnoreCase);
        Regex RegAddress = new Regex("address [ \t\r\n]*\"(?<Address>[a-z0-9 #,]*)\"", RegexOptions.IgnoreCase);
        for (int Index = 0; Index < MyList.Count; Index++)
        {
            l_mMatch = RegData.Match(MyList[Index]);
            if (l_mMatch.Success)
                strInfo = l_mMatch.Groups["Data"].Value;
            l_mMatch = RegName.Match(MyList[Index]);
            if (l_mMatch.Success)
                strName = l_mMatch.Groups["Name"].Value;
            l_mMatch = RegAddress.Match(MyList[Index]);
            if (l_mMatch.Success)
                strAddress = l_mMatch.Groups["Address"].Value;
        }
    }

Method: WriteFile

    /// <summary>
    /// To write parsed information into file.
    /// </summary>
    private void WriteFile()
    {
        XDocument XD = new XDocument(
                           new XElement(strInfo,
                                         new XElement("Name",
                                             new XAttribute("Value", strName)),
                                         new XElement("Address",
                                             new XAttribute("Value", strAddress))));
        XD.Save(Application.StartupPath + "\\File.xml");
    }

I have heard about ParserGenerator.

Please help me write a parser using lex and yacc. The reason is that my existing parser (pattern matching) is not flexible and, I think, not the right way to do this.

How do I use ParserGenerator? (I have read Code Project Sample One and Code Project Sample Two, but I am still not familiar with it.) Please suggest some parser generators that output C# parsers.

2 answers:

Answer 0: (score: 5)

Gardens Point LEX and the Gardens Point Parser Generator are strongly influenced by LEX and YACC, and they output C# code.

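Your grammar is simple enough that I think your current approach is fine, but kudos for wanting to learn the "real" way of doing it. :-) So here is my suggestion for a grammar. These are just the production rules; this is far from a full example. The actual GPPG file needs to replace the ... with C# code that builds the syntax tree, and you need token declarations and so on (read the GPPG examples in the documentation). You also need a GPLEX file that describes the tokens.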

/* Your input file is a list of "top level elements" */
TopLevel : 
    TopLevel TopLevelElement { ... }
    | /* (empty) */

/* A top level element is either a comment or a block. 
   The COMMENT token must be described in the GPLEX file as 
   any line that starts with -- . */
TopLevelElement:
    Block { ... }
    | COMMENT { ... }

/* A block starts with the token START (which, in the GPLEX file, 
   is defined as the string "Start"), continues with some identifier 
   (the block name), then has a list of elements, and finally the token
   END followed by an identifier. If you want to validate that the
   END identifier is the same as the START identifier, you can do that
   in the C# code that analyses the syntax tree built by GPPG.
   The token Identifier is also defined with a regular expression in GPLEX. */
Block:
    START Identifier BlockElementList END Identifier { ... }

BlockElementList:
    BlockElementList BlockElement { ... }
    | /* empty */

BlockElement:
    NAME QuotedString { ... }
    | ADDRESS QuotedString { ... }
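
To make the token definitions mentioned in the comments above more concrete, here is a minimal hand-written sketch in plain C#. It is not a GPLEX file and not GPLEX output; it only illustrates which pieces of text would become the tokens COMMENT, START, END, NAME, ADDRESS, Identifier and QuotedString. The names `tokenPattern`, `kinds` and `Tokenize` are made up for this illustration, the identifier pattern is a guess, and the code assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// One alternative per token kind. Earlier alternatives win, so the keywords
// are listed before the more general Identifier pattern.
var tokenPattern = new Regex(
    @"(?<COMMENT>--[^\r\n]*)" +
    @"|(?<QuotedString>""[^""\r\n]*"")" +
    @"|(?<START>\bstart\b)" +
    @"|(?<END>\bend\b)" +
    @"|(?<NAME>\bname\b)" +
    @"|(?<ADDRESS>\baddress\b)" +
    @"|(?<Identifier>[A-Za-z0-9_]+)" +
    @"|(?<Whitespace>\s+)",
    RegexOptions.IgnoreCase);

// Token kinds that are reported to the caller; whitespace is skipped.
string[] kinds = { "COMMENT", "QuotedString", "START", "END", "NAME", "ADDRESS", "Identifier" };

List<(string Kind, string Text)> Tokenize(string input)
{
    var tokens = new List<(string Kind, string Text)>();
    int pos = 0;
    while (pos < input.Length)
    {
        // The next token must start exactly at the current position.
        Match m = tokenPattern.Match(input, pos);
        if (!m.Success || m.Index != pos)
            throw new FormatException($"Unexpected character '{input[pos]}' at offset {pos}");

        foreach (string kind in kinds)
            if (m.Groups[kind].Success)
                tokens.Add((kind, m.Value));

        pos += m.Length;
    }
    return tokens;
}

string sample = "--TestFile\nStart ASDF123\nName \"John\"\nAddress \"#6,US\"\nend ASDF123\n";
foreach (var token in Tokenize(sample))
    Console.WriteLine($"{token.Kind,-12} {token.Text}");
```

A generated scanner would hand these tokens to the parser one at a time; the list here is only for inspection.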

Answer 1: (score: 1)

First you have to define the grammar for your parser (the yacc part).

It seems to be something like:

file : record file
     | /* empty */
     ;

record: start identifier recordContent end identifier { /* check that the two identifiers match */ }
      ;

recordContent: name value; /* can be more detailed if you require an order in the fields */
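
For comparison, the same rules can be followed by hand without any generator. The sketch below is a small recursive-descent parser in plain C#, not yacc output; `Lex`, `Next`, `Is`, `ParseRecord` and `ParseFile` are hypothetical names, and it assumes C# 9 or later (top-level statements). It also assumes that a record may contain any number of `field "value"` pairs, which is a loosening of the `recordContent` rule above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Split the input into comments, quoted strings and bare words; drop the comments.
List<string> Lex(string input) =>
    Regex.Matches(input, @"--[^\r\n]*|""[^""\r\n]*""|\S+")
         .Cast<Match>()
         .Select(m => m.Value)
         .Where(t => !t.StartsWith("--"))
         .ToList();

// Return the current token and advance, or fail if the input is exhausted.
string Next(List<string> tokens, ref int pos) =>
    pos < tokens.Count ? tokens[pos++] : throw new FormatException("Unexpected end of input");

bool Is(string token, string keyword) =>
    string.Equals(token, keyword, StringComparison.OrdinalIgnoreCase);

// record : start identifier recordContent end identifier
XElement ParseRecord(List<string> tokens, ref int pos)
{
    if (!Is(Next(tokens, ref pos), "start"))
        throw new FormatException("Expected 'Start'");
    string id = Next(tokens, ref pos);
    var element = new XElement(id);

    // recordContent : zero or more  fieldName "quoted value"  pairs (assumption)
    while (pos < tokens.Count && !Is(tokens[pos], "end"))
    {
        string fieldName = Next(tokens, ref pos);
        string quoted = Next(tokens, ref pos);
        if (quoted.Length < 2 || quoted[0] != '"' || quoted[^1] != '"')
            throw new FormatException($"Expected a quoted string after '{fieldName}'");
        string unquoted = quoted.Substring(1, quoted.Length - 2);
        element.Add(new XElement(fieldName, new XAttribute("Value", unquoted)));
    }

    if (!Is(Next(tokens, ref pos), "end"))
        throw new FormatException("Expected 'end'");
    string closingId = Next(tokens, ref pos);
    if (closingId != id) // the check that the action in the record rule is meant to do
        throw new FormatException($"'end {closingId}' does not match 'Start {id}'");
    return element;
}

// file : record file | (empty)
List<XElement> ParseFile(string input)
{
    List<string> tokens = Lex(input);
    var records = new List<XElement>();
    int pos = 0;
    while (pos < tokens.Count)
        records.Add(ParseRecord(tokens, ref pos));
    return records;
}

string sample = "--TestFile\nStart ASDF123\nName \"John\"\nAddress \"#6,US\"\nend ASDF123\n";
foreach (XElement parsedRecord in ParseFile(sample))
    Console.WriteLine(parsedRecord);
```

Each grammar rule becomes one function, which is roughly what a generated parser automates; for a format this small the hand-written version may be all that is needed.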

The lexical analysis is the lex part. I guess your regular expressions will be useful for defining the tokens.

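My answer is a rough draft. I suggest you look on the Internet for a more complete tutorial on lex/yacc or flex/bison, and come back here if you have a more focused question.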

I also don't know whether there is a C# implementation that would let you keep managed code. You may have to use an unmanaged C/C++ import.