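Parsing challenge - fixing syntax errors

Date: 2018-07-12 15:58:39

Tags: regex parsing sed

I have written thousands of lines of code using a special, non-standard syntax. I need to be able to compile that code with other compilers that don't support this syntax. I tried to automate the changes that need to be made, but I'm not very good with regular expressions and the like, and I failed.

Here is what I'm trying to achieve. Currently, object methods and variables in my code are called/accessed with the following possible syntaxes: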

call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2

Instead, I want them to be:

call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2

I want to make these changes without affecting the following possible occurrences of ".":

Decimals:

a = 1.0
b = 1.d0

Logical operators (note the possible spaces and method calls):

if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then

Anything inside a comment (an exclamation mark "!" starts a comment):

!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )  

Anything in quotes (i.e. string literals):

c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '

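Does anyone know how to solve this? I'd guess regular expressions are the natural approach, but I'm open to anything. (In case anyone cares: the code is written in Fortran. ifort is happy with the "." syntax; gfortran is not.)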

2 answers:

Answer 0 (score: 2)

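Have you considered using flex to solve the problem? It uses regular expressions, but it is more advanced: it tries all the patterns and picks the longest match. The rules could look like this:

%%
    /* rule part of the program */
!.*\n                     ECHO;          /* ignore comments */
\"[^\"\n]*\"|'[^'\n]*'    ECHO;          /* ignore strings */
[^A-Za-z_][0-9]+\.        ECHO;          /* ignore numbers */
".and."|".or."|".not."    ECHO;          /* ignore logical operators */
\.                        printf("%%");  /* now, replace the . by % */
[^\.]                     ECHO;          /* copy everything else */
%%
/* invoke the program */
int main() { yylex(); }

You may need to modify the third rule. Currently it ignores any "." that follows one or more digits which are preceded by a character other than A to Z, a to z or _. For example, obj12.var is left unchanged, because "12." matches this rule as a number. If identifiers can contain other legal characters, you can add them to the class.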

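If everything is correct, it should be possible to turn this into a program. Copy it into a file named lex.l and run:

$ flex -o lex.yy.c lex.l
$ gcc -o lex.out lex.yy.c -lfl

Then you have the program lex.out, which you can use on the command line:

$ cat unreplaced.txt | ./lex.out > replaced.txt

This uses the same principle as Ed Morton's suggestion, but with flex we can skip the bookkeeping. It still fails in some cases, for example if a string contains \".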
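The same principle (copy comments and strings through untouched, convert everything else) can also be sketched by hand in C without flex. This is a hypothetical illustration rather than part of the answer: fortran_dots_to_percent is an invented name, and the sketch assumes free-form source and that the only dot-delimited words are the intrinsic operators and .true./.false.. Because it walks the line one character at a time and remembers whether it is inside a string, it never pairs the closing quote of one string with the opening quote of the next. Regex-based quote matching can make exactly that mistake on a line such as c="x"; a.b = "y". The sketch also copes with Fortran's doubled-quote escape ('it''s').

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Dot-delimited words that must keep their periods: Fortran's intrinsic
 * operators and logical literals. Assumption: user-defined operators such
 * as .myop. are not listed and would be treated like component access. */
static const char *const DOT_WORDS[] = {
    "and", "or", "not", "eqv", "neqv", "eq", "ne",
    "lt", "le", "gt", "ge", "true", "false", NULL
};

/* If s starts with ".word." for a word in DOT_WORDS (case-insensitive),
 * return the length of that token including both periods; otherwise 0. */
static size_t dot_word_len(const char *s) {
    for (size_t w = 0; DOT_WORDS[w] != NULL; w++) {
        const char *word = DOT_WORDS[w];
        size_t n = strlen(word), k = 0;
        while (k < n && tolower((unsigned char)s[1 + k]) == word[k])
            k++;
        if (k == n && s[1 + n] == '.')
            return n + 2;
    }
    return 0;
}

/* Hypothetical helper: rewrite one line of free-form Fortran, turning the
 * period in identifier.identifier into '%' while copying numbers, dot
 * operators, string literals and '!' comments unchanged.
 * `out` must have room for strlen(in) + 1 bytes (the output is never longer). */
void fortran_dots_to_percent(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    char quote = 0;   /* delimiter of the string we are inside, or 0 */
    int in_word = 0;  /* inside a run of letters, digits and '_' */
    int in_ident = 0; /* ...and that run started with a letter */
    size_t i = 0, o = 0;

    while (in[i] != '\0') {
        unsigned char c = (unsigned char)in[i];

        if (quote) {                  /* inside a string: copy verbatim */
            out[o++] = in[i++];
            if (c == (unsigned char)quote)
                quote = 0;            /* a doubled quote simply reopens the string */
            continue;
        }
        if (c == '"' || c == '\'') {  /* start of a string literal */
            quote = (char)c;
            in_word = in_ident = 0;
            out[o++] = in[i++];
            continue;
        }
        if (c == '!') {               /* comment: copy the rest of the line */
            while (in[i] != '\0')
                out[o++] = in[i++];
            break;
        }
        if (c == '.') {
            size_t n = dot_word_len(in + i);
            if (n > 0) {              /* .and., .or., .not., ... */
                memcpy(out + o, in + i, n);
                o += n;
                i += n;
            } else if (in_ident && isalpha((unsigned char)in[i + 1])) {
                out[o++] = '%';       /* obj.var -> obj%var */
                i++;
            } else {
                out[o++] = in[i++];   /* 1.0, 1.d0 and other periods */
            }
            in_word = in_ident = 0;
            continue;
        }
        if (isalnum(c) || c == '_') {
            if (!in_word)             /* a new word is an identifier only if it starts with a letter */
                in_ident = isalpha(c) != 0;
            in_word = 1;
        } else {
            in_word = in_ident = 0;
        }
        out[o++] = in[i++];
    }
    out[o] = '\0';
}
```

Calling it from an fgets() loop over stdin would give a filter that is used the same way as lex.out. The sketch does not cover fixed-form comment lines (C or * in column 1), strings continued onto the next line with &, or component access after an array index such as obj(i).var.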

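Answer 1 (score: 1)

You can't do this 100% robustly without a parser for the language. For example, the script below will fail in some cases if you have \" inside a double-quoted string. That's easy to handle, but it's just one of many possible failures not covered by your use cases. Still, this will handle what you've shown us so far. It uses GNU awk for gensub() and for the third argument to match().

Sample input: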

$ cat file
call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1.var()

Expected output:

$ cat out
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()

Script:

$ cat tst.awk
{
    # give us the ability to use @<any other char> strings as a
    # replacement/placeholder strings that cannot exist in the input.
    gsub(/@/,"@=")

    # ignore all !s inside double-quoted strings
    while ( match($0,/("[^"]*)!([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all !s inside single-quoted strings
    while ( match($0,/('[^']*)!([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # Now we can separate comments from what comes before them
    comment = gensub(/[^!]*/,"",1)
    $0      = gensub(/!.*/,"",1)

    # ignore all .s inside double-quoted strings
    while ( match($0,/("[^"]*)\.([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all .s inside single-quoted strings
    while ( match($0,/('[^']*)\.([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # convert all logical operators like a.or.b to a@#or@#b so the .s wont get replaced later
    while ( match($0,/\.([[:alpha:]]+)\./,a) ) {
        $0 = substr($0,1,RSTART-1) "@#" a[1] "@#" substr($0,RSTART+RLENGTH)
    }

    # convert all obj.var and similar to obj%var, etc.
    while ( match($0,/\<([[:alpha:]]+[[:alnum:]_]*)[.]([[:alpha:]]+[[:alnum:]_]*)\>/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "%" a[2] substr($0,RSTART+RLENGTH)
    }

    # Convert all @#s in the precomment text back to .s
    gsub(/@#/,".")

    # Add the comment back
    $0 = $0 comment

    # Convert all @-s back to !s
    gsub(/@-/,"!")

    # Convert all @=s back to @s
    gsub(/@=/,"@")

    print
}

Running the script, and its output:

$ awk -f tst.awk file
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()
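The @= escaping at the top of the script is what makes @- and @# safe placeholders. After gsub(/@/,"@="), every @ in the line is followed by =, so any @- or @# that appears later must have been inserted by the script itself. The C sketch below shows that encode/decode pair in isolation. escape_at and restore_placeholders are hypothetical names, and the decoder assumes its input was produced by escape_at plus the script's own placeholder insertions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder codes and what they stand for, index-aligned:
 * "@#" -> '.', "@-" -> '!', "@=" -> '@' (as in tst.awk). */
static const char CODES[] = "#-=";
static const char PLAIN[] = ".!@";

/* Like gsub(/@/,"@=") in tst.awk: every '@' becomes "@=", so afterwards
 * each '@' in the text is followed by '='.
 * `out` must have room for 2 * strlen(in) + 1 bytes. */
void escape_at(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        out[o++] = in[i];
        if (in[i] == '@')
            out[o++] = '=';
    }
    out[o] = '\0';
}

/* Decode "@#", "@-" and "@=" back to '.', '!' and '@' in one left-to-right
 * pass; any other character is copied unchanged.
 * `out` must have room for strlen(in) + 1 bytes. */
void restore_placeholders(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t i = 0, o = 0;
    while (in[i] != '\0') {
        /* strchr would also match the terminating '\0', so rule that out first */
        const char *p = (in[i] == '@' && in[i + 1] != '\0')
                            ? strchr(CODES, in[i + 1])
                            : NULL;
        if (p != NULL) {
            out[o++] = PLAIN[p - CODES];
            i += 2;
        } else {
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

Because every @ in such text is followed by exactly one of =, - or #, decoding in a single left-to-right pass gives the same result as the script's three successive gsub() calls. An @# that was already in the input comes out as @=# after escaping and is therefore never mistaken for a placeholder.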