Question

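I have written several thousand lines of code using a special, non-standard syntax. I need to be able to compile the code with other compilers that do not support this syntax. I tried to automate the changes that need to be made, but I'm not very good with regular expressions and the like, and I failed.

Here is what I want to achieve. Currently, the methods and variables of objects are called/accessed in my code with the following possible syntaxes: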

call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2

Instead, I would like it to be:

call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2

I want to make these changes without affecting the following other possible occurrences of ".":

Decimal numbers:

a = 1.0
b = 1.d0

Logical operators (note the possible whitespace and method calls):

if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then

Anything in a comment (comments start with an exclamation mark "!"):

!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )

Anything inside quotes (i.e. string literals):

c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '

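Does anyone know how to approach this? I'm guessing regular expressions are the natural way to go, but I'm open to anything. (In case anyone cares: the code is written in Fortran. ifort is happy with the "." syntax; gfortran isn't.)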

Answer 1

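Have you considered using flex for this? It uses regular expressions, but it is more advanced in that it tries the different patterns at each position and takes the longest match. The rules could look like this:

%%
    /* rule part of the program */
!.*\n                     ECHO;          /* ignore comments */
\".*\"|'.*'               ECHO;          /* ignore strings */
[^A-Za-z_][0-9]+\.        ECHO;          /* ignore numbers */
".and."|".or."|".not."    ECHO;          /* ignore logical operators */
\.                        printf("%%");  /* now, replace the . by % */
[^\.]                     ECHO;          /* ignore everything else */
%%
/* invoke the program */
/* ECHO copies the matched text unchanged; printf(yytext) would misread any % in it */
int main() {
    yylex();
    return 0;
}

You may need to modify the third rule (the one for numbers). Currently it ignores any "." that follows a run of digits, provided the character before those digits is anything other than A to Z, a to z, or _. If more characters are legal in identifiers, you can add them to that character class.

If everything is right, you should be able to turn this into a program. Copy it into a file named lex.l and run:

$ flex -o lex.yy.c lex.l
$ gcc -o lex.out lex.yy.c -lfl

You then have the compiled program lex.out, which you can use on the command line:

cat unreplaced.txt | ./lex.out > replaced.txt

This uses the same principle as Ed Morton's suggestion, but with flex, so we don't have to organize the individual steps ourselves. It still fails in some cases, for example when a string contains \".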
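For comparison, the same scanning idea (at each position, first try to recognise a comment, a string, a logical operator or a number, and only convert a "." that is left over) can be sketched in Python with one regular-expression alternation. This is not part of the answer above, just an illustration: the names LOGICAL_NAMES, TOKEN and dots_to_percent are made up, the list of dotted operators is my assumption about what the code base uses, and doubled quotes inside strings or fixed-form comment lines are not handled.

```python
import re

# Assumed set of dotted Fortran operators/literals to leave alone.
LOGICAL_NAMES = "and|or|not|eqv|neqv|eq|ne|lt|le|gt|ge|true|false"

# Alternatives are tried in order at each position; the first that matches wins.
TOKEN = re.compile(
    rf"""
      (?P<comment> ![^\n]* )                        # '!' up to end of line
    | (?P<string>  "[^"\n]*" | '[^'\n]*' )          # quoted literal on one line
    | (?P<logical> \.(?:{LOGICAL_NAMES})\. )        # .and.  .or.  .not.  ...
    | (?P<number>  (?<![A-Za-z0-9_]) [0-9]+ \.      # 1.  1.0  1.d0 (not part of a name)
                   (?!(?:{LOGICAL_NAMES})\.)        # but not the '.' of 1.and.x
                   [0-9]* (?:[de][+-]?[0-9]+)? )
    | (?P<member>  (?<=[A-Za-z0-9_]) \. (?=[A-Za-z_]) )   # the '.' in obj.method
    | (?P<other>   [\s\S] )                         # any other single character
    """,
    re.VERBOSE | re.IGNORECASE,
)


def dots_to_percent(source: str) -> str:
    """Return source with member-access dots replaced by '%'."""
    pieces = []
    for match in TOKEN.finditer(source):
        # Only the 'member' alternative is rewritten; everything else is copied.
        pieces.append("%" if match.lastgroup == "member" else match.group())
    return "".join(pieces)


if __name__ == "__main__":
    sample = (
        "call obj.method()\n"
        "if ( a .and. .not.(obj.l1(1.d0)) ) then\n"
        "b=a1.var( 0.d0 ) !! commented copy: b=a1.var( 0.d0 )\n"
    )
    print(dots_to_percent(sample), end="")
```

One difference from the flex rules: the number pattern here refuses to start inside a name (the lookbehind also excludes digits), so a name ending in several digits, such as obj12.var, is still converted. With the rule [^A-Za-z_][0-9]+\. as written, the "12." in obj12.var would be treated as a number and left alone.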

Answer 2

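You can't do this 100% robustly without a parser for the language (e.g. the following will fail in some cases if you have \" inside a double-quoted string; that is easy to handle, but it is just one of many possible failures not covered by your use cases), but this will handle what you've shown us so far. It uses GNU awk for gensub() and for the third argument to match().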

Sample input:

$ cat file
call obj.method()
obj.method( )
obj.method( arg1, arg2, kwarg1=kwarg1 )
obj1.var = obj2.var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj.l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1.var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1.var()

Expected output:

$ cat out
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()

The script:

$ cat tst.awk
{
    # give us the ability to use @<any other char> strings as a
    # replacement/placeholder strings that cannot exist in the input.
    gsub(/@/,"@=")

    # ignore all !s inside double-quoted strings
    while ( match($0,/("[^"]*)!([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all !s inside single-quoted strings
    while ( match($0,/('[^']*)!([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@-" a[2] substr($0,RSTART+RLENGTH)
    }

    # Now we can separate comments from what comes before them
    comment = gensub(/[^!]*/,"",1)
    $0      = gensub(/!.*/,"",1)

    # ignore all .s inside double-quoted strings
    while ( match($0,/("[^"]*)\.([^"]*")/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # ignore all .s inside single-quoted strings
    while ( match($0,/('[^']*)\.([^']*')/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "@#" a[2] substr($0,RSTART+RLENGTH)
    }

    # convert all logical operators like a.or.b to a@#or@#b so the .s won't get replaced later
    while ( match($0,/\.([[:alpha:]]+)\./,a) ) {
        $0 = substr($0,1,RSTART-1) "@#" a[1] "@#" substr($0,RSTART+RLENGTH)
    }

    # convert all obj.var and similar to obj%var, etc.
    while ( match($0,/\<([[:alpha:]]+[[:alnum:]_]*)[.]([[:alpha:]]+[[:alnum:]_]*)\>/,a) ) {
        $0 = substr($0,1,RSTART-1) a[1] "%" a[2] substr($0,RSTART+RLENGTH)
    }

    # Convert all @#s in the precomment text back to .s
    gsub(/@#/,".")

    # Add the comment back
    $0 = $0 comment

    # Convert all @-s back to !s
    gsub(/@-/,"!")

    # Convert all @=s back to @s
    gsub(/@=/,"@")

    print
}
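The first and last gsub() calls are what make "@-" and "@#" safe to use as placeholders: once every "@" in the input has become "@=", an "@" followed by any other character cannot occur in the line by accident. A minimal Python sketch of just that round trip, under simplifying assumptions (the function name convert is made up, only double-quoted strings are protected, and a blanket "." to "%" replacement stands in for the script's real conversion step):

```python
import re


def convert(line: str, escape: bool = True) -> str:
    """Replace '.' with '%' outside double-quoted strings via placeholders."""
    if escape:
        # Step 1: free up '@<other char>' sequences for use as placeholders.
        line = line.replace("@", "@=")
    # Step 2: hide every '.' inside "..." behind the placeholder '@#'.
    line = re.sub(r'"[^"]*"', lambda m: m.group().replace(".", "@#"), line)
    # Step 3: stand-in for the real conversion; only unprotected dots remain.
    line = line.replace(".", "%")
    # Step 4: turn the placeholders back into dots.
    line = line.replace("@#", ".")
    if escape:
        # Step 5: undo step 1 last, so a literal '@#' in the input survives.
        line = line.replace("@=", "@")
    return line


if __name__ == "__main__":
    tricky = 'c = "user@#host.example" ; d = e.f'
    print(convert(tricky))                # literal @# inside the string is kept
    print(convert(tricky, escape=False))  # without step 1 it is turned into '.'
```

The second call shows why the escaping matters: without it, a "@#" that was already present in the input is indistinguishable from a placeholder and comes back as a ".".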

Running the script, and its output:

$ awk -f tst.awk file
call obj%method()
obj%method( )
obj%method( arg1, arg2, kwarg1=kwarg1 )
obj1%var = obj2%var2
a = 1.0
b = 1.d0
if (a.or.b) then
    if ( a .and. .not.(obj%l1(1.d0)) ) then
!>I am a commented line.
   ! > I am.a commented line with..leading blanks and extra periods.1.
b=a1%var( 0.d0 ) !! I contain a commented version of this line: b=a1.var( 0.d0 )
c = "I am a string"
c= 'I am an obnoxious string: b=a1.var( 0.d0 ) ... '
c="I am an exclaimed string!"; b=a1%var()
