Question

I have a .pot file generated by xgettext from my C++ source code, in the following format:

#: file1.cpp:line
#: file2.cpp:line
msgid "" - empty string

#: file1.cpp:line
#: file2.cpp:line
msgid " \t\n\r" - string contains only spaces

#: file1.cpp:line
#: file2.cpp:line
msgid "real text"

Then I use a command like this:

grep "#: " "$(POT_FILE)" | sed -e 's/^\(#: \)\(.*\)/\2/'

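so that the output contains only the file names and line numbers.

The problem is that I don't want the file references for strings that are empty or contain only whitespace.

It gets complicated, because I have to find the msgid "" line (or whichever msgid line follows the #: blablabla lines) and, depending on the string's content, skip all the preceding reference lines.

Can anyone help with this?

Thanks!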

Answer 1

If I understand correctly, put the following into an executable file:

#!/usr/bin/awk -f

BEGIN { FS="\"" } # make it easier to test the text for msgid

# clean "file:line" line and store it in an array called "a"
/^#: / { sub(/^#: /, "", $0); a[i++]=$0 }

/^msgid/ {
    # walk the indices in order; "for( j in a )" has no guaranteed order
    if( valid_msgid() ) { for( j = 0; j < i; j++ ) print a[j] }
    reset() # clear array a after every msgid encountered
    }

function reset() {
    for( j in a ) { delete a[j]  }
    i = 0
    }

# put your validity tests here.
# $2 won't contain the entire string if the gettext contains double quotes
function valid_msgid() {
    # rejects an empty msgid and any msgid whose text starts with a space
    if( length($2) > 0 && $2 !~ /^ / ) return 1
    return 0
    }

If I put the above into a file named awko, run chmod +x awko, and then run awko data.pot, I get the following:

file1.cpp:line
file2.cpp:line

That corresponds to the last entry in your sample, assuming the "line" placeholders are replaced by actual line numbers.

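One of the tricks is using " as the field separator. If you need to reject entries whose msgid itself contains ", you will have to use more sophisticated parsing to identify the complete message text.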
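
For what it's worth, here is a rough sketch of what that more careful parsing might look like. It is not the script above: pot_refs, unquote and finish are names I made up for illustration, and it assumes the usual PO layout, where a long msgid continues on following lines that are bare quoted strings and the entry ends at msgstr, a blank line, or the next #: line. It treats a msgid as blank when it holds only spaces, tabs, or the escapes \t, \n and \r.

```shell
#!/bin/sh
# pot_refs (hypothetical helper): reads a .pot file on stdin and prints the
# "file:line" references of every entry whose msgid is not blank.
pot_refs() {
    awk '
    # A reference line: close any pending msgid first, then store the reference.
    /^#: /     { if (in_msgid) finish(); sub(/^#: /, ""); refs[n++] = $0; next }

    # Start of a msgid: keep the text between the first and the last quote.
    /^msgid "/ { if (in_msgid) finish(); text = unquote($0); in_msgid = 1; next }

    # Continuation lines of a multi-line msgid are bare quoted strings.
    in_msgid && /^"/ { text = text unquote($0); next }

    # Any other line (msgstr, msgid_plural, blank) ends the msgid.
    in_msgid   { finish() }

    END        { if (in_msgid) finish() }

    function unquote(s) {
        sub(/^[^"]*"/, "", s)   # drop everything up to the opening quote
        sub(/"[^"]*$/, "", s)   # drop the closing quote and anything after it
        return s
    }

    function finish(   k) {
        # Print the stored references unless the msgid text is blank.
        if (text !~ /^([ \t]|\\[tnr])*$/)
            for (k = 0; k < n; k++) print refs[k]
        for (k = 0; k < n; k++) delete refs[k]
        n = 0; text = ""; in_msgid = 0
    }
    '
}
```

After sourcing that (or pasting it into a script), pot_refs < data.pot should print one reference line per #: line of the entries that survive. Because the text is taken from the first quote to the last one, an escaped \" inside the message no longer cuts it short, but a trailing comment that itself contains a quote would still confuse it.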

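I don't have access to xgettext, so I can't tell whether the comments after the - in your example's bad lines come from you or from the program. If xgettext does emit them, the separator could be changed to " - so that valid_msgid() can test those strings.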

Modifying gettext .pot file output to exclude empty strings or strings containing only whitespace
