Question

I need to extract all strings in a file that are enclosed in double quotes. For example, if the file contains the following line:

"Julius" was not "Ceaser"

it should output:

Julius
Ceaser

I'd like to do this in bash (sed/awk). With awk I can extract one match, but how do I get all of them?

Answer 1

awk to the rescue!

$ awk -v RS='"' '!(NR%2)' file

Julius
Ceaser

With this input:

$ cat file

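I need to extract all strings in a file that are enclosed in double quotes. For example, if the file contains the following line: "Julius" was not "Ceaser", it should output Julius Ceaser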

This assumes there are no escaped quotes.
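The one-liner works because RS='"' makes every double quote a record separator, so records alternate between text outside quotes (odd NR) and text inside quotes (even NR); !(NR%2) is true only for even NR, and a pattern with no action prints the record. A minimal sketch wrapping it in a function (the name extract_quoted is hypothetical, chosen for illustration):

```shell
#!/usr/bin/env bash
# extract_quoted: hypothetical helper name wrapping the awk one-liner above.
# RS='"' splits the input at every double quote, so the records are:
#   outside, inside, outside, inside, ...
# !(NR % 2) selects the even-numbered (inside-quote) records; with no
# action block, awk prints each selected record followed by ORS (newline).
extract_quoted() {
  awk -v RS='"' '!(NR % 2)' "$@"
}

# Demo on the example line from the question (awk reads stdin when no file is given).
printf '%s\n' '"Julius" was not "Ceaser"' | extract_quoted
```

Because the record separator is the quote rather than a newline, quoted text that spans a line break comes out as a single record with the newline inside it, and one unmatched quote flips the odd/even parity for everything after it.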

Answer 2

grep -Eo '"[a-zA-Z]+"' file

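This prints each matched string on its own line, even when several matches come from the same line of the original file. If you want matches from the same line collapsed back onto one line, you can do this: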

grep -nEo '"[a-zA-Z]+"' file | awk -F: '
BEGIN { p=1 }
      {
         gsub("\"", "", $2)
         n=$1;
         if (p != n) {
           print s; s = $2; p=n
         } else {
           if(s) { s = s" "$2 } else { s=$2 }
         }
      }
END   {
         print s
      }'

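grep -nEo extracts only the matching parts, each prefixed with its line number
awk parses grep's output, strips the quotes, and joins the matches that share a line number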
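The same per-line grouping can also be sketched as a single awk pass without grep, by looping over awk's match() and using RSTART/RLENGTH to step through the line. This is an alternative technique, not the answer's own code; the function name collapse_quoted is hypothetical, and it keeps the answer's "[a-zA-Z]+" pattern:

```shell
#!/usr/bin/env bash
# collapse_quoted: hypothetical single-pass alternative to grep -nEo | awk.
# For each input line, match() finds the next "[a-zA-Z]+" token; substr()
# drops the surrounding quotes, and the loop continues after the match.
# Lines without any quoted word print nothing, as with the grep pipeline.
collapse_quoted() {
  awk '{
    out = ""
    rest = $0
    while (match(rest, /"[a-zA-Z]+"/)) {
      word = substr(rest, RSTART + 1, RLENGTH - 2)   # text between the quotes
      if (out == "") out = word; else out = out " " word
      rest = substr(rest, RSTART + RLENGTH)          # resume after this match
    }
    if (out != "") print out
  }' "$@"
}

printf '%s\n' '"Julius" was not "Ceaser"' 'no quotes here' '"request" map url' | collapse_quoted
```

This should print "Julius Ceaser" and "request" on two lines, and it avoids the stray empty line the grep-based version emits when the first match is not on line 1.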

Answer 3

If you want all the double-quoted strings from each line printed together on one line, try this Perl one-liner:

perl -ne ' while(/("\S+")/g) { print "$1 " } print "\n" '

With the given input:

$ cat  doubleq.txt
"Julius" was not "Ceaser"
"request" map url
"Ceaser"


$ perl -ne ' while(/("\S+")/g) { print "$1 " } print "\n" ' doubleq.txt
"Julius" "Ceaser"
"request"
"Ceaser"

$

Answer 4

If you don't mind the quotes being included in the output, you can use a simple grep -o:

$ grep -Eo '"[[:alnum:]]+"' <<<'"Julius" was not "Ceaser"'
"Julius"
"Ceaser"

If you don't want the quotes in the output, grep -P (mostly on Linux) or pcregrep (FreeBSD, macOS and other BSDs) may work, using a positive lookbehind and lookahead:

$ pcregrep -o '(?<=")[[:alnum:]]+(?=")'  <<<'"Julius" was not "Ceaser"'
Julius
Ceaser
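Where neither grep -P nor pcregrep is available, a lookaround-free sketch is to let grep -o print each quoted match and then strip the quotes with sed. Note that grep -o is a GNU/BSD extension rather than POSIX, and the function name unquoted_matches is hypothetical:

```shell
#!/usr/bin/env bash
# unquoted_matches: hypothetical helper without lookarounds.
# grep -o prints every "..." match on its own line ("[^"]*" also allows
# spaces inside the quotes); sed then removes the leading and trailing quote.
unquoted_matches() {
  grep -o '"[^"]*"' "$@" | sed 's/^"//; s/"$//'
}

printf '%s\n' '"Julius" was not "Ceaser"' | unquoted_matches
```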

How to get all quoted strings in a file?
