Question

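As far as I know, if I want to split a string with a regular expression and keep the delimiters in Perl, JavaScript, or PHP, I should use capturing parentheses (a capture group) in the regex; for example, in Perl (where I want to split on a digit followed by a closing parenthesis):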

$ echo -e "123.123   1)  234.234\n345.345   0)  456.456" \
| perl -ne 'print join("--", split(/(\d\))/,$_));'
123.123   --1)--  234.234
345.345   --0)--  456.456

I tried the same trick in awk, but it does not seem to work (even with the capturing group, the delimiters still get "eaten"):

$ echo -e "123.123   1)  234.234\n345.345   0)  456.456" \
| awk '{print; n=split($0,a,/([0-9]\))/);for(i=1;i<=n;i++){print i,a[i];}}'
123.123   1)  234.234
1 123.123   
2   234.234
345.345   0)  456.456
1 345.345   
2   456.456

Is it possible to force awk to keep the delimiter matches in the array that results from the split?

Answer 1

You can use the four-argument form of split() in gawk, e.g.

echo -e "123.123   1)  234.234\n345.345   0)  456.456" |
gawk '{
    nf = split($0, a, /[0-9]\)/, seps)
    for (i = 1; i < nf; ++i) printf "%s--%s--", a[i], seps[i]
    print a[i]
}'

Output:

123.123   --1)--  234.234
345.345   --0)--  456.456

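The GNU awk (gawk) version of split() accepts an additional, optional array-name argument; if it is present, the matched separators are saved into that array.

As stated in the gawk manual: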

split(s, a [, r [, seps] ])

Split the string s into the array a and the separators array seps on the regular expression r, and return the number of
fields.  If r is omitted, FS is used instead.  The arrays a and seps are cleared first.  seps[i] is the field separator
matched by r between a[i] and a[i+1].  If r is a single space, then leading whitespace in s goes into the extra array element
seps[0] and trailing whitespace goes into the extra array element seps[n], where n is the return value of split(s, a, r,
seps).  Splitting behaves identically to field splitting, described above.
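
To see how the indices of seps line up with those of a, here is a small sketch (it requires gawk; the function name show_seps and the sample string are made up for illustration). It splits on a single space and rebuilds the string with each separator shown in brackets:

```shell
# Hypothetical helper: split "$1" on whitespace with gawk and print the
# fields again, with every saved separator wrapped in [ ].
show_seps() {
    printf '%s\n' "$1" | gawk '{
        n = split($0, a, " ", seps)
        out = ""
        for (i = 0; i <= n; i++) {
            if (i > 0) out = out a[i]                 # a[] starts at index 1
            if (i in seps) out = out "[" seps[i] "]"  # seps[] may start at 0
        }
        print out
    }'
}
```

Calling show_seps '  foo   bar ' should print [  ]foo[   ]bar[ ]: the leading whitespace lands in seps[0], and the trailing whitespace in seps[2], where 2 is the value returned by split().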

Answer 2

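As @konsolebox mentioned, you can use split() with a newer gawk version to save the field separator values. You could also take a look at FPAT and patsplit(). Another option is to set RS to your current FS and then use RT.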
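
As a rough sketch of two of those alternatives (both need gawk; the function names keep_with_patsplit and keep_with_rt are only illustrative, and the regex is the one from the question), something along these lines should work:

```shell
# Sketch 1: patsplit() stores the regex matches in a[] and the text
# around them in seps[], with seps[0] holding the text before a[1].
keep_with_patsplit() {
    gawk '{
        n = patsplit($0, a, /[0-9]\)/, seps)
        out = seps[0]
        for (i = 1; i <= n; i++)
            out = out "--" a[i] "--" seps[i]
        print out
    }'
}

# Sketch 2: use the delimiter regex as the record separator. RT then holds
# the text that RS matched at the end of each record (empty at end of input).
# Note that records are no longer lines here; newlines stay inside $0.
keep_with_rt() {
    gawk -v RS='[0-9]\\)' '{
        printf "%s", $0
        if (RT != "") printf "--%s--", RT
    }'
}
```

Piping the two sample lines from the question through either function should give the same output as the Perl one-liner.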

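That said, I don't see why you would want a solution involving field separators when you can solve the problem you posted with nothing more than gensub() in gawk: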

$ echo -e "123.123   1)  234.234\n345.345   0)  456.456" |
gawk '{print gensub(/[[:digit:]]\)/, "--&--", 1)}'
123.123   --1)--  234.234
345.345   --0)--  456.456

If your real problem does require remembering the FS values, let us know and we can point you in the right direction.
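
For an awk that lacks the gawk extensions used in these answers, a loop around match() is one possible fallback. This is only a sketch: the function name split_keep is made up, and the regex is hard-coded to the one from the question.

```shell
# Hypothetical helper: mark every "digit followed by )" in "$1" with
# -- on both sides, using only POSIX awk features.
split_keep() {
    printf '%s\n' "$1" | awk '{
        rest = $0
        out = ""
        # match() sets RSTART and RLENGTH for the leftmost match in rest
        while (match(rest, /[0-9]\)/)) {
            out = out substr(rest, 1, RSTART - 1) "--" substr(rest, RSTART, RLENGTH) "--"
            rest = substr(rest, RSTART + RLENGTH)   # continue after the match
        }
        print out rest
    }'
}
```

For example, split_keep '123.123   1)  234.234' should print 123.123   --1)--  234.234. To collect the pieces in arrays instead of printing them, the two substr() results could be stored at successive indices inside the loop.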

Splitting a string on a regular expression and keeping the delimiters in awk
