Question

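I need help formatting a txt file using bash / Linux. The file looks like the one below: it always has a line of the form Rate: Sth, followed by detail lines in a very specific format. I want to split the file into one file per rate. In this example I want 3 files, each containing the lines that belong to the corresponding Rate value.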

How would you approach this?

line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

Answer 1

This might work for you:

csplit -z -f 'temp' -b '%02d.txt' file /Rate/ {*}

This generates the files temp00.txt, temp01.txt, ...

If you only want the Rate lines:

sed -i '/Rate/!d' temp*.txt
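If the pieces should be named after their rate instead of being numbered, the csplit output can be renamed in a second pass. The following is only a sketch: the function name split_and_rename and the Rate_<name>.txt naming scheme are assumptions for illustration, and it relies on GNU csplit (the -z and -b options).

```shell
# Hypothetical helper: split on "Rate:" lines, then rename each piece
# after the rate found on its first line.
split_and_rename() (
  input=$1
  # -s: quiet, -z: drop empty pieces, -f/-b: output name prefix and suffix
  csplit -s -z -f 'temp' -b '%02d.txt' "$input" '/^Rate:/' '{*}'
  for piece in temp*.txt; do
    [ -e "$piece" ] || continue
    # The first line of a piece is expected to look like "Rate: GBP"
    rate=$(sed -n '1s/^Rate: *//p' "$piece")
    if [ -n "$rate" ]; then
      mv -- "$piece" "Rate_${rate}.txt"
    fi
  done
)
```

Called as split_and_rename file, this leaves any leading piece without a Rate line (for example a header) under its tempNN.txt name.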

Answer 2

I would do it like this in Perl:

#!/usr/bin/perl

use strict;
use warnings;

open (my $out, ">-") or die "oops";

while(<>)
{
    if (m/^Rate: (\w+)/o)
    {
        close $out and open ($out, ">$1") or die "oops";
        next;
    }

    print $out $_
}

Use it like this:

perl ./test.pl input.txt

Answer 3

(g)awk to the rescue:

awk '/^Rate:/ {output_file_name=$2; getline } 
     { print $0 >> ( output_file_name ) }' INPUT_FILE

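The first rule runs on lines that start with Rate:. It only sets the output file name and then fetches the next line from the input file with getline. The second rule then processes that line and writes it to the output file. Every following line is handled by the second rule alone (written to the output file), as long as it does not match Rate:.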

Note: the solution above may fail if the input file contains consecutive Rate: lines, like this:

... DATA ...
Rate: GBP
Rate: CHF
... DATA ...

This should do it (assuming the line numbers are not part of the original file).
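One way to avoid the consecutive Rate: problem is to use next instead of getline, so that every line, including a second Rate: line, is tested against the first rule again. This is a minimal sketch rather than the answer's own code; the function name split_by_rate and the variable out are illustrative.

```shell
# Hypothetical variant: "next" restarts rule matching on the following line,
# so two Rate: lines in a row simply switch the output file twice.
split_by_rate() {
  awk '
    # A Rate: line only selects the output file; it is not written anywhere
    /^Rate:/ { if (out != "") close(out); out = $2; next }
    # Every other line is appended to the currently selected file
    out != "" { print >> out }
  ' "$1"
}
```

With the input from the note above, split_by_rate INPUT_FILE creates no GBP file at all and writes the data lines to CHF. As in the original command, >> appends, so files left over from an earlier run should be removed first.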

HTH

Answer 4

A one-liner inspired by sehe's answer:

>perl -pwe '
> if (/^Rate: (.+)/) { 
>    open $out, ">", "Rate_$1.txt" or die $!; 
>    select $out; 
> }' gasdata.txt

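The -p option reads each line and prints it after evaluating the code given to -e. select sets the default filehandle for print. So basically, all we are doing is switching filehandles depending on which rate is currently active.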
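The same idea of switching the default output can be sketched in plain shell, where exec with a redirection plays the role of select. This is an illustrative equivalent, not part of the original answer; the function name split_with_redirect is an assumption, and the Rate_<name>.txt naming follows the one-liner.

```shell
# Hypothetical shell analogue of the Perl one-liner. The function body runs
# in a subshell, so the stdout switch does not leak into the calling shell.
split_with_redirect() (
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      'Rate: '*)
        # Point stdout at a new file, much like Perl's select
        exec > "Rate_${line#Rate: }.txt"
        ;;
    esac
    # Print every line to whatever stdout currently is
    printf '%s\n' "$line"
  done < "$1"
)
```

As with the Perl version, each Rate: line ends up as the first line of its own file, and anything before the first Rate: line goes to the original standard output.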

Here is the deparsed code:

>perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    if (/^Rate: (.+)/) {
        die $! unless open $out, '>', "output/Rate_$1.txt";
        select $out;
    }
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK

Answer 5

Another solution: it simply converts your input file into a script and then runs it:

sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash

I assume the line numbers are not part of the file.
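Because the generated script is executed by a shell, it can help to look at it before piping it anywhere. The sketch below wraps the same sed idea in a function (the name rate_script is hypothetical) and, as an added assumption beyond the original answer, quotes the here-document delimiter so that characters such as $ or backticks in the data are not expanded. It relies on GNU sed for \n in the replacement and the one-line $a form.

```shell
# Hypothetical wrapper: print the generated script instead of running it.
# <<"EOF" (quoted delimiter) keeps the data lines literal when the script runs.
rate_script() {
  sed 's/^Rate:/cat <<"EOF" >/; 1!s/^cat <<"EOF"/EOF\n&/; $aEOF' "$1"
}
```

Running rate_script input.txt prints one cat <<"EOF" > NAME block per rate, each closed by an EOF line; rate_script input.txt | sh then creates the files.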

Answer 6

You can use something like this in Perl:

Perl script:

#!/usr/bin/perl

undef $/;
$_ = <>;
$n = 0;

for $match (split(/(?=Rate)/)) {
      open(O, '>temp' . ++$n);
      print O $match;
      close(O);
}

Execution:

[jaypal~/temp]$ ./spl.pl temp.file

[jaypal~/temp]$ cat temp.file
Line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

[jaypal~/temp]$ cat temp1
Line No. Main Text
1    

[jaypal~/temp]$ cat temp2
Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated

211  

[jaypal~/temp]$ cat temp3
Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated

1002 [jaypal~/temp]$ cat temp4
Rate: USD
1003 21/11/11,-0.004419534,Validated
[jaypal~/temp]$

Split a file based on file contents and pattern matching
