I am very new to coding.
The data in my text file is formatted like this:
#
PROPERTY_A: TEXT1
PROPERTY_B: UNIT1
#
PROPERTY_A: TEXT2
PROPERTY_B: UNIT2
#
#
1 2
3 4
I would like to output it as a table like this:
TEXT1 TEXT2
UNIT1 UNIT2
1 2
3 4
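I understand how to read the text file into an array of lines, and how to use split() to parse each line into an array of strings. I want to write the data into a table with the properties as the header of each column, so I need to split on ": " until I have read 2 consecutive lines containing a hash, and then switch to splitting on " ".
The code below gives me an infinite loop, even though each of the two splits works fine on its own.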
my $dataAsText = SomeFunction->Run($imputDocument);
for (my $ln = 0; < $dataAsText->Lines->Count; ++$ln;)
my $line = $dataAsText->Lines($ln)
do {
my @words = split ($line, ‘: ‘, 2);
# then pass @words[1] to the first or second row of each column
} until ($line eq ‘#’ && $line + 1 eq ‘#’);
my @words = split ($line, ‘ ‘);
# then pass each @words values to its corresponding column
}
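How can I write a piece of code that checks for 2 consecutive lines containing a hash sign and then changes how each line is split before it is sent to the array?
Just to note, the final data document may have hundreds of thousands of lines to read; the above is only a sample of the structure.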
Answer 0 (score: 1)
You could try a command-line Perl one-liner.
With your given input:
perl -F: -ane ' if(not /^\d+/) { $x.=$F[1] if not /^#/ } else { $y.=$_ }
END { $x=~s/\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)/$1 $3\n$2 $4/gs; print $x,$y }' file
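The substitution in the END block is written for exactly two property blocks. As a rough sketch of the same idea spelled out as a script, the values can be grouped by property name so that any number of blocks works. The variable names (`@order`, `%row`, `@data`) are my own, the sample input is inlined only to keep the example self-contained, and it assumes every header line looks like `NAME: VALUE`.

```perl
use strict;
use warnings;

# Sample input from the question, inlined so the sketch runs on its own.
my @lines = split /\n/, <<'END_INPUT';
#
PROPERTY_A: TEXT1
PROPERTY_B: UNIT1
#
PROPERTY_A: TEXT2
PROPERTY_B: UNIT2
#
#
1 2
3 4
END_INPUT

my (@order, %row, @data);
for my $line (@lines) {
    next if $line =~ /^#/;                    # hash lines carry no data
    if ($line =~ /^(\w+):\s*(.*)$/) {         # header line: NAME: VALUE
        my ($name, $value) = ($1, $2);
        push @order, $name unless exists $row{$name};   # remember first-seen order
        push @{ $row{$name} }, $value;        # one value per column
    }
    else {
        push @data, $line;                    # numeric row, kept as-is
    }
}

# One output row per property, followed by the data rows.
my @table = ((map { join ' ', @{ $row{$_} } } @order), @data);
print "$_\n" for @table;
```

For a real file, `@lines` would be replaced by reading from a filehandle line by line.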
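Answer 1 (score: 0)
Something like this should do. What you want to do is detect the lines that start with a hash and count them; when you see two in a row, swap the pattern used for splitting.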
use strict;
use warnings;

{
    # Inside a block so only the sub here can see these variables.
    my $splitter = qr/:\s*/;    # First split pattern: colon plus optional spaces
    my $last_was_hash = 0;

    sub split_appropriately {
        my ($line) = @_;
        chomp $line;            # Drop the trailing newline before splitting
        if ($line =~ /^#/) {
            if ($last_was_hash) {
                # Double hash, switch modes.
                $splitter = qr/\s+/;    # Second split pattern: whitespace
                return;                 # We don't split hash lines
            }
            # Last was not a hash, but this was.
            $last_was_hash = 1;         # One hash seen.
            return;                     # Don't split the hash line
        }
        # This isn't a hash. Turn off "hash seen" and split
        # with appropriate current pattern.
        $last_was_hash = 0;
        return split $splitter, $line;  # Will stay switched once it changes
    }
}

while (<DATA>) {
    my ($first, $second) = split_appropriately($_);
    next unless defined $first;
    print "Part 1: $first\n";
    print "Part 2: $second\n";
}
__DATA__
#
PROPERTY_A: TEXT1
PROPERTY_B: UNIT1
#
PROPERTY_A: TEXT2
PROPERTY_B: UNIT2
#
#
1 2
3 4
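To get from the split parts to the table asked for in the question, one option is to append each header value to the row for its property while in header mode, and to collect the data rows once the double hash has been seen. The following is only a minimal sketch under a few assumptions: `build_table` is a hypothetical helper, header lines look like `NAME: VALUE`, and the input is read from an in-memory string just to keep the example self-contained. Note that `split` takes the pattern first and the string second, which is the reverse of the argument order used in the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from a filehandle, returns the table rows.
sub build_table {
    my ($fh) = @_;
    my (@names, %header, @rows);
    my $last_was_hash = 0;
    my $in_data       = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^#/) {
            $in_data = 1 if $last_was_hash;   # two hash lines in a row: data follows
            $last_was_hash = 1;
            next;
        }
        $last_was_hash = 0;

        if ($in_data) {
            # Data mode: split on whitespace and rejoin with single spaces.
            push @rows, join ' ', split ' ', $line;
        }
        else {
            # Header mode: pattern first, then the string, then the limit.
            my ($name, $value) = split /:\s*/, $line, 2;
            push @names, $name unless exists $header{$name};
            push @{ $header{$name} }, $value;
        }
    }
    return ((map { join ' ', @{ $header{$_} } } @names), @rows);
}

my $input = <<'END_INPUT';
#
PROPERTY_A: TEXT1
PROPERTY_B: UNIT1
#
PROPERTY_A: TEXT2
PROPERTY_B: UNIT2
#
#
1 2
3 4
END_INPUT

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my @table = build_table($fh);
close $fh;
print "$_\n" for @table;
```

Because the file is read one line at a time, only the header rows strictly need to stay in memory; for a file with hundreds of thousands of lines, the data rows could be printed as they are read instead of being pushed onto `@rows`.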