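Reading a non-delimited text file with Perl

Time: 2016-02-25 22:58:55

Tags: perl

I have a file that arrives as standard input, but I have not tried reading one like it into a Perl program before.

The file format is: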

 Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     30
Type    IP
Status  InUse
Description     mpirpd-cjdn
Notes   mgmt
Entry-Id        000000000026450
Submitter       John Doe
Create-date     2009-07-01-13:55:24
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     switch Lo0 -- switch unnamed
Notes   Reverved for Lan Management Loop Backs and links
Entry-Id        000000000032710
Submitter       John Doe
Create-date     2015-11-25-10:59:27
Last-modified-by        John Doe
Modified-date   2015-11-25-11:30:06
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     mplsfe9-hub
Area    mpls
Entry-Id        000000000024150
Submitter       Russ Reilly
Create-date     2007-05-02-18:26:20
Last-modified-by        John Doe
Modified-date   2013-05-06-19:09:37
Contact Name    ITG  INTERNAL
Contact Phone   555-555-5555
Contact E-mail  me@home.com

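Not all fields are used in every record (for example, Contact Name and Contact Phone may be missing from the next record).

I don't necessarily need the field headings, since they are always in the same position within each record.

I'm sure this has been done before and there is probably a simple solution, so I'm asking before reinventing the wheel.

2 answers:

Answer 0: (score: 2)

I would suggest an array of hashes as the ideal data structure for the file you have presented.

We set the input record separator ($/) to '', which puts Perl in paragraph mode: two or more consecutive empty lines are treated as a single empty line, so each blank-line-separated block is read as one record. Then, within each record, we simply split each line on two or more whitespace characters, which preserves keys that contain single spaces. The split is limited to 2 fields in total, so that values containing two or more consecutive spaces (such as ITG  INTERNAL) do not produce extra fields.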

use strict;
use warnings;

use Data::Dump;

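local $/ = '';    # paragraph mode: records are separated by blank lines
my @data;

while (<DATA>) {
    chomp;    # in paragraph mode this strips all trailing newlines
    next if $_ eq 'Net Number Assignments';    # skip the banner paragraph
    my %record;

    for my $line (split(/\n/)) {
        # Split on the first run of 2+ whitespace characters; the limit of 2
        # keeps values such as "ITG  INTERNAL" intact
        my ($key, $value) = split(/\s\s+/, $line, 2);
        $record{$key} = $value;
    }

    push(@data, \%record);
}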

dd(\@data);

__DATA__
Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     30
Type    IP
Status  InUse
Description     mpirpd-cjdn
Notes   mgmt
Entry-Id        000000000026450
Submitter       John Doe
Create-date     2009-07-01-13:55:24
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     switch Lo0 -- switch unnamed
Notes   Reverved for Lan Management Loop Backs and links
Entry-Id        000000000032710
Submitter       John Doe
Create-date     2015-11-25-10:59:27
Last-modified-by        John Doe
Modified-date   2015-11-25-11:30:06
Contact-Data    INTERNAL/555-555-5555
Contact-Id      CON-000028508




Net Number Assignments


Number  xxx.xxx.xxx.xxx
Netmask in /## Form     32
Type    IP
Status  InUse
Description     mplsfe9-hub
Area    mpls
Entry-Id        000000000024150
Submitter       Russ Reilly
Create-date     2007-05-02-18:26:20
Last-modified-by        John Doe
Modified-date   2013-05-06-19:09:37
Contact Name    ITG  INTERNAL
Contact Phone   555-555-5555
Contact E-mail  me@home.com

Output:

[
  {
    "Contact-Data"        => "INTERNAL/555-555-5555",
    "Contact-Id"          => "CON-000028508",
    "Create-date"         => "2009-07-01-13:55:24",
    "Description"         => "mpirpd-cjdn",
    "Entry-Id"            => "000000000026450",
    "Netmask in /## Form" => 30,
    "Notes"               => "mgmt",
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "John Doe",
    "Type"                => "IP",
  },
  {
    "Contact-Data"        => "INTERNAL/555-555-5555",
    "Contact-Id"          => "CON-000028508",
    "Create-date"         => "2015-11-25-10:59:27",
    "Description"         => "switch Lo0 -- switch unnamed",
    "Entry-Id"            => "000000000032710",
    "Last-modified-by"    => "John Doe",
    "Modified-date"       => "2015-11-25-11:30:06",
    "Netmask in /## Form" => 32,
    "Notes"               => "Reverved for Lan Management Loop Backs and links",
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "John Doe",
    "Type"                => "IP",
  },
  {
    "Area"                => "mpls",
    "Contact E-mail"      => "me\@home.com",
    "Contact Name"        => "ITG  INTERNAL",
    "Contact Phone"       => "555-555-5555",
    "Create-date"         => "2007-05-02-18:26:20",
    "Description"         => "mplsfe9-hub",
    "Entry-Id"            => "000000000024150",
    "Last-modified-by"    => "John Doe",
    "Modified-date"       => "2013-05-06-19:09:37",
    "Netmask in /## Form" => 32,
    "Number"              => "xxx.xxx.xxx.xxx",
    "Status"              => "InUse",
    "Submitter"           => "Russ Reilly",
    "Type"                => "IP",
  },
]
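
Since the question mentions standard input, the same approach can be wrapped in a function that reads from any filehandle. The following is only a sketch under a few assumptions: parse_records is a hypothetical helper name, the IP addresses are made-up sample values, and the leading-whitespace trim is an addition to cope with the indented banner line shown in the question.

```perl
use strict;
use warnings;

# parse_records is a hypothetical helper name, not part of the answer above.
# It reads blank-line-separated records from any filehandle and returns
# a reference to an array of hashes.
sub parse_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode, restored when the sub returns
    my @data;

    while (my $para = <$fh>) {
        chomp $para;
        $para =~ s/^\s+//;    # assumption: tolerate an indented banner line
        next if $para eq 'Net Number Assignments';

        my %record;
        for my $line (split /\n/, $para) {
            my ($key, $value) = split /\s\s+/, $line, 2;
            $record{$key} = $value;
        }
        push @data, \%record;
    }
    return \@data;
}

# Demo input held in a string (made-up values); in real use pass \*STDIN.
my $sample = <<'END';
Net Number Assignments


Number  10.0.0.1
Type    IP
Contact Name    ITG  INTERNAL



Net Number Assignments


Number  10.0.0.2
Type    IP
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $records = parse_records($fh);
close $fh;

for my $rec (@$records) {
    # Optional fields may be absent, so fall back to a placeholder.
    my $contact = $rec->{'Contact Name'} // 'n/a';
    print "$rec->{Number}\t$rec->{Type}\t$contact\n";
}
```

Replacing the in-memory handle with \*STDIN lets the script be run as `perl script.pl < file`. The `//` operator requires Perl 5.10 or later.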

Answer 1: (score: 1)

This is conceptually simple but a bit tedious. The canonical version of this type of parsing solution looks like this:

#!/usr/bin/perl
use strict;
use warnings;

my $all = {};  # A hash to hold all number entries indexed by IP
my $cur = {};  # A hash to hold the current entry we are parsing
while(<>)
{
    chomp;
    if (my ($ip) = /^Number\s+(.*)/)
    {
        # If we have a current entry, save it in the $all hash
        $all->{$cur->{number}} = $cur if ($cur->{number});

        $cur = {};
        $cur->{number} = $ip;
    }
    elsif (my ($mask) = /^Netmask in \/## Form\s+(\d+)/)
    {
        $cur->{mask} = $mask;
    }
    # Add further elsif branches here for the remaining input line types,
    # saving what you want in $cur
}
# This is to save the last entry
$all->{$cur->{number}} = $cur if ($cur->{number});

# Your code to process the accumulated entries
...
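
One possible way to avoid writing an elsif branch per field is a lookup table from file labels to hash keys. This is a sketch rather than the answer's own code: %field_for and parse_lines are illustrative names, the set of fields kept is an arbitrary choice, and the IP address is a made-up sample value. It also differs from the code above by storing each entry as soon as its Number line is seen, so no extra step is needed after the loop.

```perl
use strict;
use warnings;

# Hypothetical mapping from file labels to short hash keys; extend as needed.
my %field_for = (
    'Netmask in /## Form' => 'mask',
    'Type'                => 'type',
    'Status'              => 'status',
    'Description'         => 'description',
);

# parse_lines is an illustrative name; it takes a list of input lines and
# returns a hash reference of entries indexed by the Number field.
sub parse_lines {
    my @lines = @_;
    my (%all, $cur);

    for my $line (@lines) {
        chomp $line;
        # Label and value are separated by two or more whitespace characters.
        my ($label, $value) = split /\s\s+/, $line, 2;
        next unless defined $label && defined $value;    # blank lines, banner

        if ($label eq 'Number') {
            # A Number line starts a new entry; store it right away so the
            # last entry needs no special handling after the loop.
            $cur = { number => $value };
            $all{$value} = $cur;
        }
        elsif ($cur && exists $field_for{$label}) {
            $cur->{ $field_for{$label} } = $value;
        }
    }
    return \%all;
}

my $entries = parse_lines(
    "Net Number Assignments\n",
    "\n",
    "Number  10.0.0.1\n",
    "Netmask in /## Form     30\n",
    "Status  InUse\n",
    "Notes   mgmt\n",
);

for my $ip (sort keys %$entries) {
    print "$ip /$entries->{$ip}{mask} $entries->{$ip}{status}\n";
}
```

To read from standard input or from files named on the command line, the call could become parse_lines(<>). Because entries are keyed by Number, two records with the same Number would overwrite each other.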