I'm trying to parse an HTML table. Here is its code:
<table border = "1">
<caption>
<h4>table</h4>
</caption>
<thead>
<tr>
<th></th>
<th colspan="3">1st header</th>
<th colspan="3">2nd header</th>
<th colspan="3">3rd header</th>
</tr>
<tr>
<th></th>
<th colspan="3">subhead1</th>
<th colspan="3">subhead2</th>
<th colspan="3">subhead3</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>text</td>
<td>more text</td>
<td>some more text</td>
<td>dog</td>
<td>bear</td>
<td>cat</td>
<td>toocan</td>
<td>inu</td>
<td>pes</td>
</tr>
</tbody>
</table>
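What I need is a Perl data structure organized by column, but I can't figure out how to build it :). So far I can only get a complex data structure, $table, as in the code below: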
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;
use Data::Dumper;

my $content = 'table.html';    # path to the HTML file shown above
my $te = HTML::TableExtract->new();
$te->parse_file($content);
my ($table) = $te->tables;     # first table found in the document
I can print it with Data::Dumper, but how do I use it properly? I'd like to get something like this:
my %table = (
    "first_header" => {
        "subhead1" => [ 'text', 'more text', 'some more text' ],
        "subhead2" => [ 'dog', 'bear', 'cat' ],
    },
);
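
One possible way to get from the extracted rows to a nested hash like the one above is sketched here. It assumes that `$table->rows` returns one array reference per row and that cells covered by a `colspan` come back as `undef`, which is how I read the HTML::TableExtract documentation; check this against a Data::Dumper dump of your own rows. The `@rows` literal stands in for that result so the sketch runs without the module, and `fill_spans` is a hypothetical helper name, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: with HTML::TableExtract these rows would come from
#   my @rows = $table->rows;
# with undef in every cell that a colspan covers.
my @rows = (
    [ '', '1st header', undef, undef, '2nd header', undef, undef, '3rd header', undef, undef ],
    [ '', 'subhead1',   undef, undef, 'subhead2',   undef, undef, 'subhead3',   undef, undef ],
    [ '', 'text', 'more text', 'some more text', 'dog', 'bear', 'cat', 'toocan', 'inu', 'pes' ],
);

# Hypothetical helper: copy each header label rightwards over the
# undef cells that follow it, so every column carries a label.
sub fill_spans {
    my @cells = @_;
    my $last;
    for my $cell (@cells) {
        $last = $cell if defined $cell;
        $cell = $last;
    }
    return @cells;
}

# The first two rows are the header rows; the rest are data rows.
my ( $top_row, $sub_row, @data ) = @rows;
my @top = fill_spans(@$top_row);
my @sub = fill_spans(@$sub_row);

my %table;
for my $row (@data) {
    for my $col ( 0 .. $#$row ) {
        my ( $top, $sub ) = ( $top[$col], $sub[$col] );

        # Skip columns that have no header label (the empty first column).
        next unless defined $top && length $top;
        next unless defined $sub && length $sub;

        push @{ $table{$top}{$sub} }, $row->[$col];
    }
}

print Dumper( \%table );
```

The keys here are the header texts exactly as they appear in the HTML ('1st header', not 'first_header'); if different key names are wanted, they would need to be mapped separately. An individual cell can then be read with something like `$table{'2nd header'}{subhead2}[1]`.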