无法使用perl成功提取段落

时间:2015-09-07 22:55:41

标签: perl

我正在尝试使用PERL从文本中提取段落。但是,代码不会生成我期望的结果。我从Zaid的帖子extracting paragraphs from text with perl中得到了很多答案。以下是我写的代码:

my $string = <<'TEXT';
     Assembly and Manufacturing

     The Company's assembly and manufacturing operations include PCB assembly
and the manufacture of subsystems and complete products. Its PCB assembly
activities primarily consist of the placement and attachment of electronic and
mechanical components on printed circuit boards using both SMT and traditional
pin-through-hole ("PTH") technology. The Company also assembles subsystems and
systems incorporating PCBs and complex electromechanical components, and,
increasingly, manufactures and packages final products for shipment directly to
the customer or its distribution channels. The Company employs just-in-time,
ship-to-stock and ship-to-line programs, continuous flow manufacturing, demand
flow processes and statistical process control. The Company has expanded the
number of production lines for finished product assembly, burn-in and test to
meet growing demand and increased customer requirements. In addition, the
Company has invested in FICO, a producer of injection molded plastic for Asia
electronics companies with facilities in Shenzhen, China.

     As OEMs seek to provide greater functionality in smaller products, they
increasingly require advanced manufacturing technologies and processes. Most of
the Company's PCB assembly involves the use of SMT, which is the leading
electronics assembly technique for more sophisticated products. SMT is a
computer-automated process which permits attachment of components directly on
both sides of a PCB. As a result, it allows higher integration of electronic
components, offering smaller size, lower cost and higher reliability than
traditional manufacturing processes. By allowing increasingly complex circuits
to be packaged with the components placed in closer proximity to each other, SMT
greatly enhances circuit processing speed, and therefore board and system
performance. The Company also provides traditional PTH electronics assembly
using PCBs and leaded components for lower cost products.;
TEXT

local $/ = "";
open my ($str_fh), '<', \$string;
while ( <$str_fh> ) {
     print "New Paragraph: $_\n","*" x 40, "\n" ;   
}
close $str_fh;

该文本来自该公司的年度报告https://www.sec.gov/Archives/edgar/data/32272/0000950147-97-000151.txt

我希望代码返回段落,但是,我得到了全文。 有人会帮我解决这个问题吗?我对这些错误很困惑。

非常感谢!!!

最好的问候

1 个答案:

答案 0 :(得分:2)

当我运行你在这里发布的代码时,它运行正常。它分别打印每个段落。

最有可能的是,段落之间的界限并非完全空白。如果“空白”行上有空格,则它们不计为段落分隔符。