PHP: Extracting specific data from text and putting it into lists

Time: 2013-03-01 00:38:22

Tags: php

I am having trouble writing the code I need to extract certain data from a text. My text is structured like this:

[class number]. [class name]. (class units). [class description]. [class instructors]

For example:

  

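200A-200B. Civil Procedure. (3)   The principles of pleading under the code system and the federal rules; modern trial practice, including venue, process, the jury, sufficiency of evidence, instructions, verdicts, new trials, judgments; appellate procedure.   Ms. Aldave, Mr. Louisell, Mr. Poche, Mr. Stolz, Mr. Vetter

201A-201B. Contracts. (4)   The law of contracts, dealing with the problems of formation, operation, and termination.   Mr. Eisenberg, Mr. Kessler, Mr. Laube, Mr. Weintraub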

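The list then goes on, with many more entries like these.

I want to break each entry into its parts and collect each part into its own list. That is, I want one list of all the class numbers, one of all the class names, one of all the units, one of all the class descriptions, and one of all the instructors.

How can I do this? I have only just started coding in PHP. Is there any reading you would recommend? Thanks.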

2 answers:

Answer 0: (score: 1)

Does this fit your needs? (Instead of using the dot as a separator, I used #.)

$strings = array();

$class_codes = array();
$class_names = array();
$class_units = array();
$class_descriptions = array();
$class_teachers = array();

$strings[] = "200A-200B#Civil Procedure#(3)#The principles of pleading under the code system and the federal rules; modern trial practice, including venue, process, the jury, sufficiency of evidence, instructions, verdicts, new trials, judgments; appellate procedure.#Ms. Aldave, Mr. Louisell, Mr. Poche, Mr. Stolz, Mr. Vetter";
$strings[] = "201A-201B#Contracts#(4)#The law of contracts, dealing with the problems of formation, operation, and termination.#Mr. Eisenberg, Mr. Kessler, Mr. Laube, Mr. Weintraub";

$total = count($strings);

for($i=0; $i<$total; $i++)
{
    $string_parts = explode("#", $strings[$i]);

    $class_codes[] = $string_parts[0];
    $class_names[] = $string_parts[1];
    $class_units[] = $string_parts[2];
    $class_descriptions[] = $string_parts[3];
    $class_teachers[] = $string_parts[4];
}

echo "<pre>";
print_r($class_codes);
echo "</pre>";
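The loop above assumes every record splits into exactly five parts. As a minimal sketch of a more defensive split, the hypothetical helper `split_record` below (not part of the answer) passes a limit to `explode` and pads short records, so indexes 0 to 4 always exist. The sample record is shortened for illustration.

```php
<?php
// Hypothetical helper: split one "#"-delimited record into exactly five fields.
function split_record($line)
{
    // A limit of 5 keeps any "#" inside the last field from creating extra parts.
    $parts = explode("#", trim($line), 5);
    // Pad short records with empty strings so indexes 0..4 always exist.
    return array_pad($parts, 5, "");
}

$record = split_record("201A-201B#Contracts#(4)#The law of contracts.#Mr. Eisenberg, Mr. Kessler\n");
list($code, $name, $units, $description, $teachers) = $record;

// Strip the surrounding parentheses from the units field: "(4)" becomes "4".
$units = trim($units, "()");

echo $code, " | ", $name, " | ", $units, "\n";
```

Whether padding or skipping a malformed record is the better choice depends on how clean the input is; padding simply keeps the five lists the same length.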

Answer 1: (score: 0)

You can loop over each line of the file and apply a regex to each line to get the fields:

//I am just constructing $lines array assuming you have all lines of the file 

$lines[0] = "200A-200B. Civil Procedure. (3) The principles of pleading under the code system and the federal rules;
modern trial practice, including venue, process, the jury, sufficiency of evidence, instructions, verdicts, new trials, judgments; appellate procedure. Ms. Aldave, Mr. Louisell, Mr. Poche, Mr. Stolz, Mr. Vetter";
$lines[1] = "201A-201B. Contracts. (4) The law of contracts, dealing with the prob¬lems of formation, operation, and termination. Mr. Eisenberg, Mr. Kessler, Mr. Laube, Mr. Weintraub";

// Fields, in order: code. name. (units) description. instructors
// Assumes the description contains no period other than its final one.
$regex = '/(.*)\.\s*(.*)\.\s*\((\d+)\)\s*([^.]*)\.\s*(.*)\s*$/';
$data = array();
foreach($lines as $line)
{
    // preg_match returns 1 only when the whole pattern matched,
    // so all five capture groups are set inside this block.
    if(preg_match($regex, $line, $matches) === 1)
    {
        $data[] = array("class_code" => $matches[1],
                    "class_name" => $matches[2],
                    "class_unit" => $matches[3],
                    "class_description" => $matches[4],
                    "class_instructors" => $matches[5]
                );
    }
}

If you var_dump the $data variable above, you will get the following output:

array
  0 => 
    array
      'class_code' => string '200A-200B' (length=9)
      'class_name' => string 'Civil Procedure' (length=15)
      'class_unit' => string '3' (length=1)
      'class_description' => string 'The principles of pleading under the code system and the federal rules;
modern trial practice, including venue, process, the jury, sufficiency of evidence, instructions, verdicts, new trials, judgments; appellate procedure' (length=222)
      'class_instructors' => string 'Ms. Aldave, Mr. Louisell, Mr. Poche, Mr. Stolz, Mr. Vetter' (length=58)
  1 => 
    array
      'class_code' => string '201A-201B' (length=9)
      'class_name' => string 'Contracts' (length=9)
      'class_unit' => string '4' (length=1)
      'class_description' => string 'The law of contracts, dealing with the prob¬lems of formation, operation, and termination' (length=90)
      'class_instructors' => string 'Mr. Eisenberg, Mr. Kessler, Mr. Laube, Mr. Weintraub' (length=52)
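The question asks for one list per field rather than one array per class. A minimal sketch of that last step, assuming PHP 5.5 or later for `array_column`; the rows below are shortened stand-ins with the same keys as `$data`, and splitting the instructors on commas is an optional extra that the original answer does not do.

```php
<?php
// Two illustrative rows in the same shape as $data (values shortened).
$data = array(
    array(
        "class_code" => "200A-200B",
        "class_name" => "Civil Procedure",
        "class_unit" => "3",
        "class_description" => "The principles of pleading under the code system and the federal rules",
        "class_instructors" => "Ms. Aldave, Mr. Louisell",
    ),
    array(
        "class_code" => "201A-201B",
        "class_name" => "Contracts",
        "class_unit" => "4",
        "class_description" => "The law of contracts",
        "class_instructors" => "Mr. Eisenberg, Mr. Kessler",
    ),
);

// array_column (PHP 5.5+) pulls one key out of every row, in row order.
$class_codes        = array_column($data, "class_code");
$class_names        = array_column($data, "class_name");
$class_units        = array_column($data, "class_unit");
$class_descriptions = array_column($data, "class_description");
$class_instructors  = array_column($data, "class_instructors");

// Optional: split each instructors string into its own list of names.
$instructor_lists = array_map(function ($names) {
    return array_map('trim', explode(",", $names));
}, $class_instructors);

print_r($class_names);
```

Because every list is built from the same rows in the same order, index 0 in each list refers to the same class.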

I hope this is what you are looking for...