Extract certain text from a string that comes after certain characters

Date: 2014-01-10 19:01:22

Tags: php string

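I am trying to extract certain information from the string below. I have tried using explode (which works), but it is a long-winded process. Is there a simpler, more logical way to do this?

Example string: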

string(778) "Automatic reminders periodic maintenance schedule

You have a maintenance schedule
Vehicle:          357207058078957
Task:             Service NA61 HNB
Rule:             Every 10 mi or every 1 months after completion.
                     Task repeats when it is marked as completed
Last excuted:     
Scheduled for:    23/11/2013 or 50720 mi
Due:              since 45 d or in 50719 mi
Reminder:         22/11/2013 or 50715 mi

*** This is an automatically generated email, please do not reply. ***

If you have a question about our products and solutions, you can find your answer on our website under " frequently asked questions " or under "user guides ".

If you need to contact our Customer Support, please use our online contact form.

Kind regards"

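I need to extract the following information from this example. The values will not always be the same, but the text preceding them ("Vehicle:" etc.) will be.

  • Vehicle: 357207058078957
  • Task: Service NA61 HNB
  • Rule: Every 10 mi or every 1 months after completion. Task repeats when it is marked as completed
  • Last excuted: (there should be a date here, but sometimes it may be blank.)
  • Scheduled for: 23/11/2013 or 50720 mi
  • Due: since 45 d or in 50719 mi
  • Reminder: 22/11/2013 or 50715 mi

(Note: this is not real data.)

Apart from that, the rest can be ignored.

It makes no difference whether the results still include the leading label (e.g. "Vehicle:"); I can easily strip that off myself.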

2 answers:

Answer 0 (score: 1)

After cleaning everything up, the final regex is:

/Vehicle:(.*?)\nTask:(.*?)\nRule:(.*?)\nLast excuted:(.*?)\nScheduled for:(.*?)\nDue:(.*?)\nReminder:(.*?)\n\*/s

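Breakdown:

The parts inside () are called capture groups. So we look for Vehicle:, then match . (any character) with * (zero or more times). The ? is needed to make these matches lazy rather than greedy: .*? stops as soon as the text that follows it in the pattern, \nTask:, can match. This continues to the end, where the last group runs up to the trailing \* (an escaped *). Don't forget the /s modifier at the end, which lets . match everything, including newlines.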
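The effect of the lazy quantifier and the /s modifier is easy to check on a small input. The sketch below uses a shortened, made-up sample ($sample) rather than the full email, and all variable names are only illustrative.

```php
<?php
// Hypothetical shortened sample, not the asker's real message.
$sample = "Task:   Service\nRule:   Every 10 mi\n        repeats when done\nDue:    soon\n";

// Lazy group with /s: the capture may span the line break and ends at the first "\nDue:".
preg_match('/Rule:(.*?)\nDue:/s', $sample, $lazy);

// Without /s, "." cannot cross the line break inside the Rule value, so nothing matches.
$withoutS = preg_match('/Rule:(.*?)\nDue:/', $sample);

// With two records in one string, a greedy group runs on to the LAST "\nDue:",
// swallowing the second record's labels; the lazy group stays inside the first record.
$twice = $sample . $sample;
preg_match('/Rule:(.*)\nDue:/s', $twice, $greedy);
preg_match('/Rule:(.*?)\nDue:/s', $twice, $lazyTwice);

var_dump(trim($lazy[1]));
var_dump($withoutS);
var_dump(strpos($greedy[1], 'Task:') !== false);
var_dump(strpos($lazyTwice[1], 'Task:') !== false);
```

For a single message the greedy and lazy forms can end up capturing the same text, but the lazy form keeps each group from running past the next label if a label ever appears more than once.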

To implement this in PHP, you can do the following:

<?php
$string = <<<EOT
Automatic reminders periodic maintenance schedule

You have a maintenance schedule
Vehicle:          357207058078957
Task:             Service NA61 HNB
Rule:             Every 10 mi or every 1 months after completion.
                     Task repeats when it is marked as completed
Last excuted:     
Scheduled for:    23/11/2013 or 50720 mi
Due:              since 45 d or in 50719 mi
Reminder:         22/11/2013 or 50715 mi

*** This is an automatically generated email, please do not reply. ***

If you have a question about our products and solutions, you can find your answer on our website under " frequently asked questions " or under "user guides ".

If you need to contact our Customer Support, please use our online contact form.

Kind regards
EOT;

if(preg_match('/Vehicle:(.*?)\nTask:(.*?)\nRule:(.*?)\nLast excuted:(.*?)\nScheduled for:(.*?)\nDue:(.*?)\nReminder:(.*?)\n\*/s', $string, $matches)) {
    unset($matches[0]); // $matches[0] contains the whole matched string

    // Update the keys to something more logical
    $keys = array('vehicle', 'task', 'rule', 'last_executed', 'scheduled_for', 'due', 'reminder');
    $data = array_combine($keys, $matches);

    // Trim the values, since we lazy selected in RegEx
    // Note: you may want to do something more complicated, since `rule` still has whitespace
    $data = array_map('trim', $data);

    print_r($data);
    // Array (
    //   [vehicle] => 357207058078957
    //   [task] => Service NA61 HNB
    //   [rule] => Every 10 mi or every 1 months after completion. Task repeats when it is marked as completed
    //   [last_executed] =>
    //   [scheduled_for] => 23/11/2013 or 50720 mi
    //   [due] => since 45 d or in 50719 mi
    //   [reminder] => 22/11/2013 or 50715 mi
    // )
}
?>

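To learn more, read up on regular expressions and preg_match().

Answer 1 (score: 0)

Although Sam's solution is cleaner, I still don't trust regular expressions, lol...

Anyway, there was no reply here when I started, and I kept typing this up, so I figured I would post it anyway.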
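The comment in the code above points out that the rule value still contains whitespace after trim(), because its text wraps onto an indented second line. One possible way to tidy that up is to collapse every run of whitespace into a single space. This is only a sketch: the $data values are hand-written stand-ins for the captured groups, and normalize_value() is a hypothetical helper name, not part of the answer.

```php
<?php
// Hand-written stand-ins shaped like the raw captured groups (leading padding, wrapped line).
$data = array(
    'task'          => "             Service NA61 HNB",
    'rule'          => "             Every 10 mi or every 1 months after completion.\n                     Task repeats when it is marked as completed",
    'last_executed' => "     ",
);

// Hypothetical helper: trim the ends, then collapse internal whitespace runs
// (spaces, tabs, line breaks) into one space.
function normalize_value($value) {
    return preg_replace('/\s+/', ' ', trim($value));
}

$clean = array_map('normalize_value', $data);
print_r($clean);
```

With this, rule comes out as a single line, and an empty field such as last_executed becomes an empty string.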

$msg ="Automatic reminders periodic maintenance schedule You have a maintenance schedule Vehicle: 357207058078957 Task: Service NA61 HNB Rule: Every 10 mi or every 1 months after completion. Task repeats when it is marked as completed Last excuted: Scheduled for: 23/11/2013 or 50720 mi Due: since 45 d or in 50719 mi Reminder: 22/11/2013 or 50715 mi *** This is an automatically generated email, please do not reply. *** If you have a question about our products and solutions, you can find your answer on our website under \" frequently asked questions \" or under \"user guides\". If you need to contact our Customer Support, please use our online contact form. Kind regards,";
$var = explode(' ',$msg);
$count = 0;
$array_count = 0;
$track_count = true;
$doc = array();
foreach ($var as $word){

    switch ($word) {
        case 'Vehicle:':

            $doc[$array_count] = $var[$count].' '.$var[$count+1];
            $array_count++;
            break;
        case 'Task:':
            $doc[$array_count] = $var[$count].' ';
            $count++;
            for ($i=0; $i <3 ; $i++) { 
                $doc[$array_count] .= $var[$count+$i].' ';
            }
            $array_count++;
            break;
        case 'Rule:':
            // $count is already one word past 'Rule:' because the 'Task:' case advanced it
            $while_count = 0; // guard against a runaway loop
            $doc[$array_count] = $word.' ';
            while($var[$count].' '.$var[$count+1] != "Last excuted:"  && $while_count < 30){
                $doc[$array_count] .= $var[$count].' ';
                $while_count++;
                $count++;
            }
            unset($while_count);
            $array_count++;
            $track_count = false; // from here on, only the cases themselves advance $count
            break;
        case 'Last':
            if($var[$count].' '.$var[$count+1] == 'Last excuted:'){
                $while_count = 0; // guard against a runaway loop
                $doc[$array_count] = $var[$count].' '.$var[$count+1];
                $count = $count+2;
                while($var[$count].' '.$var[$count+1] != 'Scheduled for:'  && $while_count < 30){
                    $doc[$array_count] .= $var[$count].' ';
                    $while_count++;
                    $count++;
                }
                unset($while_count);
                $array_count++;
            }
            break; // break even when the check fails, so the word does not fall through to the next case
        case 'Scheduled':
            if($var[$count].' '.$var[$count+1] == 'Scheduled for:'){
                $while_count = 0; // guard against a runaway loop
                $doc[$array_count] = $word.' '.$var[$count+1].' ';
                $count = $count+2;
                while($var[$count] != 'Due:'  && $while_count < 30){
                    $doc[$array_count] .= $var[$count].' ';
                    $while_count++;
                    $count++;
                }
                unset($while_count);
                $array_count++; // move to the next slot so 'Due:' does not overwrite this entry
            }
            break;
        case 'Due:':
            $while_count = 0; // guard against a runaway loop
            $doc[$array_count] = $word.' ';
            $count++;
            while($var[$count] != 'Reminder:'  && $while_count < 30){
                $doc[$array_count] .= $var[$count].' ';
                $while_count++;
                $count++;
            }
            unset($while_count);
            $array_count++;
            break;
        case 'Reminder:':
            $while_count = 0; // guard against a runaway loop
            $doc[$array_count] = $word.' ';
            $count++;
            while($var[$count] != '***'  && $while_count < 30){
                $doc[$array_count] .= $var[$count].' ';
                $while_count++;
                $count++;
            }
            unset($while_count);
            $array_count++;
            break;
    }

    if( $count < count($var)-1 && $track_count ){$count++;}
}
echo '<pre>';var_dump($doc);echo"</pre>";
/*
    array (size=7)
      0 => string 'Vehicle: 357207058078957' (length=24)
      1 => string 'Task: Service NA61 HNB ' (length=23)
      2 => string 'Rule: Every 10 mi or every 1 months after completion. Task repeats when it is marked as completed ' (length=98)
      3 => string 'Last excuted:' (length=13)
      4 => string 'Scheduled for: 23/11/2013 or 50720 mi ' (length=38)
      5 => string 'Due: since 45 d or in 50719 mi ' (length=31)
      6 => string 'Reminder: 22/11/2013 or 50715 mi ' (length=33)
*/

Glad to see you found a solution.
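For readers who would also rather avoid regular expressions, the explode idea from the question can be applied per line instead of per word. The following is only a sketch under assumptions: the message keeps its original line breaks (unlike the flattened $msg above), wrapped values continue on lines that start with a space, the sample text is shortened and made up, and the names $labels, $fields, and $current are hypothetical.

```php
<?php
// Shortened, made-up message that keeps its line breaks.
$msg = "You have a maintenance schedule\n"
     . "Vehicle:          357207058078957\n"
     . "Rule:             Every 10 mi\n"
     . "                     Task repeats\n"
     . "Last excuted:     \n"
     . "Due:              since 45 d\n"
     . "\n"
     . "*** footer ***\n";

// Labels as they appear in the email (including the 'excuted' spelling).
$labels = array('Vehicle', 'Task', 'Rule', 'Last excuted', 'Scheduled for', 'Due', 'Reminder');

$fields  = array();
$current = null; // label whose value may continue on the next line

foreach (explode("\n", $msg) as $line) {
    // Split on the first colon only, so values that contain ':' stay intact.
    $parts = explode(':', $line, 2);

    if (count($parts) === 2 && in_array($parts[0], $labels, true)) {
        $current = $parts[0];
        $fields[$current] = trim($parts[1]);
    } elseif ($current !== null && trim($line) !== '' && $line[0] === ' ') {
        // Indented, non-empty line: treat it as a continuation of the previous value.
        $fields[$current] = trim($fields[$current] . ' ' . trim($line));
    } else {
        // Blank or unrelated line ends the current field.
        $current = null;
    }
}

print_r($fields);
```

Because each label is looked up independently, a missing or reordered field does not stop the other fields from being extracted.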