PHP - Export names and email addresses from a file

Time: 2018-03-09 16:27:31

Tags: php regex

I have a file containing a list of people with their phone numbers and email addresses.

For example:

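Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk

Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk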

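The list continues like this for several hundred entries. How can I write a regular expression that reads the file from top to bottom, extracts the first-and-last-name line and the email address of each entry, and puts them together like this:

Name, Email address

Any help would be great.

I have the following code, but it only reads the email addresses: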

$string = file_get_contents("physio.txt"); // Load text file contents

// don't need to preassign $matches, it's created dynamically

// this regex handles more email address formats like a+b@google.com.sg, and the i makes it case insensitive
$pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';

// preg_match_all returns the number of full matches and fills $matches by reference
preg_match_all($pattern, $string, $matches);

// the data you want is in $matches[0], dump it with var_export() to see it
echo "<pre>";
$input = $matches[0];
echo count($input);
echo "<br>";
$result = array_unique($input);
echo count($result);
echo "<br>";
//print_r($result);
echo "</pre>";

2 Answers:

Answer 0: (score: 1)

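A regular expression seems like a sensible way to parse this data. The important thing is to include enough components in the pattern to keep the matches accurate.

I suggest the following: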

Pattern: ~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m

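The neighbouring literals Location: and Email: anchor the pattern, so that the name is taken from the line directly above Location: and the address from the next Email: line.

The m pattern modifier makes ^ match at the start of every line (not only at the start of the string), which improves the accuracy of the pattern.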
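To make the effect of the m modifier concrete, here is a minimal sketch that runs the same sub-pattern with and without it. The sample text and the variable names are made up for illustration.

```php
<?php
// Two lines of sample text; only the second one starts with "Email:".
$text = "Name: x\nEmail: a@example.com";

// Without m, ^ anchors only at the very start of the string, so nothing matches.
$withoutM = preg_match_all('~^Email: (\S*)~', $text, $a);

// With m, ^ also matches immediately after each newline.
$withM = preg_match_all('~^Email: (\S*)~m', $text, $b);

echo "without m: $withoutM, with m: $withM\n"; // prints "without m: 0, with m: 1"
```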

Breakdown:

~          #pattern delimiter
^          #match start of a line
(.+)       #capture one or more non-newline characters (Capture Group #1)
\R         #match a newline character (\r, \n, \r\n)
Location:  #match literal: "Location" followed by colon
[\s\S]*?   #match (lazily) zero or more of any character
^Email:    #match start of a line, literal: "Email", colon, space
(\S*)      #capture zero or more non-whitespace characters (Capture Group #2 -- quantifier means the email value can be blank and still valid)
~          #pattern delimiter
m          #pattern modifier tells regex engine that ^ means start of a line instead of start of the string

Code:

$input = "Coulthard
Sally Coulthard
Location: Surrey
Expertise Covered: Horse, Dog, Horse and Rider
Website: www.veterinaryphysio.co.uk
Tel: 07865095005
Email: sally@veterinaryphysio.co.uk

Kate Haynes
Location: Surrey, Sussex, Kent
Expertise Covered: Horse, Performance, Horse and Rider
Tel: 07957 344688
Email: katehaynesphysio@yahoo.co.uk";

if (preg_match_all("~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $data) {
        echo "{$data[1]}, {$data[2]}\n";
    }
}

Output:

Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk
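Since the question asks for a "name, email" list, the same pattern could also feed a CSV writer. The following is only a sketch under some assumptions: extractContacts is a hypothetical helper name, the sample text is a shortened version of the data above, and in practice $text would come from file_get_contents("physio.txt") as in the question.

```php
<?php
// Hypothetical helper: returns a list of [name, email] pairs found in $text.
function extractContacts(string $text): array
{
    $pairs = [];
    $pattern = '~^(.+)\RLocation:[\s\S]*?^Email: (\S*)~m';
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // trim() drops a stray "\r" that (.+) can pick up with Windows line endings.
            $pairs[] = [trim($m[1]), $m[2]];
        }
    }
    return $pairs;
}

// Shortened sample data (assumption: real input has the same line layout).
$text = "Coulthard\nSally Coulthard\nLocation: Surrey\nTel: 07865095005\n"
      . "Email: sally@veterinaryphysio.co.uk\n\n"
      . "Kate Haynes\nLocation: Surrey, Sussex, Kent\n"
      . "Email: katehaynesphysio@yahoo.co.uk";

$contacts = extractContacts($text);

// Write a header row and one row per contact to standard output.
$out = fopen('php://output', 'w');
fputcsv($out, ['Name', 'Email'], ',', '"', '\\');
foreach ($contacts as $row) {
    fputcsv($out, $row, ',', '"', '\\');
}
fclose($out);
```

fputcsv decides on its own when a field needs quoting, so names that contain a comma remain a single column. Replacing 'php://output' with a file path would write the list to disk instead.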

Answer 1: (score: 0)

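You can split the content on double newlines and then process each block. To get the first and last name, take the last line of the block that does not contain ": ":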

$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
    $lines = explode("\n", $block);
    $mail = end($lines);
    $mail = substr($mail, strlen('Email: '));
    $lines = array_reverse($lines);
    $fnln = '';
    foreach ($lines as $line) {
        if (strpos($line, ': ') === false) {
            $fnln = $line;
            break;
        }
    }
    echo $fnln . ", " . $mail . "<br>";
}

Output:

Sally Coulthard, sally@veterinaryphysio.co.uk
Kate Haynes, katehaynesphysio@yahoo.co.uk

Or, if the email is not always the last line of the block:

$blocks = explode("\n\n", $string);
foreach ($blocks as $block) {
    $lines = explode("\n", $block);
    $lines = array_reverse($lines);
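    $fnln = '';
    $mail = ''; // reset per block so a block without an Email line does not reuse the previous address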
    foreach ($lines as $line) {
        if (substr($line, 0, 6) == 'Email:') {
            $mail = substr($line, 7);
        }
        if (strpos($line, ': ') === false) {
            $fnln = $line;
            break;
        }
    }
    echo $fnln . ", " . $mail . "<br>";
}
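One caveat with explode("\n\n", ...) is that it assumes Unix line endings and exactly one blank line between entries. If the file might use Windows line endings or extra blank lines (an assumption, since the question does not say), a variant based on preg_split could look like the sketch below. The sample data and the $rows variable are illustrative only.

```php
<?php
// Sample with Windows line endings and two blank lines between the entries.
$string = "Sally Coulthard\r\nLocation: Surrey\r\nEmail: sally@veterinaryphysio.co.uk\r\n"
        . "\r\n\r\n"
        . "Kate Haynes\r\nLocation: Kent\r\nEmail: katehaynesphysio@yahoo.co.uk\r\n";

// \R matches \n, \r or \r\n; {2,} therefore means "one or more blank lines".
$blocks = preg_split('/\R{2,}/', trim($string));

$rows = [];
foreach ($blocks as $block) {
    $lines = preg_split('/\R/', $block);
    $name = '';
    $mail = '';
    // Walk the block bottom-up, as in the answer above.
    foreach (array_reverse($lines) as $line) {
        if (strpos($line, 'Email:') === 0) {
            $mail = trim(substr($line, strlen('Email:')));
        }
        if (strpos($line, ': ') === false) {
            $name = $line; // last line without ": " is taken as the name
            break;
        }
    }
    $rows[] = "$name, $mail";
}

echo implode("\n", $rows), "\n";
```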