How do I compare two text files for matches with PHP

Asked: 2013-10-01 04:30:54

Tags: php arrays preg-match strpos

$domains = file('../../domains.txt');
$keywords = file('../../keywords.txt');

$domains is in the format:

3kool4u.com,9/29/2013 12:00:00 AM,AUC
3liftdr.com,9/29/2013 12:00:00 AM,AUC
3lionmedia.com,9/29/2013 12:00:00 AM,AUC
3mdprod.com,9/29/2013 12:00:00 AM,AUC
3mdproductions.com,9/29/2013 12:00:00 AM,AUC

$keywords is in the format:

keyword1
keyword2
keyword3

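I think what I really want is to build an array from the keywords in one file and search each line of domains.txt for a match. I'm not sure where to start, because I'm confused about the differences between preg_match, preg_match_all and strpos, and about when to use one rather than the others.

Thank you very much for your help.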

1 Answer:

Answer 0: (score: 3)

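//Empty array to hold each line of the domains file that has a match
$matches = array();

//For each line of the domains file
foreach ($domains as $domain) {

    //For each keyword
    foreach ($keywords as $keyword) {

        //file() keeps the trailing newline on every line, so trim it off;
        //skip blank lines, because an empty pattern would match everything
        $keyword = trim($keyword);
        if ($keyword === '') {
            continue;
        }

        //If the domain line contains the keyword at any position, regardless of case
        //(preg_quote() escapes any regex metacharacters in the keyword)
        if (preg_match('/' . preg_quote($keyword, '/') . '/i', $domain)) {
            //Add the domain line to the matches array
            $matches[] = $domain;
            //One matching keyword is enough; do not add the same line twice
            break;
        }
    }
}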

Now you have the $matches array, which contains every line of the domains file that matches a keyword.
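
Since the question also asks how strpos, preg_match and preg_match_all differ, here is a minimal sketch using made-up sample data. strpos (and its case-insensitive sibling stripos) looks for a literal substring and returns its offset or false; preg_match applies a regular expression and stops at the first match, returning 1 or 0; preg_match_all returns how many matches it found. For plain keywords, stripos is usually enough and needs no regex escaping. The helper name line_matches_keyword and the sample values are illustrative only and do not come from the question.

```php
<?php

// Hypothetical helper: true if $line contains $keyword, ignoring case.
function line_matches_keyword(string $line, string $keyword): bool
{
    // Keywords read with file() or fgets() still carry their newline
    $keyword = trim($keyword);
    // stripos() returns the offset (which may be 0) or false, so compare with !==
    return $keyword !== '' && stripos($line, $keyword) !== false;
}

// Made-up sample line in the same shape as domains.txt
$line = '3lionmedia.com,9/29/2013 12:00:00 AM,AUC';

var_dump(line_matches_keyword($line, "LION\n")); // literal search, case ignored
var_dump(line_matches_keyword($line, 'tiger'));  // keyword that is not present

// preg_match() stops after the first regex match and returns 1 or 0 ...
var_dump(preg_match('/00/', $line));
// ... while preg_match_all() returns the number of matches it found
var_dump(preg_match_all('/00/', $line));
```

In the loops above, the preg_match call could be swapped for a stripos check like this one if the keywords are plain text and not patterns.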

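Note that the previous approach loads both files into memory in full. Depending on the file sizes, you could run out of memory, or the operating system could start using swap, which is much slower than RAM.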

Here is another, more memory-efficient approach that loads only one line of each file at a time.

<?php

// Allow automatic detection of line endings (only needed for old Mac "\r" files;
// this setting is deprecated as of PHP 8.1)
ini_set('auto_detect_line_endings', true);

//Array that will hold the lines that match
$matches = array();

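//Open the two files in read mode
$domains_handle = fopen('../../domains.txt', "r");
$keywords_handle = fopen('../../keywords.txt', "r");

//fopen() returns false on failure, so stop here instead of looping on a bad handle
if ($domains_handle === false || $keywords_handle === false) {
    die('Could not open domains.txt or keywords.txt');
}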

//Iterate over the domains one line at a time
while (($domains_line = fgets($domains_handle)) !== false) {

    //For each line of the domains file, iterate over the keywords file one line at a time
    while (($keywords_line = fgets($keywords_handle)) !== false) {

        //Remove any whitespace or newline from the beginning and end of the string
        $trimmed_keyword = trim($keywords_line);

        //Skip blank lines: an empty pattern would match every domain
        if ($trimmed_keyword === '') {
            continue;
        }

        //Check whether the domain line contains the keyword at any position,
        //using a case-insensitive comparison; preg_quote() escapes regex metacharacters
        if (preg_match('/' . preg_quote($trimmed_keyword, '/') . '/i', trim($domains_line))) {
            //Add the domain line to the matches array
            $matches[] = $domains_line;
            //One matching keyword is enough; do not add the same line twice
            break;
        }
    }
    //Set the pointer back to the beginning of the keywords file
    rewind($keywords_handle);
}

//Release the resources
fclose($domains_handle);
fclose($keywords_handle);

var_dump($matches);
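
If the keywords file is small (an assumption, since the question does not say), one possible variation is to keep only the keywords in memory and stream the domains file once, so keywords.txt is not re-read for every domain line. The sketch below also compares each keyword against the domain column only, so a keyword such as "auc" does not match the trailing AUC field of every line. The function name find_matching_domains is hypothetical, and it uses stripos for a literal, case-insensitive search in place of preg_match.

```php
<?php

// Hypothetical helper: returns the lines of $domainsPath whose first
// comma-separated column (the domain name) contains any keyword from $keywordsPath.
function find_matching_domains(string $domainsPath, string $keywordsPath): array
{
    // Assumption: the keywords file is small enough to hold in memory
    $keywords = file($keywordsPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($keywords === false) {
        throw new RuntimeException("Cannot read $keywordsPath");
    }
    // Trim each keyword and drop the ones that end up empty
    $keywords = array_filter(array_map('trim', $keywords), 'strlen');

    $handle = fopen($domainsPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $domainsPath");
    }

    $matches = array();
    // Stream the domains file one line at a time
    while (($line = fgets($handle)) !== false) {
        $line = trim($line);
        // Everything before the first comma is the domain name
        $domain = explode(',', $line, 2)[0];
        foreach ($keywords as $keyword) {
            if (stripos($domain, $keyword) !== false) {
                $matches[] = $line;
                break; // one keyword is enough; do not add the line twice
            }
        }
    }
    fclose($handle);

    return $matches;
}

// Usage with the paths from the question:
// var_dump(find_matching_domains('../../domains.txt', '../../keywords.txt'));
```

Memory use here grows with the number of keywords and matches, not with the size of domains.txt, and each file is read from disk only once.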