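Split words into groups by their first character

Time: 2016-05-21 14:36:41

Tags: php regex

I have a paragraph, and I want to group its words by their first character and sort each group by the remaining characters.

Sample text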

$text = "Why end might ask civil again spoil. She dinner she our horses depend. Remember at children by reserved to vicinity. In affronting unreserved delightful simplicity ye. Law own advantage furniture continual sweetness bed agreeable perpetual. Oh song well four only head busy it. Afford son she had lively living. Tastes lovers myself too formal season our valley boy. Lived it their their walls might to by young.";

Expected result for the first sentence:

Why end might ask civil again spoil

a => again, ask
c => civil
e => end
m => might
s => spoil
w => Why

2 answers:

Answer 0 (score: 1)

There are many ways to do this. I just picked one that I find slightly interesting (rather than only a "gimme the script" ;-))

<?php
// see http://docs.php.net/splheap
class StrcasecmpHeap extends SplHeap {
    protected function compare ($a,$b) { return strcasecmp($b,$a); }
}
$text = "Why end might ask civil again spoil. She dinner she our horses depend. Remember at children by reserved to vicinity. In affronting unreserved delightful simplicity ye. Law own advantage furniture continual sweetness bed agreeable perpetual. Oh song well four only head busy it. Afford son she had lively living. Tastes lovers myself too formal season our valley boy. Lived it their their walls might to by young.";

// create
$result = [];
// see http://docs.php.net/preg_split
foreach( preg_split('![^a-zA-Z]+!', $text, -1, PREG_SPLIT_NO_EMPTY) as $word ) {
    $char = strtolower($word[0]);
    if ( !isset($result[$char]) ) {
        $result[$char] = new StrcasecmpHeap;
    }
    $result[$char]->insert($word);
}

// print
foreach( $result as $char=>$list ) {
    echo "--- $char ---", PHP_EOL;
    foreach($list as $word ) {
        echo ' ', $word, PHP_EOL;
    }
}

This keeps duplicates, e.g.

--- s ---
 season
 She
 she
 she
 simplicity
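
The heap-based script above depends on two SplHeap behaviours that are easy to miss: the swapped arguments in compare() decide the output order, and iterating a heap empties it. The following is a minimal sketch of both, using a made-up word list; the `: int` return type is an addition that is not in the answer's class.

```php
<?php
// Same idea as the answer's class: swapping the arguments of strcasecmp()
// makes the alphabetically smallest word come out of the heap first.
class StrcasecmpHeap extends SplHeap {
    protected function compare($a, $b): int { return strcasecmp($b, $a); }
}

$heap = new StrcasecmpHeap;
// Made-up sample words, inserted in arbitrary order.
foreach (['spoil', 'again', 'Ask', 'civil'] as $word) {
    $heap->insert($word);
}

// Iterating extracts the elements one by one, so the heap is empty afterwards.
// Passing false discards the iterator keys and yields a plain list.
$sorted = iterator_to_array($heap, false);

print_r($sorted);
echo count($heap), PHP_EOL;
```

Because iteration is destructive, the groups built by the script can be printed only once; copying each heap into an array first, as above, is one way to reuse them.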

<?php
$text = "Why end might ask civil again spoil. She dinner she our horses depend. Remember at children by reserved to vicinity. In affronting unreserved delightful simplicity ye. Law own advantage furniture continual sweetness bed agreeable perpetual. Oh song well four only head busy it. Afford son she had lively living. Tastes lovers myself too formal season our valley boy. Lived it their their walls might to by young.";

// build
$result = [];
foreach( preg_split('![^a-zA-Z]+!', $text, -1, PREG_SPLIT_NO_EMPTY) as $word ) {
    // here goes the case-sensitivity; it's all lower-case from now on....
    $word = strtolower($word);
    $char = $word[0];
    // not storing as the element's value but the key
    // takes care of doublets
    $result[$char][$word] = true;
}

// get keys & sort
$result = array_map(
    function($e) {
        // remember? The actual words have been stored as the keys
        $e = array_keys($e);
        usort($e, 'strcasecmp');
        return $e;
    },
    $result
);


// print
var_export($result);
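
Neither script above sorts the groups themselves by letter, which the expected result in the question seems to want. As a rough sketch, the duplicate-free variant can be wrapped in a function that also calls ksort(). The function name groupWordsByInitial is hypothetical, and lower-casing every word (so "Why" becomes "why") is an assumption carried over from the script above.

```php
<?php
// Hypothetical helper, not part of the answer: group unique, lower-cased
// words by first letter, then sort the words in each group and the groups.
function groupWordsByInitial(string $text): array
{
    $groups = [];
    foreach (preg_split('![^a-zA-Z]+!', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $word = strtolower($word);
        // Storing the word as a key drops duplicates.
        $groups[$word[0]][$word] = true;
    }
    foreach ($groups as $char => $set) {
        $words = array_keys($set);
        sort($words, SORT_STRING);
        $groups[$char] = $words;
    }
    // Order the groups by their initial letter.
    ksort($groups);
    return $groups;
}

$groups = groupWordsByInitial('Why end might ask civil again spoil.');
foreach ($groups as $char => $words) {
    echo $char, ' => ', implode(', ', $words), PHP_EOL;
}
```

For the first sentence this prints one line per letter in the order a, c, e, m, s, w, with "again, ask" in the a group.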

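Answer 1 (score: 0)

My solution is built around a regular expression that breaks the sorted words into phrases by initial letter.

  • (\w): a capturing group that matches any letter (technically any "word" character); it matches the first letter of a word, followed by
  • .*?: as few characters as possible (possibly from one word, possibly from several), followed by
  • ($| (?!\\1)): either the end of the text, or a space that is NOT followed by the same letter as the initial capturing group.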
$text = "Why end might ask civil again spoil. She dinner she our horses"
    . " depend. Remember at children by reserved to vicinity. In affronting"
    . " unreserved delightful simplicity ye. Law own advantage furniture"
    . " continual sweetness bed agreeable perpetual. Oh song well four only"
    . " head busy it. Afford son she had lively living. Tastes lovers"
    . " myself too formal season our valley boy. Lived it their their walls"
    . " might to by young.";

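// Split the text into individual words and sort them, case insensitively.
// The square brackets are the pattern delimiters here, not a character class.
// PREG_SPLIT_NO_EMPTY drops the empty string left after the final full stop.
$words = preg_split("[\W+]", $text, -1, PREG_SPLIT_NO_EMPTY);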
natcasesort($words);

// Join the sorted words back together and break them into phrases by
// initial letter.
preg_match_all("[(\w).*?($| (?!\\1))]i", implode(" ", $words), $matches);

// Arrange the phrases into an array keyed by lower-case initial letter,
// split them back into an array of words.
$words = array_combine(
    array_map("strtolower", $matches[1]),
    array_map(function($phrase){ return explode(" ", trim($phrase)); },
              $matches[0]));

var_dump($words);

/*
array (size=19)
  'a' => 
    array (size=7)
      0 => string 'advantage' (length=9)
      1 => string 'Afford' (length=6)
      2 => string 'affronting' (length=10)
      3 => string 'again' (length=5)
      4 => string 'agreeable' (length=9)
      5 => string 'ask' (length=3)
      6 => string 'at' (length=2)
  'b' => 
    array (size=5)
      0 => string 'bed' (length=3)
      1 => string 'boy' (length=3)
      2 => string 'busy' (length=4)
      3 => string 'by' (length=2)
      4 => string 'by' (length=2)
  ...
 */
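
To see how the pattern described in this answer cuts a sorted string into phrases, here is a small sketch on a short, already sorted input. The input string is made up for illustration, and the pattern is written with / delimiters in a single-quoted string, so the backreference appears as \1 rather than \\1.

```php
<?php
// Short, already sorted input (made up for this illustration).
$sorted = 'again ask civil end might spoil Why';

// Same pattern as in the answer: a first letter, as few characters as
// possible, then either the end of the text or a space that is not
// followed by that same letter (case-insensitively, because of the i flag).
preg_match_all('/(\w).*?($| (?!\1))/i', $sorted, $matches);

// $matches[0] holds the phrases (with a trailing space, hence trim()),
// $matches[1] holds the captured initial letters.
$phrases = array_map('trim', $matches[0]);

print_r($phrases);
print_r($matches[1]);
```

The space between "again" and "ask" does not end a phrase because the next character is again an "a"; the space before "civil" does, which is what keeps words with the same initial together.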