Split a string every 10 words

Time: 2015-09-09 10:49:20

Tags: php regex split

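I want to split a string every 10 words. However, if a word contains a punctuation character, the split should happen right after that punctuation character, and the splitting should then continue every 10 words.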

I am using the code below, but it only splits the string every 10 words.

<?php

$string = 'Lorem ipsum dolor sit amet, te has omnesque gubergren definiebas. Omnesque ullamcorper pri ut. In eos insolens atomorum moderatius, mundi menandri usu cu. Nam an dicant tritani philosophia facete minimum id sed errem omnium persequeris ad his, omnes luptatum recteque mel eu, est te laudem causae.';

$splitted = preg_replace( '~((?:\S*?\s){10})~', "$1\n", $string);

$Words = explode("\n", $splitted);

var_dump($Words);

?>
array(5) {
  [0]=>
  string(66) "Lorem ipsum dolor sit amet, te has omnesque gubergren definiebas. "
  [1]=>
  string(72) "Omnesque ullamcorper pri ut. In eos insolens atomorum moderatius, mundi "
  [2]=>
  string(66) "menandri usu cu. Nam an dicant tritani philosophia facete minimum "
  [3]=>
  string(64) "id sed errem omnium persequeris ad his, omnes luptatum recteque "
  [4]=>
  string(29) "mel eu, est te laudem causae."
}

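I would like to get the result below: the string is split every 10 words, but when a word contains punctuation, the split happens after the punctuation and the 10-word splitting continues from there.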

array(6) {
  [0]=>
  string() "Lorem ipsum dolor sit amet, te has omnesque gubergren definiebas. "
  [1]=>
  string() "Omnesque ullamcorper pri ut."
  [2]=>
  string() "In eos insolens atomorum moderatius, mundi menandri usu cu."
  [3]=>
  string() "Nam an dicant tritani philosophia facete minimum id sed errem"
  [4]=>
  string() "omnium persequeris ad his, omnes luptatum recteque mel eu, est"
  [5]=>
  string() "te laudem causae."
}

1 answer:

Answer 0 (score: 2)

Is this what you want?

$string = 'Lorem ipsum dolor sit amet, te has omnesque gubergren definiebas. Omnesque ullamcorper pri ut. In eos insolens atomorum moderatius, mundi menandri usu cu. Nam an dicant tritani philosophia facete minimum id sed errem omnium persequeris ad his, omnes luptatum recteque mel eu, est te laudem causae.';
$splitted = preg_replace( '~((?:[^\s\pP]+[\s\pP]){1,10})~', "$1\n", $string);
$Words = explode("\n", $splitted);
var_dump($Words);

\pP stands for any punctuation character.
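To see which characters \pP picks up, here is a minimal sketch that lists the punctuation found in a short sample. The sample text and the variable names are made up for illustration, and the u modifier is added here only so that the subject is treated as UTF-8; the pattern in the answer does not use it.

```php
<?php
// Illustrative sample, not taken from the question.
$sample = 'amet, te has. Nam: ok; done!';

// \pP matches any character in the Unicode "Punctuation" category.
// The u modifier makes PCRE treat the pattern and the subject as UTF-8.
preg_match_all('~\pP~u', $sample, $m);

// $m[0] holds every punctuation character, in order of appearance.
var_dump($m[0]);
```

Note that characters such as $, + or = belong to the symbol category (\pS), so \pP does not match them.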

Output:

array(10) {
  [0]=>
  string(27) "Lorem ipsum dolor sit amet,"
  [1]=>
  string(38) " te has omnesque gubergren definiebas."
  [2]=>
  string(29) " Omnesque ullamcorper pri ut."
  [3]=>
  string(37) " In eos insolens atomorum moderatius,"
  [4]=>
  string(23) " mundi menandri usu cu."
  [5]=>
  string(63) " Nam an dicant tritani philosophia facete minimum id sed errem "
  [6]=>
  string(26) "omnium persequeris ad his,"
  [7]=>
  string(32) " omnes luptatum recteque mel eu,"
  [8]=>
  string(22) " est te laudem causae."
  [9]=>
  string(0) ""
}

If you do not want to split on commas, use this instead:

$splitted = preg_replace( '~((?:[^\s.:;]+[\s.:;]){1,10})~', "$1\n", $string);
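The preg_replace/explode approach leaves leading spaces on some chunks and, when the text ends with a delimiter, a trailing empty string. One possible variation, sketched below, collects the chunks directly with preg_match_all and trims them. The helper name splitIntoChunks and its $maxWords parameter are hypothetical, and the `|$` alternative is an addition so that a final word without trailing punctuation is still captured.

```php
<?php
// Hypothetical helper, not part of the original answer.
function splitIntoChunks(string $text, int $maxWords = 10): array
{
    // A "word" is a run of characters other than whitespace, '.', ':' or ';',
    // followed by one such delimiter or by the end of the string.
    $pattern = '~(?:[^\s.:;]+(?:[\s.:;]|$)){1,' . $maxWords . '}~';
    preg_match_all($pattern, $text, $matches);

    // Some chunks end with the whitespace delimiter; strip it.
    return array_map('trim', $matches[0]);
}

$string = 'Lorem ipsum dolor sit amet, te has omnesque gubergren definiebas. Omnesque ullamcorper pri ut. In eos insolens atomorum moderatius, mundi menandri usu cu. Nam an dicant tritani philosophia facete minimum id sed errem omnium persequeris ad his, omnes luptatum recteque mel eu, est te laudem causae.';

$chunks = splitIntoChunks($string);
var_dump($chunks);
```

On the sample string this produces six chunks, with no leading spaces and no empty last element. Because '.' always ends a chunk, a token such as 3.14 would be split in two, so the character class may need adjusting for other inputs.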