Regex: truncate a string at the nearest expression/space, with a length limit

Posted: 2017-02-25 11:51:09

Tags: php regex pcre

I want to get these results (from -> to):

# use string length limit = 3
1 {2 3}       -> 1 # the string between the {} must be whole
1 2 3         -> 1 2
1 23          -> 1
{1}           -> {1} 
{1 2}         -> empty 
123456        -> 123 # if there are no spaces, cut the string by characters (except {*} expressions). Not strictly necessary, but it would be nice

# one more example. Use string length limit = 5
{1} 2           -> {1} 2
123 45          -> 123
123 4           -> 123 4

Is there a way to do this in PHP with a single regular expression?

The length limit may be dynamic.
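
As a reference point, the rules implied by the examples can be written out without a regex. This is a minimal sketch under assumptions that are not in the question: braces are never nested and always balanced, the input is ASCII (`strlen`/`substr` count bytes), and `truncate_reference` is a hypothetical name. It splits the string into whole `{...}` expressions, runs of other non-space characters, and single whitespace characters, keeps tokens while the total stays within the limit, and drops trailing whitespace.

```php
<?php
// Hypothetical helper: a non-regex statement of the truncation rules.
// Assumes balanced, non-nested braces and ASCII input (byte-based lengths).
function truncate_reference(string $s, int $limit): string
{
    // No whitespace and no braces: cut by characters ("123456" -> "123").
    if (preg_match('/[\s{}]/', $s) !== 1) {
        return substr($s, 0, $limit);
    }

    // Tokens: a whole {...} expression, a run of other non-space
    // characters, or a single whitespace character.
    preg_match_all('/\{[^}]*\}|[^\s{}]+|\s/', $s, $m);

    $out = '';
    foreach ($m[0] as $token) {
        // Stop at the first token that would push the result past the limit.
        if (strlen($out) + strlen($token) > $limit) {
            break;
        }
        $out .= $token;
    }

    // Never end on whitespace ("1 2 3" -> "1 2", not "1 2 ").
    return rtrim($out);
}

var_dump(truncate_reference('1 {2 3}', 3)); // string(1) "1"
var_dump(truncate_reference('{1 2}', 3));   // string(0) ""
var_dump(truncate_reference('123 45', 5));  // string(3) "123"
```

It reproduces every example above, so it can serve as a baseline when testing a single-regex version.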

Similar question: Get first 100 characters from string, respecting full words (but my case also requires {*} expressions to be kept whole)

I tried: ^(.{1,3})({.*}|\s|$)
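
For illustration, running that attempt on three of the samples shows where it falls short: `{.*}` has no length bound, so a whole expression is appended after the 1-3 characters; group 1 can end inside an expression when a space follows; and a string with no space or expression never matches. The function name `tried` exists only for this sketch.

```php
<?php
// Returns [whole match, group 1] for the pattern from the question,
// or null when it does not match.
function tried(string $s): ?array
{
    return preg_match('/^(.{1,3})({.*}|\s|$)/', $s, $m) === 1 ? [$m[0], $m[1]] : null;
}

var_dump(tried('1 {2 3}')); // ["1 {2 3}", "1 "]: the whole match runs past the limit
var_dump(tried('{1 2}'));   // ["{1 ", "{1"]: group 1 stops inside the expression
var_dump(tried('123456'));  // NULL: no space or {...} to stop at
```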

3 Answers:

Answer 0 (score: 1)

A solution using the preg_match_all function with a specific regex pattern:

$str = '1 {2 3}  
1 2 3  
1 23 
{1}   
{1 2} 
123456 ';

$re = '/^(\S \S{1}(?=\s)|\S(?= \S{2})|\{\S\}|\w{3}(?=\w))/m';
preg_match_all($re, $str, $matches);

// an array containing the truncated items (you can `implode` it to get a single string)
print_r($matches[0]);

Output:

Array
(
    [0] => 1
    [1] => 1 2
    [2] => 1
    [3] => {1}
    [4] => 123
)

Regex demo (see the "Explanation" section on the right)
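
One detail of this approach: `preg_match_all` returns no entry at all for `{1 2}`, so the five results no longer line up with the six input lines, and the pattern is hard-wired to a limit of 3. A minimal sketch, assuming one result per line (with an empty string for "no match") is what is wanted, applies the same pattern to one line at a time; `truncate3` is a hypothetical name.

```php
<?php
// Answer 0's pattern without the /m flag, applied to a single line.
// Like the original, it only handles a limit of 3.
function truncate3(string $line): string
{
    $re = '/^(\S \S{1}(?=\s)|\S(?= \S{2})|\{\S\}|\w{3}(?=\w))/';
    // A line with no match (e.g. "{1 2}") becomes '', keeping one result per line.
    return preg_match($re, $line, $m) === 1 ? $m[0] : '';
}

$lines = ['1 {2 3}', '1 2 3', '1 23', '{1}', '{1 2}', '123456'];
print_r(array_map('truncate3', $lines));
```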

Answer 1 (score: 1)

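The idea here is to define your atomic bits, match each of them, and use a negative lookbehind to limit the character length (while also making sure trailing whitespace is dropped; not sure whether you need that, but I thought I'd throw it in).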
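
The length-limiting trick can be seen on its own in a stripped-down pattern that handles only plain words and whitespace, with no `{...}` groups. This is a sketch for illustration, not the answer's full pattern, and `cap_words` is a hypothetical name. Because `\A` pins the match to the start of the string, the lookbehind `.{N}.` fails whenever N+1 characters precede the current position, so backtracking shortens the match until it is at most N characters long and does not end on whitespace.

```php
<?php
// Plain words and whitespace only (no {...} groups), capped at $limit chars.
function cap_words(string $s, int $limit): string
{
    // \w++ is possessive, so backtracking drops whole words instead of
    // splitting one; (?<!\s|.{N}.) rejects a trailing space or N+1 chars.
    $re = '/\A(?:\w++|\s)*(?<!\s|.{' . $limit . '}.)/';
    preg_match($re, $s, $m);
    return $m[0];
}

var_dump(cap_words('1 2 3', 3)); // string(3) "1 2"
var_dump(cap_words('1 23', 3));  // string(1) "1"
var_dump(cap_words('123 4', 5)); // string(5) "123 4"
```

The possessive `\w++` matters here: with a plain `\w+`, the engine could backtrack into `23` and return `1 2` for `1 23`.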

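The only other thing is a conditional expression that checks whether the string is just an unbroken series of characters, and if so cuts it by characters (for your 123456 -> 123 example).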
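
The conditional group can likewise be tried in isolation. In this simplified pattern (again only an illustration, with `first_chunk` as a hypothetical name), the condition `(?=.*\s)` checks whether the subject contains whitespace anywhere: if it does, the first run of non-space characters is taken; if not, the string is cut at 3 characters.

```php
<?php
// (?(condition)yes|no), where the condition is the lookahead (?=.*\s).
function first_chunk(string $s): string
{
    preg_match('/\A(?(?=.*\s)\S+|.{0,3})/', $s, $m);
    return $m[0];
}

var_dump(first_chunk('123456'));  // string(3) "123"
var_dump(first_chunk('12 3456')); // string(2) "12"
```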

function truncate($string, $length)
{
    $regex = <<<REGEX
        /
        (?(DEFINE)
            (?<chars> [^\s{}]++ )  # possessive: backtracking cannot split a run of characters
            (?<group> { (?&atom)* } )
            (?<atom> (?&chars) | (?&group) | \s )
        )
        \A
        (?(?=.*[\s{}])
            (?&atom)*(?<! \s | .{{$length}}. ) |
            .{0,$length}
        )
        /x
REGEX;

    preg_match($regex, $string, $matches);
    return $matches[0];
}

$samples = <<<'DATA'
1 {2 3}
1 2 3
1 23
{1} 
{1 2} 
123456
DATA;

foreach (explode("\n", $samples) as $sample) {
    var_dump(truncate($sample, 3));
}

Output:

string(1) "1"
string(3) "1 2"
string(1) "1"
string(3) "{1}"
string(0) ""
string(3) "123"

$samples = <<<'DATA'
{1} 2
123 45
123 4
DATA;

foreach (explode("\n", $samples) as $sample) {
    var_dump(truncate($sample, 5));
}

Output:

string(5) "{1} 2"
string(3) "123"
string(5) "123 4"

Answer 2 (score: 0)

Try this:

/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/gm

It works with the given sample: https://regex101.com/r/iF2tSp/3

1 {2 3}
1 2 3
1 23
{1}
{1 2}
123456

Match 1: full match 0-1 `1`, group 1 `1`
Match 2: full match 8-11 `1 2`, group 1 `1 2`
Match 3: full match 14-15 `1`, group 1 `1`
Match 4: full match 19-22 `{1}`, group 1 `{1}`
Match 5: full match 29-32 `123`, group 1 `123`
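
To run this pattern in PHP, note that `g` is a JavaScript-style flag shown by regex101; PHP's `preg_*` functions have no such modifier (a pattern ending in `/gm` is rejected with an "Unknown modifier" warning), and global matching is done by calling `preg_match_all`. A minimal sketch with the sample lines joined by `\n`:

```php
<?php
// Same pattern, keeping only the /m modifier; preg_match_all finds every match.
$text = "1 {2 3}\n1 2 3\n1 23\n{1}\n{1 2}\n123456";
preg_match_all('/^([\w ]{1,3}(?= )|\w{1,3}|\{\w\})/m', $text, $matches);

// $matches[1] holds group 1 of each match; the "{1 2}" line produces no entry.
print_r($matches[1]);
```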