Regular expression does not match string

Time: 2018-02-03 03:41:54

Tags: php regex

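I am trying to parse the following strings. Anything inside {} needs to stay together as one block. The remaining symbols need to be kept as well, but each in its own array key.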

$string_1 = '({Product.Depth+1.25=2.5&2.0+2.0=4}+{1.0+2.5=3.0&2.0+3.0=5.0})-(16.75+10.9375)';
$string_2 = 'Product.Width+[{1.0+1.0=2.0|2.0+3.0=4}?100.00:0.00]';
$string_3 = '[1+1=2?10.00:Product.Depth]';

So far I have the first two working, but not the third.

preg_match_all("/[()=]|\\{[^\}]+\\}|[+-]|[^=]+$/", $string_to_parse, $matches);

Right now it returns something like this. You can see that some numbers are missing between keys 7, 8, and 9. Key 14 has also lost some of its digits.

array(1) {
[0]=>
array(15) {
[0]=>
string(1) "="
[1]=>
string(1) "("
[2]=>
string(13) "{1+1=2&2+2=4}"
[3]=>
string(1) "+"
[4]=>
string(13) "{1+2=3&2+3=5}"
[5]=>
string(1) ")"
[6]=>
string(1) "="
[7]=>
string(1) "+"
[8]=>
string(1) "+"
[9]=>
string(1) "+"
[10]=>
string(13) "{1+1=2|2+3=4}"
[11]=>
string(1) "-"
[12]=>
string(1) "+"
[13]=>
string(1) "="
[14]=>
string(7) "2?10:0]"
}
}

I am terrible with regex, so this is beyond my knowledge. I appreciate any help.

1 answer:

Answer 0: (score: 1)

Pattern: ~\{[^}]*\}|\d+|.~ (Pattern Demo)

Code: (Demo)

$strings=[
    '({1+1=2&2+2=4}+{1+2=3&2+3=5})-(16+10)',
    '10+[{1+1=2|2+3=4}?100:0]',
    '[1+1=2?10:0]'
];
foreach($strings as $string){
    var_export(preg_match_all('~\{[^}]*\}|\d+|.~',$string,$out)?$out[0]:[]);
    echo "\n";
}

Output:

array (
  0 => '(',
  1 => '{1+1=2&2+2=4}',
  2 => '+',
  3 => '{1+2=3&2+3=5}',
  4 => ')',
  5 => '-',
  6 => '(',
  7 => '16',
  8 => '+',
  9 => '10',
  10 => ')',
)
array (
  0 => '10',
  1 => '+',
  2 => '[',
  3 => '{1+1=2|2+3=4}',
  4 => '?',
  5 => '100',
  6 => ':',
  7 => '0',
  8 => ']',
)
array (
  0 => '[',
  1 => '1',
  2 => '+',
  3 => '1',
  4 => '=',
  5 => '2',
  6 => '?',
  7 => '10',
  8 => ':',
  9 => '0',
  10 => ']',
)

As for the extended criteria in your question, the pattern just needs to be adjusted to handle letter-dot-letter sequences as well as float values.

preg_match_all() (Demo):

preg_match_all('~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i',$string,$out)?$out[0]:[]

or if you'd like to see preg_split() (Demo):

preg_split('~(\{[^}]*\}|[^\w.])~',$string,NULL,PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY)
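To see both one-liners in context, here is a minimal sketch that runs them against the three strings from the question. The two patterns are copied from above; the function names tokenize_with_match() and tokenize_with_split() are made up for this illustration, and -1 is used for the limit in place of NULL (both mean "no limit").

```php
<?php
// Illustrative wrappers only; the patterns are the ones given in the answer.
function tokenize_with_match(string $expression): array
{
    $pattern = '~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i';
    return preg_match_all($pattern, $expression, $out) ? $out[0] : [];
}

function tokenize_with_split(string $expression): array
{
    // -1 is the explicit "no limit" value for parameter 3.
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    return preg_split('~(\{[^}]*\}|[^\w.])~', $expression, -1, $flags);
}

$strings = [
    '({Product.Depth+1.25=2.5&2.0+2.0=4}+{1.0+2.5=3.0&2.0+3.0=5.0})-(16.75+10.9375)',
    'Product.Width+[{1.0+1.0=2.0|2.0+3.0=4}?100.00:0.00]',
    '[1+1=2?10.00:Product.Depth]',
];

foreach ($strings as $string) {
    // One token per array element, printed on a single line for readability.
    echo implode('  ', tokenize_with_match($string)), "\n";
    echo implode('  ', tokenize_with_split($string)), "\n\n";
}
```

For these three inputs both functions return the same tokens. They are not interchangeable in general: the preg_split() version keeps any run of word characters and dots together, so an input such as "99." stays as one piece there but becomes "99" and "." with preg_match_all().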

*Note, if you want/need to identify signed numbers (positive/negative) by binding the leading sign with the number (but not match + and - operators), then additional adjustment is required. I'll not go down this rabbit hole, unless you explicitly state that this is a requirement for your actual project.

As for explaining these patterns, the formal explanation is automatically delivered whenever you write your input string and pattern into regex101.com (or the like).

Beyond that, I can offer some casual explanations:

~               #Pattern delimiter (There are many valid delimiters, this is a wise choice because the tilde is not used inside the actual pattern.  This avoids having to perform any unnecessary escaping.)
\{[^}]*\}       #Match (as much as possible) { followed by zero or more characters that are not } then match }
|               #Or
\d*\.?\d+       #Match (as much as possible) zero or more digits, followed by an optional dot, followed by one or more digits. (This allows "0.999" and ".1" but not "99." )
|               #Or
[a-z]+\.[a-z]+  #Match (as much as possible) one or more letters, followed by a dot, followed by one or more letters.
|               #Or
.               #Match any single non-newline character (this is intended to pick up all of the symbols/left-overs).
~               #Pattern Delimiter
i               #Case-insensitive pattern modifier: this makes the regex engine treat every [a-z] like [a-zA-Z]
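The branch-by-branch notes above can be checked with a few small inputs. This is only a sketch: tokens() is a hypothetical helper wrapping the same pattern, and the sample inputs are chosen to exercise one branch each.

```php
<?php
// Hypothetical helper; it wraps the pattern explained above.
function tokens(string $input): array
{
    $pattern = '~\{[^}]*\}|\d*\.?\d+|[a-z]+\.[a-z]+|.~i';
    return preg_match_all($pattern, $input, $out) ? $out[0] : [];
}

$samples = [
    '0.999',          // float branch: digits, dot, digits
    '.1',             // float branch: leading digits are optional
    '99.',            // no digit after the dot, so the dot is picked up by the final "."
    'product.DEPTH',  // letter-dot-letter branch, helped by the i modifier
    '{a+b}',          // curly-bracket branch keeps the whole block together
];

foreach ($samples as $sample) {
    echo str_pad($sample, 15), ' => ', json_encode(tokens($sample)), "\n";
}
```

The "99." case is the one to watch: the number branch backtracks to "99" and the trailing dot becomes its own token.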

...another deep breath...

preg_split() is a more versatile, regex-driven counterpart to explode(). The pattern describes every place where an explosion should occur.

~          #Pattern delimiter
(          #Start capture group
\{[^}]*\}  #Match (as much as possible) { followed by zero or more characters that are not } then match }
|          #Or
[^\w.]     #Match any single character that is not a letter, number, underscore, or dot (same effect as: "[^a-zA-Z0-9_.]").  This is intended to "catch" all of the symbols that are meant to be singled-out.
)          #End capture group
~          #Pattern delimiter

In other words, this explodes on every curly bracketed expression or symbol. This alone doesn't work as required -- flags must be declared on this function call.

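Parameter 3 is NULL, which tells preg_split() to split an unlimited number of times. This is the default behavior of the function, but the placeholder is needed so that parameter 4 can be supplied. (On PHP 8.1 and later, passing NULL for this integer parameter raises a deprecation notice; -1 or 0 means the same thing, "no limit".)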

Parameter 4 has two parts. Combining more than one flag requires the bitwise OR operator (a pipe, |) between them.

  • PREG_SPLIT_DELIM_CAPTURE : This tells the function to retain the substrings that are used as "points of explosion". Without this flag, the output array would not contain any of the curly bracketed expressions or symbols. If we weren't going to use this flag, then the capture group brackets would be unnecessary in the pattern.
  • PREG_SPLIT_NO_EMPTY : When two "points of explosion" are side-by-side, or one sits at the very start or end of the string, the result is an empty array element. In many cases (and this case specifically), these empty elements are not desirable; this flag eliminates the need to call array_filter() to mop up the mess.
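The effect of each flag is easiest to see by running the same split four ways. The sketch below uses a short made-up input, '(16+10)', and a hypothetical helper split_with(); the pattern is the preg_split() pattern from above, and -1 stands in for the "no limit" placeholder.

```php
<?php
// Hypothetical helper: same pattern every time, only the flags change.
function split_with(int $flags): array
{
    return preg_split('~(\{[^}]*\}|[^\w.])~', '(16+10)', -1, $flags);
}

$variants = [
    'no flags'           => 0,
    'NO_EMPTY only'      => PREG_SPLIT_NO_EMPTY,
    'DELIM_CAPTURE only' => PREG_SPLIT_DELIM_CAPTURE,
    'both flags'         => PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY,
];

foreach ($variants as $label => $flags) {
    echo str_pad($label, 20), json_encode(split_with($flags)), "\n";
}
// no flags            ["","16","10",""]
// NO_EMPTY only       ["16","10"]
// DELIM_CAPTURE only  ["","(","16","+","10",")",""]
// both flags          ["(","16","+","10",")"]
```

Only the combination of both flags keeps the symbols and drops the empty strings produced by the delimiters at the start and end of the input.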