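I have a PHP application that interfaces with a payment processor to handle credit cards. Occasionally the post-back response from the processor fails (e.g. a transient glitch in the matrix) and we get no automatic notification of the payment. In those cases we fall back to keying in the data from the confirmation email, which is always sent. I'd like my code to parse the text of the email to get the data, which seems like a perfect use case for preg_match_all. The problem is that the email is poorly formatted: it has name : value pairs, but they are all on one line and the values are frequently blank, which is tripping me up.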
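I'm comfortable with regex basics (quantifiers, grouping, character classes, anchors, modifiers), but I have no real experience with lookaheads and backreferences, and it isn't obvious to me whether they could help here.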
Sample data might look like this (again, this would all be on one line; it is wrapped here only for readability):
bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip : x_customer_tax_id : x_description : 98765 x_duty : x_email_customer : an_example@example.com x_fax : x_footer_email_receipt : x_fp_hash : 747ffeddfe4e106a9c67363ebff996ad x_fp_timestamp : 1525100766 x_invoice_num : R000098765 x_login : MY-LOGIN-ID x_logo_url : x_merchant_email : x_method : x_phone : (416) 555-1212 x_po_num : x_receipt_link_method : GET x_reference_3 : 1234 x_relay_response : TRUE x_relay_url :
I would like the output to look like this:
[
[bypass_first_page] =>
[x_company] =>
[x_cust_id] => 12345
[x_customer_ip] =>
[x_customer_tax_id] =>
[x_description] => 98765
[x_duty] =>
[x_email_customer] => an_example@example.com
[x_fax] =>
[x_footer_email_receipt] =>
[x_fp_hash] => 747ffeddfe4e106a9c67363ebff996ad
[x_fp_timestamp] => 1525100766
[x_invoice_num] => R000098765
[x_login] => MY-LOGIN-ID
[x_logo_url] =>
[x_merchant_email] =>
[x_method] =>
[x_phone] => (416) 555-1212
[x_po_num] =>
[x_receipt_link_method] => GET
[x_reference_3] => 1234
[x_relay_response] => TRUE
[x_relay_url] =>
]
Important things to note:
The closest I've come is:
/([\w\d_]+) ?: ([^:]+)/
But this produces output like:
[
[bypass_first_page] => x_company
[x_cust_id] => 12345 x_customer_ip
[x_customer_tax_id] => x_description
...
]
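As you can see from this regex101 link, this fails because the value pattern [^:]+ runs all the way to the next colon, so field names end up in the values (either alone, when the real value is blank, or concatenated with the real value). I feel this could be solved easily if there were a modifier requiring the whole string to be matched, or an anchor indicating that a match must start where the previous one ended, but I can't find any mention of such a thing anywhere. Maybe it's just that I don't know what it's called?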
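The failure can be reproduced in PHP on a shortened sample. This is only a minimal sketch: the variable names are illustrative, and trim() is an assumption added to tidy the captured values.

```php
<?php
// Shortened sample in the same "name : value" layout as the email.
$line = 'bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip : '
      . 'x_customer_tax_id : x_description : 98765';

// The attempted pattern: [^:]+ runs up to the next colon, so it swallows
// the following field name whenever a value is blank.
preg_match_all('/([\w\d_]+) ?: ([^:]+)/', $line, $m);

// Pair each captured name with its (trimmed) captured value.
$attempt = array_combine($m[1], array_map('trim', $m[2]));
print_r($attempt);
```

On this input, x_company, x_customer_ip and x_description never appear as keys; they are consumed as (part of) the preceding field's value.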
Answer 0 (score: 4)
The simplest solution I've found (so far) is this:
(\w+) : ?(.*?)(?= ?\w+ :|$)
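Finally, as Allen suggested, appending an optional space ( ?) at the end makes the output cleaner, because each match then consumes the space that separates it from the next pair: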
(\w+) : ?(.*?)(?= ?\w+ :|$) ?
Output:
[0] => Array
(
[0] => bypass_first_page :
[1] => x_company :
[2] => x_cust_id : 12345
[3] => x_customer_ip :
[4] => x_customer_tax_id :
[5] => x_description : 98765
[6] => x_duty :
[7] => x_email_customer : an_example@example.com
[8] => x_fax :
[9] => x_footer_email_receipt :
[10] => x_fp_hash : 747ffeddfe4e106a9c67363ebff996ad
[11] => x_fp_timestamp : 1525100766
[12] => x_invoice_num : R000098765
[13] => x_login : MY-LOGIN-ID
[14] => x_logo_url :
[15] => x_merchant_email :
[16] => x_method :
[17] => x_phone : (416) 555-1212
[18] => x_po_num :
[19] => x_receipt_link_method : GET
[20] => x_reference_3 : 1234
[21] => x_relay_response : TRUE
[22] => x_relay_url :
)
[1] => Array
(
[0] => bypass_first_page
[1] => x_company
[2] => x_cust_id
[3] => x_customer_ip
[4] => x_customer_tax_id
[5] => x_description
[6] => x_duty
[7] => x_email_customer
[8] => x_fax
[9] => x_footer_email_receipt
[10] => x_fp_hash
[11] => x_fp_timestamp
[12] => x_invoice_num
[13] => x_login
[14] => x_logo_url
[15] => x_merchant_email
[16] => x_method
[17] => x_phone
[18] => x_po_num
[19] => x_receipt_link_method
[20] => x_reference_3
[21] => x_relay_response
[22] => x_relay_url
)
[2] => Array
(
[0] =>
[1] =>
[2] => 12345
[3] =>
[4] =>
[5] => 98765
[6] =>
[7] => an_example@example.com
[8] =>
[9] =>
[10] => 747ffeddfe4e106a9c67363ebff996ad
[11] => 1525100766
[12] => R000098765
[13] => MY-LOGIN-ID
[14] =>
[15] =>
[16] =>
[17] => (416) 555-1212
[18] =>
[19] => GET
[20] => 1234
[21] => TRUE
[22] =>
)
I ran a few tests and believe this meets the requirements.
PS: The first solution I came up with was this:
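For completeness, here is a minimal sketch of how the pattern might be used from PHP to build the name => value array the question asks for. The helper name parse_pairs and the shortened sample line are assumptions made for illustration; the pattern itself is the one shown above.

```php
<?php
// Hypothetical helper: turn the one-line "name : value" text into an associative array.
function parse_pairs(string $line): array
{
    // (\w+)       field name
    //  : ?        separator, with an optional space after the colon
    // (.*?)       value, as short as possible (may be empty)
    // (?=...)     stop before the next "name :" or at the end of the string
    //  ?          consume the space that separates this pair from the next
    $pattern = '/(\w+) : ?(.*?)(?= ?\w+ :|$) ?/';

    preg_match_all($pattern, $line, $m);

    // $m[1] holds the names and $m[2] the values, in matching order.
    // Note: array_combine keeps only the last value if a name repeats.
    return array_combine($m[1], $m[2]);
}

$fields = parse_pairs(
    'bypass_first_page : x_company : x_cust_id : 12345 x_customer_ip : '
    . 'x_phone : (416) 555-1212 x_po_num : x_relay_response : TRUE x_relay_url : '
);
print_r($fields);
```

Because the lookahead stops a value only where a word followed by " :" begins, a value that itself contains text of that shape would be split early; that case does not occur in the sample data.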
(?:^| )(\w+) : ?(?!\w+ : )(?:(.*?)(?= \w+ :|$))?
It is a bit more verbose, but it may also be useful to you.
Answer 1 (score: 1)
Solution 1:
I adapted your regex as follows:
(\w+|x_[^: ]*) ?:( ((?!x_|\()[^:() ]*|(?:(\d*[)( -])*\d+))?)? ?
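It isn't perfect, but it works on your sample, as you can see here: https://regex101.com/r/tTr4lG/2
Note that it also carries the restriction that field names start with x_.
Solution 2: see the link: https://regex101.com/r/tTr4lG/3
The restriction that names start with x_ has been removed!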
(?<= |^)(([\w\d_]+) : ([A-Za-z0-9-]+(?= )|(\d*[)( -])*\d+|[A-Za-z0-9-_.]+@[A-Za-z0-9-_.]+\.[A-Za-z]+(?= ))?) ?
Limitations: space characters are accepted only within phone numbers, and underscores only within email addresses.