Finding specific terms with a regular expression

Time: 2021-02-18 22:17:29

Tags: python-3.x regex

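I am extracting text from a PDF and want to search for expressions like P50+P60, but the text also contains terms like P50+P40+P30. How can I match only the structure Pxx+Pxx (x = digit), without also matching inside Pxx+Pxx+Pxx?

This is what I tried: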

List = re.findall('(P\d\d+P\d\d[^\+P\d\d])', String)

But this also returns the P50+P40 from the term P50+P40+P30. I tried many variations but could not solve the problem.

1 answer:

Answer 0: (score: 1)

Use

re.findall(r'(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)', String)


Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
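
The breakdown above can be tried out with a short script. This is a minimal sketch: the helper name find_pairs and the sample string are made up for illustration and are not part of the original answer. It relies on the fact that Python's re module accepts this look-behind because it has a fixed width of four characters.

```python
import re

# Pattern from the answer: a Pxx+Pxx pair that is not preceded by "Pxx+"
# (negative look-behind) and not followed by "+Pxx" (negative look-ahead).
PAIR = re.compile(r'(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)')

def find_pairs(text):
    """Return every standalone Pxx+Pxx term in text (hypothetical helper)."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return PAIR.findall(text)

# Hypothetical sample text standing in for the extracted PDF content.
sample = "A P50+P60 B P50+P40+P30 C P10+P20"
print(find_pairs(sample))  # ['P50+P60', 'P10+P20']
```

The look-ahead rejects P50+P40 because +P30 follows it, and the look-behind rejects P40+P30 because P50+ precedes it, so no part of the three-term chain is reported.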