Regex - Get text between two strings

Date: 2015-11-27 05:50:18

Tags: java regex

I have a large text file containing many abstracts (7k of them). I want to split them apart. They have the following properties:

A number at the beginning:

  

123

It always ends with:
  

[PubMed - indexed for MEDLINE]

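It would be even better if I could get the title and the abstract out of each separated string. If I have to split the articles first and then split the text, that's fine with me.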

In the example, the title is on the third line:

Effects of propofol and isoflurane on haemodynamics and the inflammatory response in cardiopulmonary bypass surgery.

The abstract is on the eighth line:

Cardiopulmonary bypass (CPB) causes reperfusion injury...
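Since the title and abstract sit at fixed positions separated by blank lines, a non-regex alternative (swapped in here, not something proposed in this thread) is to split one record into blank-line-separated paragraphs and pick them by index. A minimal sketch, assuming every record has exactly the layout shown: citation, title, authors, abstract, PMID line. `ParagraphFields` and `titleAndAbstract` are hypothetical names.

```java
import java.util.Arrays;

public class ParagraphFields {

    // Hypothetical helper. ASSUMPTION: paragraph 0 = citation, 1 = title,
    // 2 = authors, 3 = abstract, 4 = PMID line. Records with extra
    // paragraphs would need a different rule.
    static String[] titleAndAbstract(String record) {
        // Split on blank lines (a newline, optional whitespace, another newline).
        String[] paragraphs = record.trim().split("\\n\\s*\\n");
        if (paragraphs.length < 4) {
            throw new IllegalArgumentException(
                    "unexpected record layout: " + paragraphs.length + " paragraphs");
        }
        // Collapse the wrapped lines inside each paragraph into single spaces.
        String title = paragraphs[1].replaceAll("\\s+", " ").trim();
        String abstractText = paragraphs[3].replaceAll("\\s+", " ").trim();
        return new String[] { title, abstractText };
    }

    public static void main(String[] args) {
        String record = "1. Br J Biomed Sci. 2015;72(3):93-101.\n\n"
                + "Effects of propofol and isoflurane on haemodynamics and the inflammatory response\n"
                + "in cardiopulmonary bypass surgery.\n\n"
                + "Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.\n\n"
                + "Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is\n"
                + "clinically manifested as a systemic inflammatory response syndrome.\n\n"
                + "PMID: 26510263  [PubMed - indexed for MEDLINE]\n";
        System.out.println(Arrays.toString(titleAndAbstract(record)));
    }
}
```

This only works on a single record, so the file would still have to be split into records first.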

I tried the following on this text:

Regex:

[0-9\.]*\s*(((?![0-9\.]*|MEDLINE).)+)\s*MEDLINE

Text:

1. Br J Biomed Sci. 2015;72(3):93-101.

Effects of propofol and isoflurane on haemodynamics and the inflammatory response
in cardiopulmonary bypass surgery.

Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.

Cardiopulmonary bypass (CPB) causes reperfusion injury that when most severe is
clinically manifested as a systemic inflammatory response syndrome. The
anaesthetic propofol may have anti-inflammatory properties that may reduce such a
response. We hypothesised differing effects of propofol and isoflurane on
inflammatory markers in patients having CBR Forty patients undergoing elective
CPB were randomised to receive either propofol or isoflurane for maintenance of
anaesthesia. CRP, IL-6, IL-8, HIF-1α (ELISA), CD11 and CD18 expression (flow
cytometry), and haemoxygenase (HO-1) promoter polymorphisms (PCR/electrophoresis)
were measured before anaesthetic induction, 4 hours post-CPB, and 24 hours later.
There were no differences in the 4 hours changes in CRP, IL-6, IL-8 or CD18
between the two groups, but those in the propofol group had higher HIF-1α (P =
0.016) and lower CD11 expression (P = 0.026). After 24 hours, compared to the
isoflurane group, the propofol group had significantly lower levels of CRP (P <
0.001), IL-6 (P < 0.001) and IL-8 (P < 0.001), with higher levels CD11 (P =
0.009) and CD18 (P = 0.002) expression. After 24 hours, patients on propofol had 
increased expression of shorter HO-1 GT(n) repeats than patients on isoflurane (P
= 0.001). Use of propofol in CPB is associated with a less adverse inflammatory
profile than is isofluorane, and an increased up-regulation of HO-1. This
supports the hypothesis that propofol has anti-inflammatory activity.

PMID: 26510263  [PubMed - indexed for MEDLINE]
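Why the attempted regex finds nothing: in the lookahead `(?![0-9\.]*|MEDLINE)`, the alternative `[0-9\.]*` can match the empty string, so the negative lookahead fails before every character. As a result, `((?!...).)+` cannot consume even one character and the pattern never matches. (Also, `.` does not match line breaks unless DOTALL is on, so it could not span a multi-line abstract anyway.) A quick Java check; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class AttemptCheck {

    // The regex from the question, written as a Java string literal.
    static final String ATTEMPT = "[0-9\\.]*\\s*(((?![0-9\\.]*|MEDLINE).)+)\\s*MEDLINE";

    static boolean attemptMatches(String text) {
        return Pattern.compile(ATTEMPT).matcher(text).find();
    }

    public static void main(String[] args) {
        String sample = "1. Br J Biomed Sci. 2015;72(3):93-101.\n\nSome title.\n\n"
                + "PMID: 26510263  [PubMed - indexed for MEDLINE]";
        // The lookahead always sees an (empty) match of [0-9.]*, so this prints false.
        System.out.println(attemptMatches(sample));
        // The empty match itself: [0-9.]* succeeds at the start of "abc" (prints true).
        System.out.println(Pattern.compile("[0-9.]*").matcher("abc").lookingAt());
    }
}
```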

2 Answers:

Answer 0 (score: 1)

Try this:

"^[0-9]+\..*\s+(.*)\s+.*\s+((?:\s|.)*?)\[PubMed - indexed for MEDLINE\]"

The first group is the title. The second is the abstract.
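A minimal sketch of using this pattern from Java (backslashes doubled for the string literal; `AnswerZeroDemo` and `firstRecord` are hypothetical names). Running it on sample records shows two things worth knowing: `(.*)` captures a single line, so a title wrapped onto two lines (as in the question) is cut after its first line and group 2 then starts at the authors; and group 2 runs up to the marker, so it also includes the `PMID:` line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerZeroDemo {

    // Answer 0's pattern as a Java string literal. No MULTILINE flag,
    // so ^ anchors only at the start of the input.
    static final Pattern RECORD = Pattern.compile(
            "^[0-9]+\\..*\\s+(.*)\\s+.*\\s+((?:\\s|.)*?)\\[PubMed - indexed for MEDLINE\\]");

    /** Returns {group 1, trimmed group 2} for the first record, or null if nothing matches. */
    static String[] firstRecord(String text) {
        Matcher m = RECORD.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String text = "1. Br J Biomed Sci. 2015;72(3):93-101.\n\n"
                + "Effects of propofol and isoflurane on haemodynamics and the inflammatory response\n"
                + "in cardiopulmonary bypass surgery.\n\n"
                + "Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.\n\n"
                + "Cardiopulmonary bypass (CPB) causes reperfusion injury...\n\n"
                + "PMID: 26510263  [PubMed - indexed for MEDLINE]\n";
        String[] fields = firstRecord(text);
        if (fields != null) {
            System.out.println("group 1: " + fields[0]);
            System.out.println("group 2: " + fields[1]);
        }
    }
}
```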

Answer 1 (score: 1)

Mariano and stribizhev suggested two useful solutions:

Mariano's solution: use the split method on the typical ending

(?m)\[PubMed - indexed for MEDLINE\]$

DEMO:http://ideone.com/Qw5ss2

Java 4+
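A minimal sketch of the split approach in Java, assuming the whole file is already in one `String` (`SplitRecords` and `splitRecords` are hypothetical names). `Pattern.split` drops trailing empty strings, but whitespace after the last marker still produces a non-empty chunk, so each chunk is trimmed and empty ones are skipped. Note that the marker itself is removed from every record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitRecords {

    // Mariano's delimiter: the closing marker at the end of a line ((?m) = MULTILINE).
    static final Pattern END_MARKER =
            Pattern.compile("(?m)\\[PubMed - indexed for MEDLINE\\]$");

    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        for (String chunk : END_MARKER.split(text)) {
            String record = chunk.trim();
            if (!record.isEmpty()) { // skip whitespace left after the last marker
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "1. Br J Biomed Sci. 2015;72(3):93-101.\n\nFirst title.\n\n"
                + "PMID: 26510263  [PubMed - indexed for MEDLINE]\n\n"
                + "2. Another citation.\n\nSecond title.\n\n"
                + "PMID: 1  [PubMed - indexed for MEDLINE]\n";
        for (String record : splitRecords(text)) {
            System.out.println("---\n" + record);
        }
    }
}
```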

stribizhev's solution: extract the data from the text in one pass

(?m)^\s*\d+\..*\R{2}                 # Get to the title
(?<title>[^\n]*(?:\n(?!\n)[^\n]*)*)  # Get title
\R{2}                                # Get to the authors
[^\n]*(?:\n(?!\R)[^\n]*)*            # Consume authors
(?<abstract>[^\[]*(?:\[(?!PubMed[ ]-[ ]indexed[ ]for[ ]MEDLINE\])[^\[]*)*) #Grab abstract

DEMO:https://regex101.com/r/sG2yQ2/2

Java 8+
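A minimal Java 8+ sketch of the extraction regex (Java 8 is needed for `\R`), written as a concatenated string literal instead of free-spacing mode so each comment becomes a Java comment. The author part uses `[^\n]*` for each line. `NamedGroups` and `titleAndAbstract` are hypothetical names. As written, the `abstract` group runs up to the `[PubMed` marker, so it also contains the `PMID:` line; here the title's line breaks are collapsed to spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {

    static final Pattern RECORD = Pattern.compile(
            "(?m)^\\s*\\d+\\..*\\R{2}"                       // citation line + blank line
            + "(?<title>[^\\n]*(?:\\n(?!\\n)[^\\n]*)*)"      // title: lines up to the next blank line
            + "\\R{2}"                                       // blank line before the authors
            + "[^\\n]*(?:\\n(?!\\R)[^\\n]*)*"                // author line(s)
            + "(?<abstract>[^\\[]*(?:\\[(?!PubMed - indexed for MEDLINE\\])[^\\[]*)*)");

    /** Returns {title, abstract} for the first record found, or null if nothing matches. */
    static String[] titleAndAbstract(String text) {
        Matcher m = RECORD.matcher(text);
        if (!m.find()) {
            return null;
        }
        String title = m.group("title").replaceAll("\\s+", " ").trim();
        String abstractText = m.group("abstract").trim();
        return new String[] { title, abstractText };
    }

    public static void main(String[] args) {
        String text = "1. Br J Biomed Sci. 2015;72(3):93-101.\n\n"
                + "Effects of propofol and isoflurane on haemodynamics and the inflammatory response\n"
                + "in cardiopulmonary bypass surgery.\n\n"
                + "Sayed S, Idriss NK, Sayyedf HG, Ashry AA, Rafatt DM, Mohamed AO, Blann AD.\n\n"
                + "Cardiopulmonary bypass (CPB) causes reperfusion injury...\n\n"
                + "PMID: 26510263  [PubMed - indexed for MEDLINE]\n";
        String[] fields = titleAndAbstract(text);
        if (fields != null) {
            System.out.println("title:    " + fields[0]);
            System.out.println("abstract: " + fields[1]);
        }
    }
}
```

To process the whole file, `Matcher.find()` can be called in a loop, or the pattern can be applied to each record produced by the split approach.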
