Extracting certain values from a text file in Python

Time: 2018-07-09 05:45:19

Tags: regex python-3.x

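I have text files in the format below, and I need to extract every Range of Motion and Location value. In some files the value appears on the line after the label, and in some files a value is not given at all.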

File1.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: Range of Motion: limited . ADLs: limited . Gait: limited . 
Stairs: limited . Squatting: limited . Work participation status: limited . 
Current Status: The patient's current status is improving. 

Location: Right side 

Expected output: limited | Right side

File2.txt:

Functional Assessment: Patient currently displays the following functional 
limitations and would benefit from treatment to maximize functional use and 
pain reduction: 
Range of Motion: 
painful 
and
limited

Strength: 
limited 

Expected output: painful and limited | Not given

This is the code I am trying:

if "Functional Assessment:" in line:
    result=str(line.rsplit('Functional Assessment:'))
    romvalue = result.rsplit('Range of Motion:')[-1].split()[0]
    outputfile.write(romvalue)
    partofbody = result.rsplit('Location:')[-1].split()[0]
    outputfile.write(partofbody)

This code does not produce the desired output. Can someone help?

1 answer:

Answer 0: (score: 3)

You can collect the line that starts with Functional Assessment: and all the lines after it, join them into one string, and apply the following regex:

(?sm)\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)

See the regex demo.

Details

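  • (?sm) - the re.S and re.M modifiers
  • \b - a word boundary
  • (Location|Range of Motion) - Group 1: Location or Range of Motion
  • :\s* - a colon and 0+ whitespace characters
  • ([^\W_].*?) - Group 2: a letter or digit, then any 0+ characters, as few as possible
  • \s* - 0+ whitespace characters
  • (?=(?:\.\s*)?[^\W\d_]+:|\Z) - a positive lookahead that requires, immediately to the right of the current position:
    • (?:\.\s*)? - an optional sequence of a . and 0+ whitespace characters
    • [^\W\d_]+: - 1+ letters followed by :
    • | - or
    • \Z - the end of the string.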
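
To see these pieces working together, here is a minimal sketch that runs the pattern on a short string. The sample text and the variable names are made up for illustration and are not taken from the files in the question.

```python
import re

# Same pattern as above; the (?sm) prefix is written as explicit flags here.
pattern = re.compile(
    r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)',
    re.S | re.M,
)

# Hypothetical sample text, invented for this illustration.
sample = "Range of Motion: limited . Gait: limited .\nLocation: Left side \n"

# findall returns one (label, value) tuple per match.
matches = pattern.findall(sample)
print(matches)
```

The first value stops before " . Gait:" because the lookahead sees an optional dot followed by letters and a colon; the second stops at the end of the string via \Z.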

Here is a Python demo:

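import re

reg = re.compile(r'\b(Location|Range of Motion):\s*([^\W_].*?)\s*(?=(?:\.\s*)?[^\W\d_]+:|\Z)', re.S | re.M)
# files is assumed to be an iterable of file contents (one string per file)
for file in files:
    flag = False  # True while inside the Functional Assessment section
    tmp = ""
    for line in file.splitlines():
        if line.startswith("Functional Assessment:"):
            tmp = tmp + line + "\n"
            flag = not flag
        elif flag:
            tmp = tmp + line + "\n"
    # Each match is a (label, value) pair, so the list converts directly to a dict
    print(dict(reg.findall(tmp)))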

Output (for the two texts you posted):

{'Location': 'Right side', 'Range of Motion': 'limited'}
{'Range of Motion': 'painful \nand\nlimited'}
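
These dictionaries still contain the raw line breaks and simply omit a field that was not found. A minimal sketch of turning one into the "value | value" form asked for in the question follows. It assumes that runs of whitespace should collapse to a single space and that a missing field should read "Not given"; the helper name format_result is hypothetical.

```python
def format_result(found, missing="Not given"):
    """Build 'Range of Motion | Location' from a dict of extracted values."""
    parts = []
    for key in ("Range of Motion", "Location"):
        value = found.get(key)
        # " ".join(value.split()) collapses newlines and repeated spaces.
        parts.append(" ".join(value.split()) if value else missing)
    return " | ".join(parts)


print(format_result({'Location': 'Right side', 'Range of Motion': 'limited'}))
print(format_result({'Range of Motion': 'painful \nand\nlimited'}))
```

Looking the keys up in a fixed order keeps the output columns stable regardless of the order in which the regex found them.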