Tools/methods for processing unstructured medical text data into CSV

Date: 2016-02-26 11:36:18

Tags: data-analysis text-processing bigdata

10/03/2014 16:55  Local Title: TRANSFER OUT NOTE
            Standard Title: TRANSFER SUMMARIZATION NOTE
                 AUTHOR:  D,WARD

                      XYZ MEDICAL INSTITUTE 
                 ABC NAGAR, PQW CITY-101011
 ******************************************************************
                       TRANSFER OUT NOTE
                      *******************          OCT 03, 2014

 UHID:000-01-0202   PATIENT NAME:        NAME , SINGH 
 AGE/SEX:42/FEMALE

 DOA:Sep 30,2014

 DEPARTMENT:GYNAE AND OBSTETRICS  UNIT:II

 TRANSFERRED FROM:D3

 NAME , SINGH       000-01-0202                          DOB: 01/01/1972





TRANSFERRED TO : MCU

DIAGNOSIS:pop- em lscs with male baby nicu B


TREATMENT:
inj.cefazolin 1 gm bd
inj.rantac 1 amp tds
inj.perinorm 1 amp tds
inj.pcm 1 gm tds 
inj.texid 1 gm tds


PATIENT STATUS AT THE TIME OF SHIFTING:
  g.c. fair on iv fluid .. 


NAME , SINGH        000-01-0202                          DOB: 01/01/1972




VITALS AT THE TIME OF SHIFTING:
TEMP:98.6F

HR:88/MIN RR:24/MIN

GCS: E V M 


                   <  THE ABOVE NOTE IS UNSIGNED  >                      
- DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT  COPY -

 09/21/2014 23:01  Local Title: MED ONCO IRCH DISCHARGE SUMMARY
            Standard Title: DISCHARGE SUMMARY
                 AUTHOR:  KUMAR,UVW

LOCAL TITLE: MED ONCO IRCH DISCHARGE SUMMARY 
STANDARD TITLE: DISCHARGE SUMMARY 

NAME , SINGH        000-01-0202                          DOB: 01/01/1972




DATE OF NOTE: SEP 21, 2014@22:04     ENTRY DATE: SEP 21, 2014@22:04:42 
   AUTHOR: UVW KUMAR 

REGISTRATION DETAILS
********************
 UHID No:000-01-0202    IRCH No:000222    CR No:111000 
 NAME: NAME        AGE:22 YEAR    GENDER:MALE
 DOA:Sep 2, 2014    DOD:Sep 18, 2014    DURATION OF STAY: days 
 WARD: MRO Ward     BED No:14 
 CONSULTANT INCHARGE:Dr UVW Kumar

 DIAGNOSIS & REASON FOR CURRENT ADMISSION
 ****************************************
 DIAGNOSIS:Acute Promyelocytic leukemia (Intermediate Risk)

 ADMITTED FOR :Chemotherapy
 CASE SUMMARY:NAME Singh presented with complaints of bleeding gums, fever, 

 NAME , SINGH       000-01-0202                          DOB: 01/01/1972




blurring of vision and gum hypertrophy. He diagnosed as APML in PQW 
hospital based on PS, BMA and PML/RARa positive. He started on ATRA and after 
that reffered here. His basline hemorem at PQW Hospital was s/o Hb : 
4.6, TLC: 1580/cu.mm, Platlet: 6000/cu.mm. So he is classified as
intermideate risk APML. After coming here diagnosis reconfirmed, 
daunorubicin    given   60mg/m2 and continoued on ATRA. No features of 
ATRA syndrome noticed during ward stay. His fibrinogen level were > 450 
mg/dl. He remained afebrile and hemodynamically stable and dischared on
stable condition.

PRESENTATION AT CURRENT ADMISSION
*********************************
 VITAL SIGNS:
 TEMP:99 F   RESP:19/min   PULSE:98/min 
 BP:121/78 mm of Hg   SPO2:99% on RA



NAME , SINGH        000-01-0202                          DOB: 01/01/1972




 GENERAL PHYSICAL EXAMINATION: PERFORMANCE STATUS: I
 PALLOR:+   ICTERUS:-   OEDEMA:-   CYANOSIS:-
 STERNAL TENDERNESS:-   CLUBBING:-  GUM HYPERTROPHY:+ 
 LYMPHNODES: -

BIOMETRIC DETAILS: WEIGHT: 45 kg  HEIGHT:166 cms   BSA: 1.4 m2

INVESTIGATIONS AT CURRENT ADMISSSION
************************************
PS (3/9/2014) : N2, L8, E-, M1, B-, Meta-, Myelo-, Blast 89%. Blast and abnormal

 promyelocytes present. F/S/O Acute promyelocytic leukemia.

 BMA (3/9/2014): Cellular BM shows 90% blast and abnormal promyelocyte. F/S/O 
 APML.

 Flow Cytometery (3/9/2014): 87% abnormal promyelocyte, Positive : CD45, CD15, 

NAME , SINGH        000-01-0202                          DOB: 01/01/1972




CD11b, CD13, CD33, CD64, CD9, CD18, cMPO.
Negative for CD2, CD14, CD117, CD19, HLADR, CCD79a, cCD3.

 Day 12 PS (9/9/2014): N78, L20, E-, M2, B-, Meta-, Myelo_ Promyelo Nil, Blast 
Nil. 


 Condition at discharge: 
 VITAL SIGNS:
 TEMP:99 F   RESP:18/min   PULSE:78/min 
 BP:112/74 mm of Hg   SPO2:99% on RA


 Plan At discharge and follow up: As written in OPD card




NAME , SINGH        000-01-0202                          DOB: 01/01/1972






                   <  THE ABOVE NOTE IS UNSIGNED  >                      
 - DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY -

 09/21/2014 22:04  Local Title: MED ONCO IRCH DISCHARGE SUMMARY
            Standard Title: DISCHARGE SUMMARY
                 AUTHOR:  UVW,AMIT

 REGISTRATION DETAILS
 ********************
 UHID No:000-01-0202    IRCH No:000222    CR No:111000 
 NAME: NAME , SINGH         AGE:42    GENDER:FEMALE
 DOA:Sep 2, 2014    DOD:Sep 18, 2014    DURATION OF STAY: days 
 WARD: MRO Ward     BED No:14 
 CONSULTANT INCHARGE:Dr Lalit Kumar
 ADDRESS:             , 

 NAME , SINGH       000-01-0202                          DOB: 01/01/1972




 DIAGNOSIS & REASON FOR CURRENT ADMISSION
 ****************************************
 DIAGNOSIS: 
 Acute Promyelocytic leukemia (Intermediate Risk)

 ADMITTED FOR :Chemotherapy
 CASE SUMMARY:NAME Singh presented with complaints of bleeding gums,  
 fever, blurring of vision and gum hypertrophy. He diagnosed as APML in 
 UVW hospital based on PS and PML/RARa positive. He started on ATRA and 
 after that reffered to XYZ hospital

 PRESENTATION AT CURRENT ADMISSION
 *********************************
 VITAL SIGNS:
 TEMP:F   RESP:/min   PULSE:/min 
 BP:/mm of Hg   SPO2:%


 NAME , SINGH       000-01-0202                          DOB: 01/01/1972





 GENERAL PHYSICAL EXAMINATION: PERFORMANCE STATUS: 
 PALLOR:   ICTERUS:    OEDEMA:   CYANOSIS:
 STERNAL TENDERNESS:   CLUBBING:  GUM HYPERTROPHY: 
 LYMPHNODES: 


 SPECIFIC FINDINGS:

 BIOMETRIC DETAILS: WEIGHT:kgS  HEIGHT:cms   BSA: m2 
 INVESTIGATIONS AT CURRENT ADMISSSION
************************************ 


                   <  THE ABOVE NOTE IS UNSIGNED  >                      
 - DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY * DRAFT COPY -


  NAME , SINGH          000-01-0202                          DOB: 01/01/1972

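This is the text content I need to convert to CSV. It contains the details of a single patient who visited the hospital several times. I want to extract the medical data into separate columns [Age, Sex, UHID, DOA, Department, Diagnosis, Treatment, Patient Status, Vitals, Local Title, Standard Title, Case Summary, Admitted For, General Physical Examination].

As you can see, "DIAGNOSIS" is repeated across notes, and the field labels can also vary from one note to the next.

The file to be processed is 15 GB.

Please suggest an approach to this problem. I have tried Python, OpenRefine and cTAKES.

Please explain how to tackle this kind of problem. The constraint is that we may use only free, open-source tools.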

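1 Answer:

Answer 0 (score: 1)

You can get part of the way with gawk. Multi-line fields such as vitals and treatment may be hard to turn into CSV, but this is a start for the single-valued fields.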

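# Run with: gawk -f extract.awk notes.txt > notes.csv  (file names are placeholders)
# Requires gawk: the three-argument form of match() is a gawk extension.

# Quote a field for CSV, doubling any embedded double quotes.
function q(s) {
    gsub(/"/, "\"\"", s)
    return "\"" s "\""
}

# Print the current record, then clear the fields so that values
# do not carry over into the next record.
function dump() {
    print q(age) "," q(sex) "," q(uhid) "," q(doa) "," q(dept) "," q(diagnosis)
    age = sex = uhid = doa = dept = diagnosis = ""
}

BEGIN { onfirst = 1 }
END { if (!onfirst) dump() }

# Normalize every line: strip leading spaces and unify the UHID label.
{
    sub(/^ */, "")
    sub(/UHID No/, "UHID")
}

# A UHID line starts a new record.
match($0, /UHID:([^ ]*)/, a) {
    if (onfirst)
        onfirst = 0
    else
        dump()
    uhid = a[1]
}

# "AGE/SEX:42/FEMALE"
match($0, /AGE\/SEX:([0-9]*)\/(.*[^ ]) *$/, a) {
    age = a[1]
    sex = a[2]
}

# "AGE:22 YEAR    GENDER:MALE"
match($0, /AGE:([0-9]*).*GENDER:([^ ]*)/, a) {
    age = a[1]
    sex = a[2]
}

# "DOA:Sep 30,2014" or "DOA:Sep 2, 2014"
match($0, /DOA:([A-Za-z]+ +[0-9]+, *[0-9]+)/, a) {
    doa = a[1]
}

match($0, /DEPARTMENT:(.*[^ ]) *UNIT/, a) {
    dept = a[1]
}

# Only captures a diagnosis written on the same line as its label.
match($0, /DIAGNOSIS:(.*[^ ]) *$/, a) {
    diagnosis = a[1]
}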
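
The multi-line fields that the script above leaves out (treatment, patient status, vitals) can be collected with a small line-by-line state machine. The following Python sketch is one possible way to do that, not a tested solution for the real file. It rests on several assumptions drawn only from the sample in the question: every note begins with a "MM/DD/YYYY HH:MM  Local Title:" line, page banners can be recognized by a trailing "DOB:" date or by the DRAFT COPY / UNSIGNED markers, and a section runs until the next heading listed in the hypothetical `SECTIONS` table. The names `SECTIONS`, `COLUMNS`, `parse_notes` and `write_csv` are made up for this illustration, and the heading table would have to be extended to cover the labels that actually occur in the data.

```python
import csv
import io
import re

# Assumption: a note starts with "MM/DD/YYYY HH:MM Local Title: ..." (after whitespace is collapsed).
NOTE_START = re.compile(r"\d{2}/\d{2}/\d{4} \d{2}:\d{2} Local Title: ?(.*)")
# Assumption: repeated page banners end with the DOB or carry a draft/unsigned marker.
BANNER = re.compile(r"DOB: \d{2}/\d{2}/\d{4}$|DRAFT COPY|THE ABOVE NOTE IS UNSIGNED")
# Hypothetical table: section heading in the text -> CSV column name.
SECTIONS = {
    "TREATMENT:": "treatment",
    "PATIENT STATUS AT THE TIME OF SHIFTING:": "patient_status",
    "VITALS AT THE TIME OF SHIFTING:": "vitals",
    "VITAL SIGNS:": "vitals",
}
COLUMNS = ["local_title", "treatment", "patient_status", "vitals"]


def finish(note):
    """Join the collected lines of each column into one CSV cell."""
    return {col: "; ".join(note.get(col, [])) for col in COLUMNS}


def parse_notes(lines):
    """Yield one dict per note; `lines` can be any iterable, e.g. an open file."""
    note, column = None, None
    for raw in lines:
        line = " ".join(raw.split())  # collapse runs of whitespace
        start = NOTE_START.match(line)
        if start:
            if note is not None:
                yield finish(note)
            note, column = {"local_title": [start.group(1)]}, None
            continue
        # Skip text before the first note, blank lines, banners and "****" rules.
        if note is None or not line or BANNER.search(line) or set(line) == {"*"}:
            continue
        for heading, name in SECTIONS.items():
            if line.upper().startswith(heading):
                # Switch sections; keep any value that follows the heading on the same line.
                column, line = name, line[len(heading):].strip()
                break
        if column and line:
            note.setdefault(column, []).append(line)
    if note is not None:
        yield finish(note)


def write_csv(lines, out):
    """Stream notes to `out`; the csv module quotes cells that contain commas."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in parse_notes(lines):
        writer.writerow(row)


# Small excerpt adapted from the sample in the question.
SAMPLE = """\
10/03/2014 16:55  Local Title: TRANSFER OUT NOTE
            Standard Title: TRANSFER SUMMARIZATION NOTE
TREATMENT:
inj.cefazolin 1 gm bd
inj.rantac 1 amp tds

PATIENT STATUS AT THE TIME OF SHIFTING:
  g.c. fair on iv fluid ..

NAME , SINGH        000-01-0202                          DOB: 01/01/1972

VITALS AT THE TIME OF SHIFTING:
TEMP:98.6F

HR:88/MIN RR:24/MIN
                   <  THE ABOVE NOTE IS UNSIGNED  >
 09/21/2014 23:01  Local Title: MED ONCO IRCH DISCHARGE SUMMARY
 VITAL SIGNS:
 TEMP:99 F   RESP:19/min   PULSE:98/min
"""

rows = list(parse_notes(SAMPLE.splitlines()))
buffer = io.StringIO()
write_csv(SAMPLE.splitlines(), buffer)
print(buffer.getvalue())
```

Because `parse_notes` consumes its input one line at a time and keeps only the current note in memory, passing an open file object instead of `SAMPLE.splitlines()` should let the same code stream through a 15 GB file. Unknown headings are not detected, so any text that follows a captured section stays attached to it until a listed heading or a new note appears.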