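Regular expressions in Oracle

Time: 2017-04-06 18:39:31

Tags: sql regex database oracle regexp-substr

I am trying to extract the unit of measure from strings in an Oracle database that holds a drug catalog. So far I have been using regexp_substr to pull the drug's concentration out of the string.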

Name col in schema:
CYCLOSPORINE 100MG 30 CAPS
TERBUTALINE 2.5MG 100 TABS 

Query output:

col 1: CYCLOSPORINE 100MG 30 CAPS, Col 2: 100MG
col 1: TERBUTALINE 2.5MG 100 TABS, Col 2: 2.5MG     



select name, 
regexp_substr(upper(name), 
'(\d*\.*\d+\s*ML|\d*\.*\d+\s*MG|\d*\.*\d+\s*OZ|\d*\.*\d+\s*LB)') 
CONCENTRATION 
from schema.table t 
where t.discontinuedflag=0
and t.restrictioncode <> 0
and t.distributor_id =19

Does anyone know how to use regexp_substr() in Oracle to extract 200MG/mL from the string below?

'TESTOSTERONE CYP 200MG/mL 10ML VIAL C3' 

2 answers:

Answer 0 (score: 0)

It looks like you want the first "token" in the string that starts with a digit. If so:

select regexp_substr(name || ' ', ' [0-9.]+[^ ]+ ') as concentration

This appends a space to the end of name, so the pattern can end in a space even when the token sits at the very end of name.
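
Because the pattern itself begins and ends with a space, the matched substring carries those spaces along. A minimal sketch of the same idea with the spaces trimmed off, run against two of the strings from the question (the inline `tbl` is only a stand-in for the real table, and the column-list form of the WITH clause assumes Oracle 11gR2 or later):

```sql
-- Stand-in data: two of the sample names from the question.
with tbl(name) as (
  select 'CYCLOSPORINE 100MG 30 CAPS' from dual union all
  select 'TESTOSTERONE CYP 200MG/mL 10ML VIAL C3' from dual
)
select name,
       -- First space-delimited token that starts with a digit or a dot;
       -- TRIM drops the surrounding spaces that the pattern matched.
       trim(regexp_substr(name || ' ', ' [0-9.]+[^ ]+ ')) as concentration
from tbl;
```

For these two rows this should give 100MG and 200MG/mL. Note that the pattern does not check for a unit at all, so a bare multi-digit number such as a pack size would also match if it came first in the name.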

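Answer 1 (score: 0)

This seems to work for these specific examples, but as I said in my comment above, you need to know your data, so don't trust it too much without broader testing. I have an NDC table and did some checking; the concentration appears to come first in the description, but I did not check every code, so test very carefully!

The regular expression puts parentheses around the groups to be remembered, which are numbered from left to right, and the replacement string returns the first and second remembered groups. It reads: starting at the beginning of the line, look for one or more non-digit characters, followed by one or more digits, then an optional decimal point and zero or more digits, followed by zero or more spaces, then an optional unit of measure (the pipe is a logical OR), then an optional "/ML", then the rest of the string.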

SQL> with tbl(drug_name) as (
     select 'CYCLOSPORINE 100MG 30 CAPS' from dual union
     select 'TERBUTALINE 2.5MG 100 TABS' from dual union
     select 'TESTOSTERONE CYP 200MG/mL 10ML VIAL C3' from dual union
     select 'GEMCITABINE 1 GM-26.3 ML VL' from dual union
     select 'NOREPINEPHRINE 1MG/mL 4mL 10 AMP' from dual union
     select 'AMOXI-DROP (50MG)' from dual union
     select 'DARVOCET-N 100 TABLET' from dual union
     select 'ALBON ORAL SUSP 5% 16OZ' from dual
   )
   select drug_name,
   regexp_replace(upper(drug_name), '^\D+(\d+\.?\d*) *((GM|ML|MG|OZ|LB|%)?(/ML)?).*$', '\1\2') CONCENTRATION
   from tbl;

DRUG_NAME                              CONCENTRATION
-------------------------------------- ------------------------------
ALBON ORAL SUSP 5% 16OZ                5%
AMOXI-DROP (50MG)                      50MG
CYCLOSPORINE 100MG 30 CAPS             100MG
DARVOCET-N 100 TABLET                  100
GEMCITABINE 1 GM-26.3 ML VL            1GM
NOREPINEPHRINE 1MG/mL 4mL 10 AMP       1MG/ML
TERBUTALINE 2.5MG 100 TABS             2.5MG
TESTOSTERONE CYP 200MG/mL 10ML VIAL C3 200MG/ML

8 rows selected.

SQL>

Notes:
      - If the regex does not find a match, REGEXP_REPLACE returns its input unchanged, 
        so the whole (upshifted) DRUG_NAME value ends up in CONCENTRATION.
      - Since you upshift the drug name, the original 'mL' spelling becomes 'ML'.  
        Technically it's the same thing, but you are altering data, which may matter to the 
        consumers of the data.
      - Some drug names, like the DARVOCET example, don't seem to have a measure in the 
        description.  You need to decide if that's ok.
      - The space between the number and the measure is removed.
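
If returning the whole name on a failed match is not acceptable, one possible guard is to wrap the call in NULLIF. This is only a sketch: it relies on the observation that a successful match always drops the leading non-digit characters, so the result can equal the input only when nothing matched. The second row is made-up test data.

```sql
with tbl(drug_name) as (
  select 'CYCLOSPORINE 100MG 30 CAPS' from dual union all
  select 'NO DIGITS HERE' from dual   -- hypothetical row with no concentration
)
select drug_name,
       -- NULLIF turns "returned the input unchanged" (no match) into NULL.
       nullif(
         regexp_replace(upper(drug_name),
                        '^\D+(\d+\.?\d*) *((GM|ML|MG|OZ|LB|%)?(/ML)?).*$',
                        '\1\2'),
         upper(drug_name)
       ) as concentration
from tbl;
```

The first row should still yield 100MG, while the second yields NULL instead of the full name. Names that begin with a digit also fail the `^\D+` part of the pattern and would come back as NULL here.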

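Oh, and I used REGEXP_REPLACE because it lets you reference several saved groups at once with the "\1" shorthand, which REGEXP_SUBSTR does not allow (it returns only one subgroup).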
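
For comparison, here is a rough sketch of the REGEXP_SUBSTR route. It assumes Oracle 11g or later, where REGEXP_SUBSTR accepts a sixth argument selecting a single subexpression, so two calls are concatenated to get both groups. The `pat` CTE is just a hypothetical helper to avoid repeating the pattern.

```sql
with tbl(drug_name) as (
  select 'TESTOSTERONE CYP 200MG/mL 10ML VIAL C3' from dual union all
  select 'DARVOCET-N 100 TABLET' from dual
),
pat as (
  -- Same groups as the REGEXP_REPLACE pattern, without the trailing .*$
  select '^\D+(\d+\.?\d*) *((GM|ML|MG|OZ|LB|%)?(/ML)?)' as p from dual
)
select drug_name,
       -- Arguments: source, pattern, start position, occurrence,
       -- match parameter ('i' = case-insensitive), subexpression number.
       regexp_substr(drug_name, p, 1, 1, 'i', 1) ||
       regexp_substr(drug_name, p, 1, 1, 'i', 2) as concentration
from tbl cross join pat;
```

Because the match is case-insensitive rather than upshifted, the original 'mL' spelling should be preserved (200MG/mL), and a name with no match comes back as NULL instead of the full string. The cost is running the pattern twice per row.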
