Remove duplicates from a comma-separated list using regexp

Time: 2017-07-31 16:52:50

Tags: oracle plsql regexp-replace

I have

contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2,
paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, 
clause 2 

I would like to get

contract, clause 1, Subsection 1.1, Subsection 1.2, paragraph (a), paragraph 
(b), clause 2

I found that a regexp can do this, but I can't work out the pattern to use.

Please help.

1 answer:

Answer 0: (score: 1)

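Based on this link about splitting comma-separated values into rows, I split the string into rows, keep the position of the first occurrence of each value, and then re-aggregate the distinct values in that order: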

with test_string as (
  select 1 as id,
         'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' as val
  from dual
)
select id,
       listagg(word, ', ') within group (order by position) as deduped
from (
  -- keep each word once, tagged with the position of its first occurrence
  -- (partitioned by id as well, so rows with different ids do not mix)
  select distinct id,
         first_value(position) over (partition by id, word order by position) as position,
         word
  from (
    -- split val on commas: one row per item, numbered by its position
    select t.id,
           levels.column_value as position,
           trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value)) as word
    from test_string t,
         -- generate 1..N, where N = number of commas + 1
         table(cast(multiset(
           select level from dual
           connect by level <= length(regexp_replace(t.val, '[^,]+')) + 1
         ) as sys.OdciNumberList)) levels
  )
)
group by id;
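
For comparison, here is a minimal sketch of the same order-preserving idea written with named subqueries, where the first position of each word is found with MIN(...) GROUP BY instead of FIRST_VALUE plus DISTINCT. It is not part of the original answer. The names words, first_seen and deduped are made up for illustration, and using regexp_count to count the commas assumes Oracle 11g or later. On the sample string it should return the list asked for in the question.

```sql
with test_string as (
  select 1 as id,
         'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' as val
  from dual
),
words as (
  -- one row per comma-separated item, numbered by its position in the list
  select t.id,
         levels.column_value as position,
         trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value)) as word
  from test_string t,
       table(cast(multiset(
         select level from dual
         connect by level <= regexp_count(t.val, ',') + 1  -- assumes 11g+
       ) as sys.OdciNumberList)) levels
),
first_seen as (
  -- position of the first occurrence of each word, per id
  select id, word, min(position) as position
  from words
  group by id, word
)
select id,
       listagg(word, ', ') within group (order by position) as deduped
from first_seen
group by id;
```

Grouping by id and word in first_seen keeps the result correct if test_string holds more than one row.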

If you don't care about preserving the original order:

with test_string as (
  select 1 as id,
         'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' as val
  from dual
)
-- "order by 1" inside WITHIN GROUP sorts by the constant 1, not by a column,
-- so the words are ordered explicitly here to make the result deterministic
select id,
       listagg(word, ', ') within group (order by word) as deduped
from (
  -- split val on commas and drop duplicate words
  select distinct t.id,
         trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value)) as word
  from test_string t,
       table(cast(multiset(
         select level from dual
         connect by level <= length(regexp_replace(t.val, '[^,]+')) + 1
       ) as sys.OdciNumberList)) levels
)
group by id;