Find the missing numbers in a sequence after extracting the sequence from strings?

Time: 2014-02-18 08:07:21

Tags: oracle

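I have millions of string records like this one. There are 310 types, each with a different format for extracting the sequence, year, month and day...

The script gets the sequence, year, month and day... Now I want some PL/SQL that gets the maximum and minimum values of the sequence and finds the missing numbers, for example for year and month 14 - 06. How?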

1 answer:

Answer 0 (score: 1)

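You don't want to look at dual at all, and certainly not try to insert into it. You need to keep track of the highest and lowest values you've seen as you iterate through the loop. Since some elements of ename represent dates, I'm fairly sure you want all of your matches to be 0-9, not 1-9. You're also referring to the cursor name when you access its fields, instead of the record variable name:

FOR List_ENAME_rec IN List_ENAME_cur loop
  if REGEXP_LIKE(List_ENAME_rec.ENAME,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]') then
    -- extract the fixed-position elements of the ID
    V_seq := substr(List_ENAME_rec.ename,5,4);
    V_Year := substr(List_ENAME_rec.ename,10,2);
    V_Month := substr(List_ENAME_rec.ename,13,2);
    V_day := substr(List_ENAME_rec.ename,16,2);

    -- track the lowest and highest sequence seen so far
    if min_seq is null or V_seq < min_seq then
      min_seq := v_seq;
    end if;
    if max_seq is null or V_seq > max_seq then
      max_seq := v_seq;
    end if;
  end if;
end loop;

With the values emp-1111_14_01_01_1111_G1 and emp-1115_14_02_02_1111_G1 in the table, that reports:

max_seq 1115
min_seq 1111

If you really want to involve dual, you could do this inside the loop instead of the if/then/assign pattern, but it isn't necessary:

select least(min_seq, v_seq), greatest(max_seq, v_seq)
into min_seq, max_seq
from dual;

I don't know what the procedure is going to do; there doesn't seem to be any relationship between whatever is in test1 and the values you're finding.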

You don't need any PL/SQL for this, though. You can get the min/max values from a simple query:

select min(to_number(substr(ename, 5, 4))) as min_seq,
  max(to_number(substr(ename, 5, 4))) as max_seq
from table1
where status = 2
and regexp_like(ename,
  'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')

   MIN_SEQ    MAX_SEQ
---------- ----------
      1111       1115 

And you can use those to generate a list of all the values in that range:

with t as (
  select min(to_number(substr(ename, 5, 4))) as min_seq,
    max(to_number(substr(ename, 5, 4))) as max_seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
)
select min_seq + level - 1 as seq
from t
connect by level <= (max_seq - min_seq) + 1;

       SEQ
----------
      1111 
      1112 
      1113 
      1114 
      1115 

A slightly different set of common table expressions shows which of those values don't exist in your table, which I think is what you're after:

with t as (
  select to_number(substr(ename, 5, 4)) as seq
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
u as (
  select min(seq) as min_seq,
    max(seq) as max_seq
  from t
),
v as (
  select min_seq + level - 1 as seq
  from u
  connect by level <= (max_seq - min_seq) + 1
)
select v.seq as missing_seq
from v
left join t on t.seq = v.seq
where t.seq is null
order by v.seq;

MISSING_SEQ
-----------
       1112 
       1113 
       1114 

Or if you prefer:

...
select v.seq as missing_seq
from v
where not exists (select 1 from t where t.seq = v.seq)
order by v.seq;

Based on the comments, I think you want the missing sequence values for each combination of the other elements of the ID (YY_MM_DD). This will give you that breakdown:

with t as (
  select to_number(substr(ename, 5, 4)) as seq,
    substr(ename, 10, 2) as yy,
    substr(ename, 13, 2) as mm,
    substr(ename, 16, 2) as dd
  from table1
  where status = 2
  and regexp_like(ename,
    'emp[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][G][1]')
),
r (yy, mm, dd, seq, max_seq) as (
  select yy, mm, dd, min(seq), max(seq)
  from t
  group by yy, mm, dd
  union all
  select yy, mm, dd, seq + 1, max_seq
  from r
  where seq + 1 <= max_seq
)
select yy, mm, dd, seq as missing_seq
from r
where not exists (
  select 1 from t
  where t.yy = r.yy
  and t.mm = r.mm
  and t.dd = r.dd
  and t.seq = r.seq
)
order by yy, mm, dd, seq;

With output like:

YY   MM   DD     MISSING_SEQ
---- ---- ---- -------------
14   01   01            1112
14   01   01            1113
14   01   01            1114
14   02   02            1118
14   02   02            1120
14   02   03            1127
14   02   03            1128

If you want to look for a particular date, you could filter on it (either in t, or in the first branch of r), but you could also change the regex pattern to include the fixed values; so to look for 14 06 the pattern would be 'emp[-][0-9]{4}_14_06_[0-9]{2}[_][0-9]{4}[_][G][1]', for example. That is harder to generalise, though, so a filter (where t.yy = '14' and t.mm = '06') may be more flexible.

If you insist on having this in a procedure, you can make the date elements optional and modify the regex pattern:

create or replace procedure show_missing_seqs(
  yy in varchar2 default '[0-9]{2}',
  mm in varchar2 default '[0-9]{2}',
  dd in varchar2 default '[0-9]{2}') as

  pattern varchar2(80);
  cursor cur (pattern varchar2) is
    with t as (
      select to_number(substr(ename, 5, 4)) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where status = 2
      and regexp_like(ename, pattern)
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, min(seq), max(seq)
      from t
      group by yy, mm, dd
      union all
      select yy, mm, dd, seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq
    from r
    where not exists (
      select 1 from t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq;
begin
  -- any date element left at its default matches any two digits
  pattern := 'emp[-][0-9]{4}[_]' || yy || '[_]' || mm || '[_]' || dd
    || '[_][0-9]{4}[_][G][1]';
  for rec in cur(pattern) loop
    dbms_output.put_line(to_char(rec.missing_seq, 'FM0000'));
  end loop;
end show_missing_seqs;
/

I don't know why you insist on doing it this way, or why you want to use dbms_output, since you're relying on the client or caller to display it; what will your job do with the output? You could make this return a sys_refcursor instead, which would be more flexible. But anyway, you can call it from SQL*Plus or SQL Developer.
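A call might look like the sketch below. It assumes the show_missing_seqs procedure above has been created in the current schema, and that the client is SQL*Plus or SQL Developer, where `set serveroutput on` is needed for the dbms_output lines to be displayed. The argument values '14' and '06' are only illustrative, taken from the year and month mentioned in the question.

```sql
-- Ask the client to display dbms_output lines (SQL*Plus / SQL Developer setting).
set serveroutput on

-- All date elements left at their defaults: every YY_MM_DD combination is checked.
exec show_missing_seqs

-- Restrict to year 14, month 06 using named notation; dd keeps its default pattern.
exec show_missing_seqs(yy => '14', mm => '06')
```

Named notation is used so that any subset of the three optional parameters can be supplied without depending on their order.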