Sequential vs. parallel solution

Date: 2018-01-27 16:38:43

Tags: oracle plsql parallel-processing

I'll try to simplify the problem as much as possible.

Suppose we have 3 tables in Oracle 11g:

Persons (person_id, name, surname, status, etc.)
Actions (action_id, person_id, action_value, action_date, calculated_flag)
Calculations (calculation_id, person_id, computed_value, computed_date)

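For every person who meets a certain condition (say, status = 3), I need the sum of action_value over that person's rows in Actions with calculated_flag = 0, something like:

    select sum(action_value) from Actions where calculated_flag = 0 and person_id = current_id

I then use that sum in some formula and update the Calculations table for that person_id:

    update Calculations set computed_value = newvalue, computed_date = sysdate where person_id = current_id

After that, calculated_flag of the rows involved is set to 1:

    update Actions set calculated_flag = 1 where calculated_flag = 0 and person_id = current_id

Right now this can be done sequentially: open a cursor over the Persons table and perform each of the required operations for one person at a time.

(I'm not providing code for the sequential solution, because the above is only an example that resembles my actual setup.)

The problem is that we are talking about a large amount of data, and the sequential approach seems to waste computing time.

It seems to me that this task could be executed in parallel for several person_ids at once.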

So the questions are:

Can this kind of task be parallelized in PL/SQL?

What would the solution look like? That is, which special packages (e.g. DBMS_PARALLEL_EXECUTE), keywords (e.g. bulk collect) or methods should be used, and in what way?

Also, should I be worried about partial failures of the parallel updates?

Please note that I'm not very familiar with parallel programming in PL/SQL. Thanks.

Edit 1. Here is pseudocode of my sequential solution:

procedure sequential_solution is
  cursor persons_of_interest is
    select person_id from persons where status = 3;
  tempvalue number;
  newvalue  number;
begin
  for person in persons_of_interest loop
    begin
      savepoint personsp;

      --step 1
      select sum(action_value) into tempvalue
        from actions
       where calculated_flag = 0
         and person_id = person.person_id;

      newvalue := dosomemorecalculations(tempvalue);

      --step 2
      update calculations
         set computed_value = newvalue,
             computed_date = sysdate
       where person_id = person.person_id;

      --step 3
      update actions
         set calculated_flag = 1
       where calculated_flag = 0
         and person_id = person.person_id;

      --step 4 (didn't mention this step before - sorry)
      insert into actions (person_id, action_value, action_date, calculated_flag)
      values (person.person_id, 100, sysdate, 0);
    exception
      when others then
        rollback to personsp;
        -- this call is defined with pragma AUTONOMOUS_TRANSACTION:
        log_failure(person.person_id);
    end;
  end loop;
end;

Now, how can I use bulk collect/forall or parallel programming to speed this up under the following constraints:

  1. Proper memory management (given the large amount of data).
  2. For a single person, if any part of the sequence of steps fails, all of that person's steps should be rolled back and the failure logged.

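1 answer:

Answer 0 (score: 2)

I can suggest the following. Suppose you have 1,000,000 rows in the persons table and want to process 10,000 persons per iteration. Then you can do something like this: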

declare
  id_from persons.person_id%type;
  id_to persons.person_id%type;
  calc_date date := sysdate;
begin
    for i in 1 .. 100 loop
      -- non-overlapping ranges 1..10000, 10001..20000, ... so no person is processed twice
      id_from := (i - 1) * 10000 + 1;
      id_to := i * 10000;

      -- Updating Calculations table, errors are logged into err$_calculations table
      merge into Calculations c
      using (select p.person_id, sum(action_value) newvalue
               from Actions a join persons p on p.person_id = a.person_id
              where a.calculated_flag = 0 
                and p.status = 3
                and p.person_id between id_from and id_to
              group by p.person_id) s
         on (s.person_id = c.person_id)
      when matched then update
       set c.computed_value = s.newvalue, 
           c.computed_date = calc_date
       log errors into err$_calculations reject limit unlimited;

      -- updating actions table only for those person_id which had no errors:
      merge into actions a
      using (select distinct p.person_id
               from persons p join Calculations c on p.person_id = c.person_id
              where c.computed_date = calc_date
                and p.person_id between id_from and id_to) s
         on (a.person_id = s.person_id)
       when matched then update
        set a.calculated_flag = 1
      where a.calculated_flag = 0;

      -- inserting list of persons for who calculations were successful
      insert into actions (person_id, action_value, action_date, calculated_flag)
       select distinct p.person_id, 100, calc_date, 0
         from persons p join Calculations c on p.person_id = c.person_id
        where c.computed_date = calc_date
          and p.person_id between id_from and id_to;

      commit;
    end loop;
end;

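How it works:

  • You split the data in the persons table into chunks of about 10,000 rows (the actual number depends on gaps in the IDs; the largest value of i * 10000 must be guaranteed to exceed the maximum person_id).
  • You do the calculation in the MERGE statement and update Calculations.
  • The LOG ERRORS clause prevents exceptions. If an error occurs, the offending row is not updated; it is inserted into an error-logging table instead, and execution is not interrupted. To create this table, run: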

    begin
      DBMS_ERRLOG.CREATE_ERROR_LOG('CALCULATIONS');
    end;
    

    This creates the table err$_calculations. For details on the DBMS_ERRLOG package, see the documentation.

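  • The second MERGE statement sets calculated_flag = 1 only for persons whose rows had no errors. The INSERT statement then adds a new row to the actions table for each of those persons. These persons are found by selecting the rows of Calculations whose computed_date equals calc_date.
  • In addition, I added the variables id_from and id_to to compute the range of IDs to process. The variable calc_date makes sure that all rows updated by the first MERGE statement can be found afterwards by their date.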
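The loop above still processes the chunks one after another. The question also mentions DBMS_PARALLEL_EXECUTE, so here is a minimal sketch of how the same three statements could be run for several ID ranges in parallel. The sketch makes these assumptions:

  • err$_calculations has already been created with DBMS_ERRLOG as shown above.
  • The session has the CREATE JOB privilege.
  • The procedure name process_person_range and the task name PERSON_CALC_TASK are hypothetical.
  • chunk_size and parallel_level are illustrative values, not tuned ones.
  • The second MERGE is written as a plain UPDATE here, which flags the same rows.

```sql
-- Hypothetical procedure: the loop body above for one PERSON_ID range.
create or replace procedure process_person_range(p_id_from in number, p_id_to in number) is
  calc_date date := sysdate;
begin
  -- Step 1: compute and store the new value; rejected rows go to err$_calculations
  merge into calculations c
  using (select p.person_id, sum(a.action_value) newvalue
           from actions a join persons p on p.person_id = a.person_id
          where a.calculated_flag = 0
            and p.status = 3
            and p.person_id between p_id_from and p_id_to
          group by p.person_id) s
     on (s.person_id = c.person_id)
   when matched then update
    set c.computed_value = s.newvalue,
        c.computed_date  = calc_date
    log errors into err$_calculations reject limit unlimited;

  -- Step 2: flag pending actions only for persons updated in step 1
  update actions a
     set a.calculated_flag = 1
   where a.calculated_flag = 0
     and a.person_id in (select c.person_id
                           from calculations c
                          where c.computed_date = calc_date
                            and c.person_id between p_id_from and p_id_to);

  -- Step 3: insert the follow-up action for the same persons
  insert into actions (person_id, action_value, action_date, calculated_flag)
  select distinct c.person_id, 100, calc_date, 0
    from calculations c
   where c.computed_date = calc_date
     and c.person_id between p_id_from and p_id_to;
  -- No commit here: RUN_TASK commits after each chunk.
end process_person_range;
/

declare
  l_task   varchar2(30) := 'PERSON_CALC_TASK';  -- hypothetical task name
  l_status number;
begin
  dbms_parallel_execute.create_task(task_name => l_task);

  -- Split PERSONS into PERSON_ID ranges of about 10000 values each
  dbms_parallel_execute.create_chunks_by_number_col(
    task_name    => l_task,
    table_owner  => user,
    table_name   => 'PERSONS',
    table_column => 'PERSON_ID',
    chunk_size   => 10000);

  -- Each chunk gets its own :start_id/:end_id; up to 4 chunks run at the same time
  dbms_parallel_execute.run_task(
    task_name      => l_task,
    sql_stmt       => 'begin process_person_range(:start_id, :end_id); end;',
    language_flag  => dbms_sql.native,
    parallel_level => 4);

  l_status := dbms_parallel_execute.task_status(l_task);
  if l_status = dbms_parallel_execute.finished then
    dbms_parallel_execute.drop_task(l_task);
  else
    -- keep the task so failed chunks can be inspected and resumed
    dbms_output.put_line('Task ' || l_task || ' ended with status ' || l_status);
  end if;
end;
/

-- Persons whose Calculations update was rejected by LOG ERRORS:
select person_id, ora_err_number$, ora_err_mesg$
  from err$_calculations;
```

Because the ranges produced by CREATE_CHUNKS_BY_NUMBER_COL do not overlap, parallel chunks touch different persons. A chunk that raises an unhandled error is marked PROCESSED_WITH_ERROR, and its error message appears in USER_PARALLEL_EXECUTE_CHUNKS. DBMS_PARALLEL_EXECUTE.RESUME_TASK can then re-run the chunks that did not finish.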