OLAP - 计算径流三角形,样本数据和包含的多维数据集(PostgreSQL / Mondrian)

时间:2012-11-13 10:01:07

标签: ssas mdx data-warehouse olap mondrian

现实描述: 我们有一个项目清单。在每个项目中都有很多账户。您可以对每个帐户执行大量操作。我确实有以下维度和事实表定义(简化):

Dimensions and attributes:
 Project
  project_key
  project_name
  industry
  number_of_accounts
 Distance
  distance_key
  distance_in_months
  distance_in_quarters
 Account
  account_key
  project_key
  account_id
Fact Table and attributes:
 Action_Fact_Table
  project_key
  distance_key
  account_key
  action_id

现在,我想使用径流三角形方法来分析数据(它可能不是真正的径流三角形,但方法是相同的)。最简单的三角形看起来像:

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Project1   |     5   10   15   20   25   30   35   40   45    50
 Project2   |     7   14   21   28   35   42   49   56   63
 Project3   |     2    5    8   11   14   20   25   30
 Project4   |     0    2    5   10   18   23   40
 Project5   |     5   12   18   20   21   30

按行显示多个操作数。以月为单位的距离显示了行动日期和项目开始日期之间的距离。显然,您可以使用四分之一距离(或距离维度中定义的任何其他周期)创建相似的三角形。

您还可以在项目维度中为不同级别的层次结构创建triangel,例如行业(Project1-Project3 = Industry1,Project4-Project5 = Industry2):

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Industry1   |    14   29   44   59   74   92  109  126  108   50
 Industry2   |     5   14   23   30   39   53   40   

还有更高级的径流三角形,您可以按行数划分行动的总和。假设我们的项目有以下数量的帐户:

Project_name number_of_accounts  
-----------------------------
Project1     100
Project2     100
Project3     100
Project4     100
Project5     200

然后我想得到以下三角形:

              Distance in Months
Project    |     1    2    3    4    5    6    7    8    9    10
------------------------------------------------------------------------
 Project1  |   .05  .01  .15  .20  .25  .30  .35  .40  .45   .50
 Project2  |    .7  .14  .21  .28  .35  .42  .49  .56  .63
 Project3  |    .2   .5   .8  .11  .14  .20  .25  .30
 Project4  |    .0   .2   .5  .10  .18  .23  .40
 Project5  |   .05  .06  .09  .10 .105  .15

如果您希望在项目中的帐户数量与所有项目不同的情况下比较项目及其操作,这将非常有用。

问题是是否可以在OLAP中创建此类计算。我以为我可以在项目表中使用number_of_accounts,但我无法弄明白。另一种选择是汇总帐户维度中的数据。我也无法使用谷歌找到任何东西,也许是因为我提出了错误的问题。

这个问题的解决方案广泛适用于许多行业,尤其​​是在保险和银行业务方面。它可以在流程具有长性能窗口的任何地方使用,并且可以通过定义明确,可比较的批次单元进行跟踪。

(我们正在使用PostgreSQL,Saiku,多维数据集在Schema Workbench中定义)

测试数据(PostgreSQL语法,如果您需要其他内容,请告诉我)

--drop table if exists project cascade;
create table project (
  project_key int primary key,
  project_name character varying,
  industry character varying,
  number_of_accounts int
);

--drop table if exists distance cascade;
create table distance (
  distance_key int primary key,
  distance_in_months int,
  distance_in_quarters int);

--drop table if exists account cascade;
create table account (
  account_key int primary key,
  project_key int references project (project_key)
);

--drop table if exists action_fact_table cascade;
create table action_fact_table (
  project_key int references project (project_key),
  distance_key int references distance (distance_key),
  account_key int references account (account_key),
  action_id int
);

-- project data
insert into project values (1,'Project1','Industry1',100);
insert into project values (2,'Project2','Industry1',100);
insert into project values (3,'Project3','Industry1',100);
insert into project values (4,'Project4','Industry2',100);
insert into project values (5,'Project5','Industry2',200);

-- distance data
insert into distance values(1,1,1);
insert into distance values(2,2,1);
insert into distance values(3,3,1);
insert into distance values(4,4,2);
insert into distance values(5,5,2);
insert into distance values(6,6,2);
insert into distance values(7,7,3);
insert into distance values(8,8,3);
insert into distance values(9,9,3);
insert into distance values(10,10,4);
insert into distance values(11,11,4);
insert into distance values(12,12,4);

-- account data
/* let me know if you need insert statement for every row */
insert into account (
select generate_series (1,100), 1 union all
select generate_series (101,200), 2 union all
select generate_series (201,300), 3 union all
select generate_series (301,400), 4 union all
select generate_series (401,600), 5
);

insert into action_fact_table values(1,1,90,10001);
insert into action_fact_table values(1,1,32,10002);
insert into action_fact_table values(1,1,41,10003);
insert into action_fact_table values(1,1,54,10004);
insert into action_fact_table values(1,1,45,10005);
insert into action_fact_table values(1,2,22,10006);
insert into action_fact_table values(1,2,29,10007);
insert into action_fact_table values(1,2,41,10008);
insert into action_fact_table values(1,2,89,10009);
insert into action_fact_table values(1,2,15,10010);
insert into action_fact_table values(1,3,32,10011);
insert into action_fact_table values(1,3,100,10012);
insert into action_fact_table values(1,3,72,10013);
insert into action_fact_table values(1,3,80,10014);
insert into action_fact_table values(1,3,10,10015);
insert into action_fact_table values(1,4,12,10016);
insert into action_fact_table values(1,4,45,10017);
insert into action_fact_table values(1,4,83,10018);
insert into action_fact_table values(1,4,42,10019);
insert into action_fact_table values(1,4,33,10020);
insert into action_fact_table values(1,5,22,10021);
insert into action_fact_table values(1,5,27,10022);
insert into action_fact_table values(1,5,59,10023);
insert into action_fact_table values(1,5,32,10024);
insert into action_fact_table values(1,5,70,10025);
insert into action_fact_table values(1,6,32,10026);
insert into action_fact_table values(1,6,5,10027);
insert into action_fact_table values(1,6,15,10028);
insert into action_fact_table values(1,6,70,10029);
insert into action_fact_table values(1,6,43,10030);
insert into action_fact_table values(1,7,59,10031);
insert into action_fact_table values(1,7,9,10032);
insert into action_fact_table values(1,7,99,10033);
insert into action_fact_table values(1,7,79,10034);
insert into action_fact_table values(1,7,31,10035);
insert into action_fact_table values(1,8,56,10036);
insert into action_fact_table values(1,8,34,10037);
insert into action_fact_table values(1,8,48,10038);
insert into action_fact_table values(1,8,79,10039);
insert into action_fact_table values(1,8,42,10040);
insert into action_fact_table values(1,9,10,10041);
insert into action_fact_table values(1,9,10,10042);
insert into action_fact_table values(1,9,49,10043);
insert into action_fact_table values(1,9,61,10044);
insert into action_fact_table values(1,9,49,10045);
insert into action_fact_table values(1,10,99,10046);
insert into action_fact_table values(1,10,69,10047);
insert into action_fact_table values(1,10,84,10048);
insert into action_fact_table values(1,10,99,10049);
insert into action_fact_table values(1,10,3,10050);
insert into action_fact_table values(2,1,182,10051);
insert into action_fact_table values(2,1,127,10052);
insert into action_fact_table values(2,1,197,10053);
insert into action_fact_table values(2,1,174,10054);
insert into action_fact_table values(2,1,187,10055);
insert into action_fact_table values(2,1,144,10056);
insert into action_fact_table values(2,1,160,10057);
insert into action_fact_table values(2,2,155,10058);
insert into action_fact_table values(2,2,153,10059);
insert into action_fact_table values(2,2,119,10060);
insert into action_fact_table values(2,2,188,10061);
insert into action_fact_table values(2,2,125,10062);
insert into action_fact_table values(2,2,147,10063);
insert into action_fact_table values(2,2,123,10064);
insert into action_fact_table values(2,3,136,10065);
insert into action_fact_table values(2,3,163,10066);
insert into action_fact_table values(2,3,187,10067);
insert into action_fact_table values(2,3,138,10068);
insert into action_fact_table values(2,3,168,10069);
insert into action_fact_table values(2,3,132,10070);
insert into action_fact_table values(2,3,138,10071);
insert into action_fact_table values(2,4,158,10072);
insert into action_fact_table values(2,4,171,10073);
insert into action_fact_table values(2,4,153,10074);
insert into action_fact_table values(2,4,141,10075);
insert into action_fact_table values(2,4,182,10076);
insert into action_fact_table values(2,4,165,10077);
insert into action_fact_table values(2,4,143,10078);
insert into action_fact_table values(2,5,190,10079);
insert into action_fact_table values(2,5,181,10080);
insert into action_fact_table values(2,5,163,10081);
insert into action_fact_table values(2,5,134,10082);
insert into action_fact_table values(2,5,145,10083);
insert into action_fact_table values(2,5,190,10084);
insert into action_fact_table values(2,5,198,10085);
insert into action_fact_table values(2,6,137,10086);
insert into action_fact_table values(2,6,133,10087);
insert into action_fact_table values(2,6,135,10088);
insert into action_fact_table values(2,6,103,10089);
insert into action_fact_table values(2,6,187,10090);
insert into action_fact_table values(2,6,127,10091);
insert into action_fact_table values(2,6,117,10092);
insert into action_fact_table values(2,7,116,10093);
insert into action_fact_table values(2,7,139,10094);
insert into action_fact_table values(2,7,111,10095);
insert into action_fact_table values(2,7,150,10096);
insert into action_fact_table values(2,7,151,10097);
insert into action_fact_table values(2,7,181,10098);
insert into action_fact_table values(2,7,109,10099);
insert into action_fact_table values(2,8,102,10100);
insert into action_fact_table values(2,8,101,10101);
insert into action_fact_table values(2,8,118,10102);
insert into action_fact_table values(2,8,147,10103);
insert into action_fact_table values(2,8,186,10104);
insert into action_fact_table values(2,8,136,10105);
insert into action_fact_table values(2,8,160,10106);
insert into action_fact_table values(2,9,149,10107);
insert into action_fact_table values(2,9,119,10108);
insert into action_fact_table values(2,9,169,10109);
insert into action_fact_table values(2,9,176,10110);
insert into action_fact_table values(2,9,195,10111);
insert into action_fact_table values(2,9,183,10112);
insert into action_fact_table values(2,9,140,10113);
insert into action_fact_table values(3,1,224,10114);
insert into action_fact_table values(3,1,241,10115);
insert into action_fact_table values(3,2,295,10116);
insert into action_fact_table values(3,2,249,10117);
insert into action_fact_table values(3,2,260,10118);
insert into action_fact_table values(3,3,298,10119);
insert into action_fact_table values(3,3,267,10120);
insert into action_fact_table values(3,3,297,10121);
insert into action_fact_table values(3,4,211,10122);
insert into action_fact_table values(3,4,253,10123);
insert into action_fact_table values(3,4,214,10124);
insert into action_fact_table values(3,5,248,10125);
insert into action_fact_table values(3,5,223,10126);
insert into action_fact_table values(3,5,288,10127);
insert into action_fact_table values(3,6,207,10128);
insert into action_fact_table values(3,6,296,10129);
insert into action_fact_table values(3,6,221,10130);
insert into action_fact_table values(3,6,201,10131);
insert into action_fact_table values(3,6,227,10132);
insert into action_fact_table values(3,6,209,10133);
insert into action_fact_table values(3,7,267,10134);
insert into action_fact_table values(3,7,282,10135);
insert into action_fact_table values(3,7,215,10136);
insert into action_fact_table values(3,7,285,10137);
insert into action_fact_table values(3,7,212,10138);
insert into action_fact_table values(3,8,239,10139);
insert into action_fact_table values(3,8,294,10140);
insert into action_fact_table values(3,8,296,10141);
insert into action_fact_table values(3,8,251,10142);
insert into action_fact_table values(3,8,281,10143);
insert into action_fact_table values(4,2,392,10144);
insert into action_fact_table values(4,2,347,10145);
insert into action_fact_table values(4,3,318,10146);
insert into action_fact_table values(4,3,400,10147);
insert into action_fact_table values(4,3,378,10148);
insert into action_fact_table values(4,4,315,10149);
insert into action_fact_table values(4,4,318,10150);
insert into action_fact_table values(4,4,394,10151);
insert into action_fact_table values(4,4,382,10152);
insert into action_fact_table values(4,4,317,10153);
insert into action_fact_table values(4,5,314,10154);
insert into action_fact_table values(4,5,354,10155);
insert into action_fact_table values(4,5,338,10156);
insert into action_fact_table values(4,5,375,10157);
insert into action_fact_table values(4,5,317,10158);
insert into action_fact_table values(4,5,329,10159);
insert into action_fact_table values(4,5,342,10160);
insert into action_fact_table values(4,5,380,10161);
insert into action_fact_table values(4,6,313,10162);
insert into action_fact_table values(4,6,311,10163);
insert into action_fact_table values(4,6,336,10164);
insert into action_fact_table values(4,6,380,10165);
insert into action_fact_table values(4,6,355,10166);
insert into action_fact_table values(4,7,386,10167);
insert into action_fact_table values(4,7,322,10168);
insert into action_fact_table values(4,7,311,10169);
insert into action_fact_table values(4,7,367,10170);
insert into action_fact_table values(4,7,350,10171);
insert into action_fact_table values(4,7,384,10172);
insert into action_fact_table values(4,7,391,10173);
insert into action_fact_table values(4,7,331,10174);
insert into action_fact_table values(4,7,373,10175);
insert into action_fact_table values(4,7,314,10176);
insert into action_fact_table values(4,7,305,10177);
insert into action_fact_table values(4,7,331,10178);
insert into action_fact_table values(4,7,350,10179);
insert into action_fact_table values(4,7,376,10180);
insert into action_fact_table values(4,7,387,10181);
insert into action_fact_table values(4,7,312,10182);
insert into action_fact_table values(4,7,397,10183);
insert into action_fact_table values(5,1,404,10184);
insert into action_fact_table values(5,1,562,10185);
insert into action_fact_table values(5,1,511,10186);
insert into action_fact_table values(5,1,594,10187);
insert into action_fact_table values(5,1,541,10188);
insert into action_fact_table values(5,2,506,10189);
insert into action_fact_table values(5,2,427,10190);
insert into action_fact_table values(5,2,481,10191);
insert into action_fact_table values(5,2,463,10192);
insert into action_fact_table values(5,2,579,10193);
insert into action_fact_table values(5,2,455,10194);
insert into action_fact_table values(5,2,527,10195);
insert into action_fact_table values(5,3,465,10196);
insert into action_fact_table values(5,3,562,10197);
insert into action_fact_table values(5,3,434,10198);
insert into action_fact_table values(5,3,401,10199);
insert into action_fact_table values(5,3,464,10200);
insert into action_fact_table values(5,3,500,10201);
insert into action_fact_table values(5,4,554,10202);
insert into action_fact_table values(5,4,600,10203);
insert into action_fact_table values(5,5,483,10204);
insert into action_fact_table values(5,6,552,10205);
insert into action_fact_table values(5,6,565,10206);
insert into action_fact_table values(5,6,586,10207);
insert into action_fact_table values(5,6,544,10208);
insert into action_fact_table values(5,6,436,10209);
insert into action_fact_table values(5,6,531,10210);
insert into action_fact_table values(5,6,409,10211);
insert into action_fact_table values(5,6,524,10212);
insert into action_fact_table values(5,6,564,10213);

示例立方体(蒙德里安):

<Schema name="RunoffTriangleSchema">
  <Cube name="RunoffTriangleCube" visible="true" cache="true" enabled="true">
    <Table name="action_fact_table" schema="public">
    </Table>
    <Dimension type="StandardDimension" visible="true" foreignKey="project_key" name="Project">
      <Hierarchy name="Project" visible="true" hasAll="true">
        <Table name="project" schema="public" alias="">
        </Table>
        <Level name="Industry" visible="true" column="industry" uniqueMembers="false">
        </Level>
        <Level name="Project Name" visible="true" column="project_name" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="distance_key" name="Distance">
      <Hierarchy name="Distance" visible="true" hasAll="true">
        <Table name="distance" schema="public" alias="">
        </Table>
        <Level name="Distance In Quarters" visible="true" column="distance_in_quarters" uniqueMembers="false">
        </Level>
        <Level name="Distance In Months" visible="true" column="distance_in_months" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="account_key" name="Account">
      <Hierarchy name="Account" visible="true" hasAll="true">
        <Table name="account" schema="public">
        </Table>
        <Level name="Account Key" visible="true" column="account_key" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Measure name="CountActions" column="action_id" aggregator="count" visible="true">
    </Measure>
  </Cube>
</Schema>

2 个答案:

答案 0 :(得分:2)

两个赏金,没有答案,我很惊讶。我找到了一个解决方案 - 使用SQL和BIRT引擎我现在接近我想要的。我仍然希望有人可以为OLAP解决这个问题。


为了完成这项工作,我有:

  • 返回动态选定列的自定义函数
  • 根据所选列计算径流三角形数据的SQL
  • BIRT 2.6.1中的报告显示结果并提供参数选择界面

动态返回列

    CREATE or replace FUNCTION bizdata.getColumns(_column1 text, _column2 text, _column3 text, _column4 text, _table text, _rqdl text)
      RETURNS TABLE(cmf1 text, cmf2 text, cmf3 text, outval numeric, rqdl text) AS $$
    BEGIN
        RETURN QUERY EXECUTE 
            'SELECT ' 
                || case when _column1 = 'None' then quote_literal('None') else quote_ident(_column1) end || '::text as cmf1,' 
                || case when _column2 = 'None' then quote_literal('None') else quote_ident(_column2) end || '::text as cmf2,' 
                || case when _column3 = 'None' then quote_literal('None') else quote_ident(_column3) end || '::text as cmf3,'   
                || quote_ident(_column4) || '::numeric as baseline,'
                || case when _rqdl = 'None' then 0::text else quote_ident(_rqdl)::text end || '::text as rqdl'  
            ' FROM '
                || 'bizdata.' || _table; 
    END;
     $$ LANGUAGE plpgsql;

Thi function takes the following as input variables:

 - _column1 - common mapping field number 1
 - _column2 - common mapping field number 2
 - _column3 - common mapping field number 3
 - _column4 - column used for aggregation (sum)
 - _table - table used for getting data
 - _rqdl - requested distance level

计算数据

Using bizdata.getColumns() function I can calculate triangle data using the following statement:


with 

params as (
    select 'cmf1'::varchar as prm_name, 'project_owner_name_short'::varchar as prm_value union all 
    select 'cmf2'::varchar as prm_name, 'project_source_name_short'::varchar as prm_value union all
    select 'cmf3'::varchar as prm_name, 'None'::varchar as prm_value union all
    select 'fact'::varchar as prm_name, 'amount'::varchar as prm_value union all    
    select 'fact_table'::varchar as prm_name, 'dwv_daily_allocation_fact'::varchar as prm_value union all       
    select 'baseline'::varchar as prm_name, 'tmp_nominal_value'::varchar as prm_value union all 
    select 'baseline_table'::varchar as prm_name, 'dw_project'::varchar as prm_value union all
    select 'rqdl'::varchar as prm_name, 'year_distance'::varchar as prm_value 
)

,baseline_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        sum(coalesce(outval,0)) as baseline
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'baseline'::text),
            (select prm_value from params where prm_name = 'baseline_table'::text), 
            'None'
            )
    group by
        cmf1,
        cmf2,
        cmf3

)




,fact_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        rqdl::int as rqdl,
        sum(coalesce(outval,0)) as fact
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'fact'::text),
            (select prm_value from params where prm_name = 'fact_table'::text),
            (select prm_value from params where prm_name = 'rqdl'::text)
            )
    group by
        cmf1,
        cmf2,
        cmf3,
        rqdl

)

select 
    case when cmf1 = 'None' then null else cmf1 end as cmf1,
    case when cmf2 = 'None' then null else cmf1 end as cmf,
    case when cmf3 = 'None' then null else cmf1 end as cmf1,
    rqdl,
    fact,
    baseline,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) as cfact,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) / baseline as cfactpct
from 
    fact_data 
    join baseline_data using (cmf1, cmf2, cmf3)

您可以看到我最多可以使用3个分组变量(cmf1,cmf2,cmf3)并选择任何距离属性(只要该属性在dwv_daily_allocation_fact中可用。分组变量应该在基线表和事实表中都可用(获得共同的小组水平)

报告

最后一步是在BIRT(2.6.1)中创建报告,其中SQL的params部分中的参数被数据集参数替换并链接到报告参数。那些使用BIRT的人可能理解,其他人必须找到其他方式。

参数选择GUI Parameters selection GUI

输出报告 enter image description here

我仍然需要弄清楚表的正确排序(因此,历史最长的组是第一个。

编辑:我在BIRT交叉表中找到了排序,现在看起来像真正的三角形:

enter image description here

如果您需要有关我如何执行此操作的详细说明,请与我们联系。

答案 1 :(得分:0)

我创建了名为pgvint的R包,其中可以轻松计算复古曲线(径流三角形)。该软件包在github上,目前只支持PostgreSQL作为数据源。

示例输出: enter image description here

enter image description here

此外,还有Shiny应用程序,可以交互式地以不同的布局显示复古数据:

enter image description here