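Flatten an invoice header together with its invoice lines in Kettle

Time: 2017-10-11 10:54:38

Tags: pentaho kettle flatten pentaho-data-integration denormalized

If you have an invoice header with several values (invoice number, date, location) and an unknown number of invoice lines, each with several values (product, price, tax), is there a way to flatten this data into a single row that grows as needed when the number of invoice lines varies from invoice to invoice?

Sample input: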

{"InvoiceRecords": [{
    "InvoiceDate": "8/9/2017 12:00:00 AM",
    "InvoiceLocation": "002",
    "InvoiceNumber": "2004085",
    "InvoiceRecordHeaderDetails": [{
        "InvNum": "2004085",
        "Location": "002",
        "InvDate": "8/9/2017 12:00:00 AM"
    }],
    "InvoiceRecordLineItemDetails": [{
        "UniqueID": "3939934",
        "InvNum": "2004085",
        "LINEITEM": "1",
        "CUSTID": "PREAA",
        "DEPTID": "320306",
        "PRODID": "088856",
        "ProdDesc": "STATE UST",
        "Unitprice": "0.003",
        "QuantShare": "237.5",
        "TaxRate": "7.25",
        "taxamount": "0.05"
    }],
    "InvoiceTaxCodeDetails": [{
        "InvNum": "2004085",
        "LineItem": "1",
        "UniqueID": "34",
        "taxCode": "SALES TAX",
        "taxrate": "7.25",
        "maxtax": "0"
    }]
}]}

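I need all items in the same row (allowing for multiple line items and/or multiple tax code items in a given invoice record).

Sample output (note: "_n" below stands for the undetermined number of invoice lines and tax lines that may occur):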

{"InvoiceRecords": [{
    "InvoiceDate": "8/9/2017 12:00:00 AM",
    "InvoiceLocation": "002",
    "InvoiceNumber": "2004085",
    "InvoiceRecordHeaderDetailsInvNum": "2004085",
    "InvoiceRecordHeaderDetailsInvNumLocation": "002",
    "InvoiceRecordHeaderDetailsInvNumInvDate": "8/9/2017 12:00:00 AM",
    "InvoiceRecordLineItemDetailsUniqueID_1": "3939934",
    "InvoiceRecordLineItemDetailsInvNum_1": "2004085",
    "InvoiceRecordLineItemDetailsLINEITEM_1": "1",
    "InvoiceRecordLineItemDetailsCUSTID_1": "PREAA",
    "InvoiceRecordLineItemDetailsDEPTID_1": "320306",
    "InvoiceRecordLineItemDetailsPRODID_1": "088856",
    "InvoiceRecordLineItemDetailsProdDesc_1": "STATE UST",
    "InvoiceRecordLineItemDetailsUnitprice_1": "0.003",
    "InvoiceRecordLineItemDetailsQuantShare_1": "237.5",
    "InvoiceRecordLineItemDetailsTaxRate_1": "7.25",
    "InvoiceRecordLineItemDetailstaxamount_1": "0.05",
    "InvoiceTaxCodeDetailsInvNum_1": "2004085",
    "InvoiceTaxCodeDetailsLineItem_1": "1",
    "InvoiceTaxCodeDetailsUniqueID_1": "34",
    "InvoiceTaxCodeDetailstaxCode_1": "SALES TAX",
    "InvoiceTaxCodeDetailstaxrate_1": "7.25",
    "InvoiceTaxCodeDetailsmaxtax_1": "0",
    "InvoiceRecordLineItemDetailsUniqueID_n": "3939934",
    "InvoiceRecordLineItemDetailsInvNum_n": "2004085",
    "InvoiceRecordLineItemDetailsLINEITEM_n": "1",
    "InvoiceRecordLineItemDetailsCUSTID_n": "PREAA",
    "InvoiceRecordLineItemDetailsDEPTID_n": "320306",
    "InvoiceRecordLineItemDetailsPRODID_n": "088856",
    "InvoiceRecordLineItemDetailsProdDesc_n": "STATE UST",
    "InvoiceRecordLineItemDetailsUnitprice_n": "0.003",
    "InvoiceRecordLineItemDetailsQuantShare_n": "237.5",
    "InvoiceRecordLineItemDetailsTaxRate_n": "7.25",
    "InvoiceRecordLineItemDetailstaxamount_n": "0.05",
    "InvoiceTaxCodeDetailsInvNum_n": "2004085",
    "InvoiceTaxCodeDetailsLineItem_n": "1",
    "InvoiceTaxCodeDetailsUniqueID_n": "34",
    "InvoiceTaxCodeDetailstaxCode_n": "SALES TAX",
    "InvoiceTaxCodeDetailstaxrate_n": "7.25",
    "InvoiceTaxCodeDetailsmaxtax_n": "0"
}]}
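
To make the naming scheme above concrete, here is a minimal sketch in plain JavaScript, outside Kettle. The function name flattenInvoice is hypothetical, and the sketch assumes every array in the record holds flat objects. It numbers every array, including InvoiceRecordHeaderDetails, so its header keys carry a "_1" suffix that the sample output above does not have.

```javascript
// Hypothetical helper: flatten one invoice record into a single object.
// Scalar fields are copied as-is; each array of detail objects is expanded
// into keys named <arrayName><fieldName>_<position>, positions starting at 1.
function flattenInvoice(record) {
  const flat = {};
  for (const [key, value] of Object.entries(record)) {
    if (Array.isArray(value)) {
      value.forEach((detail, index) => {
        for (const [field, fieldValue] of Object.entries(detail)) {
          flat[key + field + "_" + (index + 1)] = fieldValue;
        }
      });
    } else {
      flat[key] = value;
    }
  }
  return flat;
}
```

Calling flattenInvoice on each element of InvoiceRecords would give one flat object per invoice, with as many "_n" groups as that invoice has lines.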

Thanks!

1 answer:

Answer 0: (score: 0)

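There is an example of a similar problem in the samples directory next to spoon.bat. Have a look at samples/transformation/XML Add and get past the initial shock: it does something rather more complex than you need, just to show everything that is possible.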

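In your case, split the input flow into header, items and footer with a Switch/Case step, and make sure the InvoiceNumber is kept on each of them (more on that later). Convert the three flows to JSON (with a JSON output step or, more simply, with a Javascript step). Then Group by the items on InvoiceNumber. Join the three flows on InvoiceNumber; I suggest a lookup stream in the header flow, followed by another lookup stream for the footer flow. With one more Javascript step, treating the data as strings, you can build JSON rows in the format {header, [item], footer}, which you can then Group by with concatenation so that only one row remains.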
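
The group-and-concatenate idea can be sketched in plain JavaScript, outside Kettle. This is only an illustration of the logic, not the transformation itself: the function name buildInvoiceRows and the row fields kind and json are assumptions, standing for rows whose header, item and footer parts have already been serialised to JSON strings.

```javascript
// Hypothetical input rows: { InvoiceNumber, kind, json }, where kind is
// "header", "item" or "footer" and json is an already serialised string.
function buildInvoiceRows(rows) {
  const byInvoice = new Map();
  for (const row of rows) {
    if (!byInvoice.has(row.InvoiceNumber)) {
      byInvoice.set(row.InvoiceNumber, { header: null, items: [], footer: null });
    }
    const group = byInvoice.get(row.InvoiceNumber);
    if (row.kind === "item") {
      group.items.push(row.json); // collect all items of the invoice
    } else {
      group[row.kind] = row.json; // single header / footer per invoice
    }
  }
  // Plain string concatenation, producing one JSON line per invoice.
  return [...byInvoice.values()].map(
    (g) =>
      '{"header":' + g.header +
      ',"items":[' + g.items.join(",") + "]" +
      ',"footer":' + g.footer + "}"
  );
}
```

Each returned string is one line per InvoiceNumber, which corresponds to what the lookups followed by a concatenating Group by are meant to produce.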

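It is a bit of work, but fairly standard, except for the tricky part of getting the InvoiceNumber onto the items and the footer, since it has disappeared from those flows. For that, you can rely on the fact that the Javascript step keeps the value of a variable from one row to the next unless it is redefined. Add a new start script [right-click on Script1 at the top of the tab, add a copy, right-click on the newly created Script1_0, and define it as a Start script].

In this start script: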

var PrevInvoiceNumber = -1;

In the main script:

if(InvoiceNumber && PrevInvoiceNumber!=InvoiceNumber)
    PrevInvoiceNumber = InvoiceNumber

You should then see that on every row PrevInvoiceNumber equals the InvoiceNumber of the invoice the row belongs to.
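
The effect of the two scripts can be simulated in plain JavaScript, outside Kettle. This is a sketch under stated assumptions: processRow is a hypothetical stand-in for the main script being run once per row, and the sample rows (including the type field and the second invoice number 2004086) are made up for illustration.

```javascript
// Start script equivalent: runs once, before the first row.
var PrevInvoiceNumber = -1;

// Main script equivalent: hypothetical wrapper called once per row.
function processRow(row) {
  var InvoiceNumber = row.InvoiceNumber;
  // Only overwrite the remembered value when the row carries a new number.
  if (InvoiceNumber && PrevInvoiceNumber != InvoiceNumber)
    PrevInvoiceNumber = InvoiceNumber;
  return Object.assign({}, row, { PrevInvoiceNumber: PrevInvoiceNumber });
}

// Made-up rows: only header rows carry an InvoiceNumber.
var rows = [
  { InvoiceNumber: "2004085", type: "header" },
  { InvoiceNumber: null, type: "item" },
  { InvoiceNumber: null, type: "footer" },
  { InvoiceNumber: "2004086", type: "header" },
  { InvoiceNumber: null, type: "item" }
];
var filled = rows.map(processRow);
```

Because PrevInvoiceNumber lives outside processRow, rows without an InvoiceNumber inherit the number of the most recent header. This only works if the rows arrive in their original order.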