Update table where some fields are null with related values from another table

Posted: 2016-07-06 11:10:29

Tags: mysql kettle

I'm writing an ETL in Pentaho Kettle to create a table that combines various sources of Google Analytics data.

So:
Table 1 = all data from the website, joined with Google Analytics info
Table 2 = all duplicate data from Table 1, joined with Google Analytics info

My problem is that some rows in Table 1 are missing the Google Analytics info, while Table 2 has Google Analytics data for the same reference_number.

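So what I want to do is look up each [reference_number] from Table 1 in Table 2, and fill in the Table 1 columns that are null with the information from Table 2.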

Quick example (edit):


My tables look like this (after the update, row A1 in Table 1 should be filled with z and z from Table 2):

Table 1 (main table). This table has a unique index on website_reference_number.
  website_reference_number   GA_info_1   GA_info_2
  A1                         null        null
  A2                         x           y

Table 2 (duplicates from Table 1)
  eventlabel   GA_info_1   GA_info_2
  A1           z           z
  A2           x           y

I'm using a MySQL database.

2 answers:

Answer 0 (score: 0)

UPDATE mytable
LEFT JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE mytable.GA_info_1 IS NULL
   OR mytable.GA_info_2 IS NULL

Put every field that might be null into the WHERE clause.

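If a field is not null, it is left as it is, because it is the first argument to COALESCE; if it is null, it is filled with the corresponding field from the other table.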
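A self-contained MySQL sketch of the first query applied to the question's sample data. The table names demo_table1 and demo_table2 and the VARCHAR column types are assumptions for illustration; the column names follow the question (website_reference_number in Table 1, eventlabel in Table 2) rather than the answer's placeholder Ref_number.

```sql
-- Hypothetical demo tables mirroring the question's sample data.
DROP TABLE IF EXISTS demo_table1, demo_table2;

CREATE TABLE demo_table1 (
    website_reference_number VARCHAR(20) NOT NULL,
    GA_info_1 VARCHAR(20) NULL,
    GA_info_2 VARCHAR(20) NULL,
    UNIQUE KEY uq_demo_table1_ref (website_reference_number)
);

CREATE TABLE demo_table2 (
    eventlabel VARCHAR(20) NOT NULL,
    GA_info_1 VARCHAR(20) NULL,
    GA_info_2 VARCHAR(20) NULL,
    KEY idx_demo_table2_eventlabel (eventlabel)
);

INSERT INTO demo_table1 VALUES ('A1', NULL, NULL), ('A2', 'x', 'y');
INSERT INTO demo_table2 VALUES ('A1', 'z', 'z'), ('A2', 'x', 'y');

-- Fill only the NULL columns of demo_table1 from the matching demo_table2 row;
-- COALESCE keeps an existing non-NULL value because it is the first argument.
UPDATE demo_table1
LEFT JOIN demo_table2
    ON demo_table1.website_reference_number = demo_table2.eventlabel
SET demo_table1.GA_info_1 = COALESCE(demo_table1.GA_info_1, demo_table2.GA_info_1),
    demo_table1.GA_info_2 = COALESCE(demo_table1.GA_info_2, demo_table2.GA_info_2)
WHERE demo_table1.GA_info_1 IS NULL
   OR demo_table1.GA_info_2 IS NULL;

SELECT * FROM demo_table1 ORDER BY website_reference_number;
```

The final SELECT should show A1 filled with z and z, while A2 keeps x and y (it never matches the WHERE clause). If Table 2 holds several rows for the same reference, MySQL's multiple-table UPDATE changes each Table 1 row only once, so which duplicate supplies the values is not something to rely on.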

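Edit: you can also try it this way. Because CONCAT() returns NULL if any of its arguments is NULL, the WHERE condition below matches rows where at least one of the columns is null: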

UPDATE mytable
INNER JOIN table2 ON mytable.Ref_number = table2.Ref_number
SET mytable.GA_info_1 = COALESCE(mytable.GA_info_1, table2.GA_info_1),
    mytable.GA_info_2 = COALESCE(mytable.GA_info_2, table2.GA_info_2)
WHERE CONCAT(mytable.GA_info_1, mytable.GA_info_2) IS NULL
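A quick MySQL check of the NULL behaviour the CONCAT() condition relies on. It also shows why CONCAT_WS() would not be a drop-in replacement: CONCAT_WS() skips NULL arguments instead of returning NULL.

```sql
-- CONCAT() returns NULL as soon as any argument is NULL.
SELECT
    CONCAT('x', NULL) IS NULL  AS one_null,   -- 1
    CONCAT(NULL, NULL) IS NULL AS both_null,  -- 1
    CONCAT('x', 'y') IS NULL   AS no_null;    -- 0

-- CONCAT_WS() skips NULL values after the separator, so this is NOT a "some column is NULL" test.
SELECT CONCAT_WS(',', 'x', NULL) IS NULL AS ws_one_null;  -- 0 (result is 'x')
```

So `CONCAT(a, b) IS NULL` selects the same rows as `a IS NULL OR b IS NULL`; the OR form is simply more explicit.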

Regarding performance (already mentioned in the comments):

Since you are not joining the tables on a primary key or foreign key, you have to put an index on the Ref_number columns to speed up the join.
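A minimal sketch of adding that index. The question says Table 1 already has a unique index on website_reference_number, so the missing one is on the lookup column of Table 2 (eventlabel in the question, Ref_number in this answer). The table and index names are hypothetical, and the index is non-unique because Table 2 holds duplicates.

```sql
-- Hypothetical stand-in for Table 2 (the duplicates / GA staging table).
DROP TABLE IF EXISTS demo_table2;
CREATE TABLE demo_table2 (
    eventlabel VARCHAR(20) NOT NULL,
    GA_info_1 VARCHAR(20) NULL,
    GA_info_2 VARCHAR(20) NULL
);

-- Non-unique index on the join column, since the same reference can appear more than once.
CREATE INDEX idx_demo_table2_eventlabel ON demo_table2 (eventlabel);

-- List the table's indexes to confirm it was created.
SHOW INDEX FROM demo_table2;
```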

Answer 1 (score: 0)

UPDATE DIM_ENQUIRIES_TEST
LEFT JOIN DIM_ENQUIRIES_TEST AS STAGING_GA
    ON DIM_ENQUIRIES_TEST.website_reference_number = STAGING_GA.eventlabel
SET
    DIM_ENQUIRIES_TEST.eventlabel = COALESCE(DIM_ENQUIRIES_TEST.eventlabel, STAGING_GA.eventlabel),
    DIM_ENQUIRIES_TEST.sourcemedium = COALESCE(DIM_ENQUIRIES_TEST.sourcemedium, STAGING_GA.sourcemedium),
    DIM_ENQUIRIES_TEST.deviceCategory = COALESCE(DIM_ENQUIRIES_TEST.deviceCategory, STAGING_GA.deviceCategory),
    DIM_ENQUIRIES_TEST.avgSessionDuration = COALESCE(DIM_ENQUIRIES_TEST.avgSessionDuration, STAGING_GA.avgSessionDuration),
    DIM_ENQUIRIES_TEST.timeonpage = COALESCE(DIM_ENQUIRIES_TEST.timeonpage, STAGING_GA.timeonpage),
    DIM_ENQUIRIES_TEST.avgtimeonpage = COALESCE(DIM_ENQUIRIES_TEST.avgtimeonpage, STAGING_GA.avgtimeonpage),
    DIM_ENQUIRIES_TEST.bouncerate = COALESCE(DIM_ENQUIRIES_TEST.bouncerate, STAGING_GA.bouncerate),
    DIM_ENQUIRIES_TEST.profileid = COALESCE(DIM_ENQUIRIES_TEST.profileid, STAGING_GA.profileid),
    DIM_ENQUIRIES_TEST.webpropertyid = COALESCE(DIM_ENQUIRIES_TEST.webpropertyid, STAGING_GA.webpropertyid),
    DIM_ENQUIRIES_TEST.accountname = COALESCE(DIM_ENQUIRIES_TEST.accountname, STAGING_GA.accountname),
    DIM_ENQUIRIES_TEST.tableid = COALESCE(DIM_ENQUIRIES_TEST.tableid, STAGING_GA.tableid),
    DIM_ENQUIRIES_TEST.tablename = COALESCE(DIM_ENQUIRIES_TEST.tablename, STAGING_GA.tablename),
    DIM_ENQUIRIES_TEST.keyword = COALESCE(DIM_ENQUIRIES_TEST.keyword, STAGING_GA.keyword),
    DIM_ENQUIRIES_TEST.country = COALESCE(DIM_ENQUIRIES_TEST.country, STAGING_GA.country),
    DIM_ENQUIRIES_TEST.campaign = COALESCE(DIM_ENQUIRIES_TEST.campaign, STAGING_GA.campaign),
    DIM_ENQUIRIES_TEST.sessions = COALESCE(DIM_ENQUIRIES_TEST.sessions, STAGING_GA.sessions),
    DIM_ENQUIRIES_TEST.sessionduration = COALESCE(DIM_ENQUIRIES_TEST.sessionduration, STAGING_GA.sessionduration),
    DIM_ENQUIRIES_TEST.bounces = COALESCE(DIM_ENQUIRIES_TEST.bounces, STAGING_GA.bounces)
WHERE
    DIM_ENQUIRIES_TEST.EventLabel IS NULL
    OR DIM_ENQUIRIES_TEST.SourceMedium IS NULL;

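-- I only check EventLabel and SourceMedium, because if one of them is null, the rest of the columns that may need updating are null as well.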
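The shorter WHERE clause relies on the assumption in the comment above. A hedged way to check it before trusting it is to count rows the UPDATE would skip even though some other GA column is still NULL. This sketch uses a hypothetical demo_enquiries table with only a few of the columns and made-up rows; against the real data you would run the final SELECT on DIM_ENQUIRIES_TEST and list every column from the SET clause.

```sql
-- Hypothetical table with a subset of DIM_ENQUIRIES_TEST's columns.
DROP TABLE IF EXISTS demo_enquiries;
CREATE TABLE demo_enquiries (
    website_reference_number VARCHAR(20) NOT NULL,
    eventlabel VARCHAR(20) NULL,
    sourcemedium VARCHAR(50) NULL,
    deviceCategory VARCHAR(20) NULL,
    country VARCHAR(50) NULL
);

-- Made-up rows: A3 has eventlabel and sourcemedium but is missing deviceCategory.
INSERT INTO demo_enquiries VALUES
    ('A1', NULL, NULL, NULL, NULL),
    ('A2', 'A2', 'google / organic', 'desktop', 'UK'),
    ('A3', 'A3', 'google / cpc', NULL, 'UK');

-- Rows the UPDATE's WHERE clause would skip although a GA column is still NULL.
SELECT COUNT(*) AS skipped_with_nulls
FROM demo_enquiries
WHERE eventlabel IS NOT NULL
  AND sourcemedium IS NOT NULL
  AND (deviceCategory IS NULL OR country IS NULL);
```

Here the count is 1 (row A3). A non-zero count on the real table would mean the two-column WHERE clause leaves some NULLs unfilled, and the other columns should be added to it.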