Question

I have the following query:

SELECT count(distinct document_key), etl_telco_cycle.customer_number FROM telco_document_header inner join etl_telco_cycle on  (telco_document_header.customer_number like '%' || etl_telco_cycle.customer_number) where telco_document_header.document_cycle = substring(cast(now() - interval '1 month' as varchar) from 1 for 4) || substring(cast(now() - interval '1 month' as varchar) from 6 for 2) group by etl_telco_cycle.customer_number

which returns the following:

Now I want to use that result to update the count in the table wherever customer_number matches. I tried:

update etl_telco_cycle set amount_mobilephone_numbers = (SELECT count(distinct document_key), etl_telco_cycle.customer_number FROM telco_document_header inner join etl_telco_cycle on  (telco_document_header.customer_number like '%' || etl_telco_cycle.customer_number) where telco_document_header.document_cycle = substring(cast(now() - interval '1 month' as varchar) from 1 for 4) || substring(cast(now() - interval '1 month' as varchar) from 6 for 2) group by etl_telco_cycle.customer_number)

which results in:

Answer 1

Use the FROM clause of the UPDATE command:

UPDATE etl_telco_cycle e
SET    amount_mobilephone_numbers = c.ct
FROM  (
   SELECT e.customer_number, count(distinct document_key) AS ct
   FROM   telco_document_header t
   JOIN   etl_telco_cycle       e ON  t.customer_number like '%' || e.customer_number
   WHERE  t.document_cycle = substring(cast(now() - interval '1 month' as varchar) from 1 for 4)
                          || substring(cast(now() - interval '1 month' as varchar) from 6 for 2)
   GROUP  BY 1
   ) c
WHERE e.customer_number = c.customer_number
AND   e.amount_mobilephone_numbers IS DISTINCT FROM c.ct;  --optional optimization

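While you could also use a correlated subquery, that is typically much slower: it runs one aggregate query per target row, whereas this query runs a single aggregate query. There is also a subtle difference: if no related rows are found in a correlated subquery like the one Gordon demonstrates, the column is still updated to NULL (which would fail for columns defined NOT NULL), while my query does nothing (it keeps the old value). You have to define the behavior you need.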

The added AND e.amount_mobilephone_numbers IS DISTINCT FROM c.ct prevents empty updates. Related:

How do I (or can I) SELECT DISTINCT on multiple columns?
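
If customers without any matching documents in the cycle should end up with a count of 0 rather than keeping their old value, one possible variant is to drive the subquery from etl_telco_cycle with a LEFT JOIN. This is only a sketch: it assumes customer_number is unique in etl_telco_cycle, and it uses the to_char() form of the cycle expression on the assumption that it is equivalent to the substring expression.

```sql
-- Sketch: zero-fill customers that have no documents in last month's cycle.
-- Assumes etl_telco_cycle.customer_number is unique.
UPDATE etl_telco_cycle e
SET    amount_mobilephone_numbers = c.ct
FROM  (
   SELECT e1.customer_number
        , count(DISTINCT t.document_key) AS ct   -- counts 0 when no row matches
   FROM   etl_telco_cycle e1
   LEFT   JOIN telco_document_header t
          ON  t.customer_number LIKE '%' || e1.customer_number
          -- the cycle filter must sit in the join condition, not in WHERE,
          -- or the LEFT JOIN would behave like an inner join again
          AND t.document_cycle = to_char(now() - interval '1 month', 'YYYYMM')
   GROUP  BY e1.customer_number
   ) c
WHERE  e.customer_number = c.customer_number
AND    e.amount_mobilephone_numbers IS DISTINCT FROM c.ct;
```

count() ignores the NULL document_key values that the LEFT JOIN produces for unmatched customers, which is what yields 0 for them.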

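You can probably optimize the performance of the count subquery further. You may not need DISTINCT or the JOIN in the subquery; I would need to see the exact table definitions and constraints to tell. It also looks like you can replace: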

   substring(cast(now() - interval '1 month' as varchar) from 1 for 4)
|| substring(cast(now() - interval '1 month' as varchar) from 6 for 2)

with:

to_char(now() - interval '1 month', 'YYYYMM')

Both depend on the current timezone setting, which may be undesirable in corner cases.
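
To make the timezone dependency concrete, here is a small sketch that computes the cycle string once in the session time zone and once pinned to a fixed zone. 'UTC' is only an assumption for illustration; the right zone is whichever one the billing cycle is defined in.

```sql
-- Sketch: the two columns can differ around midnight at a month boundary
-- when the session time zone is not UTC.
SELECT to_char(now() - interval '1 month', 'YYYYMM')                        AS cycle_session_zone
     , to_char((now() AT TIME ZONE 'UTC') - interval '1 month', 'YYYYMM')   AS cycle_utc;
```

now() AT TIME ZONE 'UTC' yields a timestamp without time zone, so the result no longer depends on the session setting.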

And document_cycle should really be a date or an integer, not a string type ...
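
As a rough illustration of what the comparison value could look like under such a redesign, the following sketch computes last month's cycle both ways. Both encodings are assumptions rather than part of the original schema: a date holding the first day of the month, or an integer in YYYYMM form.

```sql
-- Sketch: comparison values for a hypothetical date or integer document_cycle.
SELECT date_trunc('month', now() - interval '1 month')::date  AS cycle_as_date     -- first day of last month
     , to_char(now() - interval '1 month', 'YYYYMM')::int     AS cycle_as_integer; -- YYYYMM as a number
```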

Answer 2

You can just use a correlated subquery:

update etl_telco_cycle
    set amount_mobilephone_numbers = (SELECT count(distinct document_key)
                                      FROM telco_document_header tdh
                                      WHERE tdh.customer_number = etl_telco_cycle.customer_number AND 
                                            tdh.document_cycle = substring(cast(now() - interval '1 month' as varchar) from 1 for 4) || substring(cast(now() - interval '1 month' as varchar) from 6 for 2) 
                                    );

I'm not sure why your version uses LIKE to match the customer numbers. That seems awkward, so I removed it.

I also think the date logic can be written more concisely using TO_CHAR():

update etl_telco_cycle
    set amount_mobilephone_numbers = (SELECT count(distinct document_key)
                                      FROM telco_document_header tdh
                                      WHERE tdh.customer_number = etl_telco_cycle.customer_number AND 
                                            tdh.document_cycle = TO_CHAR(now() - interval '1 month', 'YYYYMM')
                                     );

Update a column from the result of a count subquery
