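Pivot table with multiple value columns

Time: 2019-03-22 09:18:33

Tags: sql postgresql pivot crosstab

I have a Postgres table with product data from different manufacturers. Here is the simplified table structure: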

CREATE TABLE test_table (
  sku               text,
  manufacturer_name text,
  price             double precision,
  stock             int
);

INSERT INTO test_table
VALUES ('sku1', 'Manufacturer1', 110.00, 22),
       ('sku1', 'Manufacturer2', 120.00, 15),
       ('sku1', 'Manufacturer3', 130.00, 1),
       ('sku1', 'Manufacturer3', 30.00, 11),
       ('sku2', 'Manufacturer1', 10.00, 2),
       ('sku2', 'Manufacturer2', 9.00,  3),
       ('sku3', 'Manufacturer2', 21.00, 3),
       ('sku3', 'Manufacturer2', 1.00, 7),
       ('sku3', 'Manufacturer3', 19.00, 5);

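For each SKU I need to output every manufacturer, but if the same manufacturer appears more than once for a SKU, I need to pick the row with the lowest price (note that I also need to include the stock column). Here is the desired result: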

| sku  | man1_price | man1_stock | man2_price | man2_stock | man3_price | man3_stock |
|------|------------|------------|------------|------------|------------|------------|
| sku1 | 110.0      | 22         | 120.0      | 15         | 30.0       | 11         |
| sku2 | 10.0       | 2          | 9.0        | 3          |            |            |
| sku3 |            |            | 1.0        | 7          | 19.0       | 5          |

I tried using Postgres crosstab():

SELECT *
FROM crosstab('SELECT sku, manufacturer_name, price
              FROM test_table
              ORDER BY 1,2',
              $$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$
       )
       AS ct (sku text, "man1_price" double precision,
              "man2_price" double precision,
              "man3_price" double precision
    );

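But this produces a table with only a price column per manufacturer, and I did not find a way to include the stock column.

I also tried conditional aggregation: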

SELECT sku,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN price END) as man1_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer1' THEN stock END) as man1_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN price END) as man2_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer2' THEN stock END) as man2_stock,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN price END) as man3_price,
   MIN(CASE WHEN manufacturer_name = 'Manufacturer3' THEN stock END) as man3_stock
FROM test_table
GROUP BY sku
ORDER BY sku

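This query does not work in my case either. When the same manufacturer appears several times for the same sku with different price/stock values, it takes the lowest price from one row and the lowest stock from another, so the two values no longer belong together.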
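To make the mismatch concrete, here is a small check against the sample data above. It is only an illustration: it lists the (sku, manufacturer) pairs that occur more than once and shows what two independent MIN() calls return for them.

```sql
-- Two independent MIN() aggregates can take their values from different rows.
SELECT sku,
       manufacturer_name,
       MIN(price) AS min_price,
       MIN(stock) AS min_stock
FROM   test_table
GROUP  BY sku, manufacturer_name
HAVING COUNT(*) > 1            -- only pairs with duplicate rows
ORDER  BY sku, manufacturer_name;

-- With the sample data:
--   sku1 | Manufacturer3 | 30 | 1   (stock 1 belongs to the 130.00 row, not the 30.00 row)
--   sku3 | Manufacturer2 | 1  | 3   (stock 3 belongs to the 21.00 row, not the 1.00 row)
```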

How can I output each manufacturer's price together with the corresponding stock from this table?

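P.S. Thank you all for such useful answers. My Postgres table is rather small, with no more than 15,000 products in total (I don't know whether such numbers allow a proper comparison), but since Erwin Brandstetter asked for a performance comparison of the different queries, I ran each query 3 times with EXPLAIN ANALYZE. Here are their execution times:

Erwin Brandstetter query: 400 - 450 ms
Kjetil S query: 250 - 300 ms
Gordon Linoff query: 200 - 250 ms
a_horse_with_no_name query: 250 - 300 ms

Again, I'm not sure whether these numbers are useful as a reference. For my case I chose a combined version of the Kjetil S and Gordon Linoff queries, but the Erwin Brandstetter and a_horse_with_no_name variants are also very useful and interesting. It is worth noting that if my table ends up with more manufacturers in the future, adjusting the query and typing in their names every time would be tedious, so the query from a_horse_with_no_name's answer would be the most convenient one to use.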
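For anyone who wants to repeat this kind of comparison, a timed run could look roughly like the sketch below. The query under test is just a stand-in, and the BUFFERS option is an optional extra that was not part of the measurements above. Keep in mind that EXPLAIN ANALYZE actually executes the statement.

```sql
-- Sketch: time one candidate query; repeat a few times and compare
-- the execution time reported at the end of the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sku,
       MIN(price) FILTER (WHERE manufacturer_name = 'Manufacturer1') AS man1_price
FROM   test_table
GROUP  BY sku;
```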

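4 answers:

Answer 0: (score: 2)

Your last attempt almost works. You should, however, add a WHERE condition that removes every row that does not have the lowest price for its manufacturer and sku. This produces the expected result: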

select
  sku,
  min( case when manufacturer_name='Manufacturer1' then price end ) man1_price,
  min( case when manufacturer_name='Manufacturer1' then stock end ) man1_stock,
  min( case when manufacturer_name='Manufacturer2' then price end ) man2_price,
  min( case when manufacturer_name='Manufacturer2' then stock end ) man2_stock,
  min( case when manufacturer_name='Manufacturer3' then price end ) man3_price,
  min( case when manufacturer_name='Manufacturer3' then stock end ) man3_stock
from test_table t
where not exists (
    select 1 from test_table
    where sku=t.sku
    and manufacturer_name=t.manufacturer_name
    and price<t.price
)
group by sku
order by 1;

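Answer 1: (score: 1)

These days I find pivoting with a JSON result much easier than using a complicated pivot query. Producing a single aggregated JSON value does not run into the inherent restriction of SQL that the number of columns must be known before the query is executed (and must be the same for all rows).

You could use something like this: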

select sku, 
       jsonb_object_agg(manufacturer_name, 
                          jsonb_build_object('price', price, 'stock', stock, 'isMinPrice', price = min_price)) as price_info
from (
  select sku, 
         manufacturer_name,
         price, 
         min(price) over (partition by sku) as min_price,
         stock
  from test_table
) t
group by sku;

With your sample data, the above returns the following result:

sku  | price_info                                                                                                                                                                                             
-----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sku1 | {"Manufacturer1": {"price": 110, "stock": 22, "isMinPrice": false}, "Manufacturer2": {"price": 120, "stock": 15, "isMinPrice": false}, "Manufacturer3": {"price": 30, "stock": 11, "isMinPrice": true}}
sku2 | {"Manufacturer1": {"price": 10, "stock": 2, "isMinPrice": false}, "Manufacturer2": {"price": 9, "stock": 3, "isMinPrice": true}}                                                                       
sku3 | {"Manufacturer2": {"price": 1, "stock": 7, "isMinPrice": true}, "Manufacturer3": {"price": 19, "stock": 5, "isMinPrice": false}}                                                                       
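One caveat with the query above: when a manufacturer occurs twice for the same sku, both rows map to the same JSON key, and jsonb keeps only one of them, so which row survives depends on the order in which rows reach the aggregate. A minimal sketch that keeps the cheapest row per sku and manufacturer before aggregating might look like this (the DISTINCT ON step is my addition, not part of the answer as posted, and the isMinPrice flag is left out for brevity):

```sql
SELECT sku,
       jsonb_object_agg(manufacturer_name,
                        jsonb_build_object('price', price, 'stock', stock)) AS price_info
FROM (
  -- Keep one row per (sku, manufacturer): the one with the lowest price.
  SELECT DISTINCT ON (sku, manufacturer_name)
         sku, manufacturer_name, price, stock
  FROM   test_table
  ORDER  BY sku, manufacturer_name, price
) t
GROUP BY sku
ORDER BY sku;
```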

Answer 2: (score: 1)

I would use DISTINCT ON to limit the data to one price per manufacturer, and I like the FILTER functionality in Postgres for the aggregation.
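The query for this answer is not preserved in this copy. A minimal sketch of the approach it describes, with DISTINCT ON keeping the cheapest row per sku and manufacturer and FILTER aggregates doing the pivot, might look like the following. It is a reconstruction under those assumptions, not the answerer's original code.

```sql
SELECT sku,
       MAX(price) FILTER (WHERE manufacturer_name = 'Manufacturer1') AS man1_price,
       MAX(stock) FILTER (WHERE manufacturer_name = 'Manufacturer1') AS man1_stock,
       MAX(price) FILTER (WHERE manufacturer_name = 'Manufacturer2') AS man2_price,
       MAX(stock) FILTER (WHERE manufacturer_name = 'Manufacturer2') AS man2_stock,
       MAX(price) FILTER (WHERE manufacturer_name = 'Manufacturer3') AS man3_price,
       MAX(stock) FILTER (WHERE manufacturer_name = 'Manufacturer3') AS man3_stock
FROM  (
   -- One row per (sku, manufacturer): the first row in price order.
   -- Ties on price are broken arbitrarily; add a tiebreaker column if that matters.
   SELECT DISTINCT ON (sku, manufacturer_name)
          sku, manufacturer_name, price, stock
   FROM   test_table
   ORDER  BY sku, manufacturer_name, price
) t
GROUP  BY sku
ORDER  BY sku;
```

Because the subquery leaves at most one row per sku and manufacturer, each aggregate sees a single value, so MAX() only serves to carry that value through the GROUP BY and price and stock stay paired.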

Answer 3: (score: 0)

crosstab() must be given a static column definition list. Your second parameter:

$$ SELECT DISTINCT manufacturer_name FROM test_table ORDER BY 1 $$

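... provides a dynamic list of values, which would require a dynamic column definition list. That cannot work, except by coincidence.

The core problem of your task is that crosstab() expects a single value column in the query passed as its first parameter, but you want to process two value columns per row (price and stock).

One way to solve this is to pack the multiple values into a composite type and extract the values in the outer SELECT.

Create a composite type once: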

CREATE TYPE price_stock AS (price float8, stock int);

A temporary table or a view would serve the purpose as well.
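As a sketch of the temporary-table route: creating a table also registers a row type of the same name, so something like the following could stand in for CREATE TYPE for the duration of the session (an assumption about how one might set it up, to be used instead of the type above, not in addition to it):

```sql
-- The table stays empty; only its row type (price_stock) is used.
CREATE TEMP TABLE price_stock (price float8, stock int);
```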
Then:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, (price, stock)::price_stock
    FROM   test_table
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 price_stock
            , man2 price_stock
            , man3 price_stock
    );

For a quick test, or if the rows of the underlying table are not too wide, you can also just use its row type without creating a custom type:

SELECT sku
     , (man1).price, (man1).stock
     , (man2).price, (man2).stock
     , (man3).price, (man3).stock
FROM   crosstab(
   'SELECT sku, manufacturer_name, t
    FROM   test_table t
    ORDER  BY 1,2'
  , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
    )
       AS ct (sku text
            , man1 test_table
            , man2 test_table
            , man3 test_table
    );
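Note that ORDER BY 1,2 in the source queries above does not determine which of two rows for the same sku and manufacturer comes last, so it is not specified which one ends up in the result. A hedged variant that deduplicates first, keeping the cheapest row, could look like this. It assumes the price_stock type from above and the tablefunc extension; the column aliases are added only so the output columns have distinct names.

```sql
-- crosstab() is provided by the tablefunc extension:
-- CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT sku
     , (man1).price AS man1_price, (man1).stock AS man1_stock
     , (man2).price AS man2_price, (man2).stock AS man2_stock
     , (man3).price AS man3_price, (man3).stock AS man3_stock
FROM   crosstab(
   $$SELECT DISTINCT ON (sku, manufacturer_name)
            sku, manufacturer_name, (price, stock)::price_stock
     FROM   test_table
     ORDER  BY sku, manufacturer_name, price$$   -- lowest price per pair first
 , $$VALUES ('Manufacturer1'),('Manufacturer2'),('Manufacturer3')$$
   ) AS ct (sku text
          , man1 price_stock
          , man2 price_stock
          , man3 price_stock);
```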
