Distinct row values as columns, with a GROUP BY on another field

Date: 2018-06-14 10:20:04

Tags: python mysql sql pivot dynamic-pivot

I have loan data that I want to group by date, with the amount summed for each distinct product.

My data looks like this:

disbursementdate | amount | product | cluster
2017-01-01       | 1000   | HL      | West
2018-02-01       | 1000   | PL      | East

So after the query, I'd ideally like the result to look like this:

   Month            | HL   | PL
   January 2017     | 1000 | 0
   February 2018    | 100  | 1000

Note that there may be more products, and there is no way of knowing how many in advance, so a hard-coded SUM(CASE WHEN ...) won't work.
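For example, a static pivot has to name every product column up front (the table name loans below is just a placeholder), so it breaks as soon as a new product shows up:

-- Static pivot: every product is hard-coded, so any product
-- not listed here is silently dropped from the result.
SELECT disbursementdate,
       SUM(CASE WHEN product = 'HL' THEN amount ELSE 0 END) AS HL,
       SUM(CASE WHEN product = 'PL' THEN amount ELSE 0 END) AS PL
FROM loans
GROUP BY disbursementdate;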

I'm struggling to figure out the query.

2 answers:

Answer 0 (score: 0)

You can use pandas and its dedicated pd.DataFrame.pivot_table method:

import pandas as pd

# read data
df = pd.read_csv('file.csv')

# extract month
df['Month'] = pd.to_datetime(df['disbursementdate']).apply(lambda x: x.replace(day=1))

# pivot results
res = df.pivot_table(index='Month', columns='product', values='amount',
                     aggfunc='sum', fill_value=0).reset_index()

# reformat month
res['Month'] = res['Month'].dt.strftime('%B %Y')

print(res)

product          Month    HL    PL
0         January 2017  1000     0
1        February 2018     0  1000
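Because pivot_table builds the product columns from the data itself, new products show up as extra columns with no code changes. One note on ordering: strftime converts Month to plain strings, so do that formatting last (as above) while the column is still a datetime if you need the rows to stay in chronological order.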

Answer 1 (score: 0)

You can do this in MySQL by building the statement dynamically, for example:

-- Set up sample data
DROP TABLE IF EXISTS T;
CREATE TABLE T (disbursementdate DATE, amount INT, product VARCHAR(2), cluster VARCHAR(4));
INSERT INTO T VALUES
('2017-01-01', 1000, 'HL', 'West'),
('2017-01-01', 1000, 'OL', 'West'),
('2018-02-01', 1000, 'PL', 'East'),
('2018-02-01',  100, 'HL', 'West'),
('2018-02-01', 1000, 'HL', 'West');

-- Build one SUM(CASE ...) column per distinct product;
-- CHAR(39) is a single quote, used to quote the product value.
SET @SQL =
(SELECT CONCAT('SELECT DISBURSEMENTDATE,',
        GROUP_CONCAT(CONCAT('SUM(CASE WHEN PRODUCT = ', CHAR(39), S.PRODUCT, CHAR(39),
                            ' THEN AMOUNT ELSE 0 END) AS ', S.PRODUCT)),
        ' FROM T GROUP BY DISBURSEMENTDATE;')
 FROM (SELECT DISTINCT PRODUCT FROM T) S
);

-- Prepare and run the generated statement
PREPARE SQLSTMT FROM @SQL;
EXECUTE SQLSTMT;
DEALLOCATE PREPARE SQLSTMT;
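If you want to inspect what was built, SELECT @SQL; returns the generated statement. For this sample data it is equivalent to the following (reformatted here for readability; GROUP_CONCAT does not guarantee the column order):

SELECT DISBURSEMENTDATE,
       SUM(CASE WHEN PRODUCT = 'HL' THEN AMOUNT ELSE 0 END) AS HL,
       SUM(CASE WHEN PRODUCT = 'OL' THEN AMOUNT ELSE 0 END) AS OL,
       SUM(CASE WHEN PRODUCT = 'PL' THEN AMOUNT ELSE 0 END) AS PL
FROM T
GROUP BY DISBURSEMENTDATE;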

+------------------+------+------+------+
| DISBURSEMENTDATE | HL   | OL   | PL   |
+------------------+------+------+------+
| 2017-01-01       | 1000 | 1000 |    0 |
| 2018-02-01       | 1100 |    0 | 1000 |
+------------------+------+------+------+
2 rows in set (0.00 sec)
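The query above groups by the raw date. To get the 'January 2017'-style months the question asked for, a sketch (untested, assuming MySQL's DATE_FORMAT) is to build the month expression into the generated string instead:

-- Render the month as e.g. 'January 2017' and group by it
-- (doubled single quotes escape a quote inside the string literal).
SET @SQL =
(SELECT CONCAT('SELECT DATE_FORMAT(DISBURSEMENTDATE, ''%M %Y'') AS MONTH,',
        GROUP_CONCAT(CONCAT('SUM(CASE WHEN PRODUCT = ', CHAR(39), S.PRODUCT, CHAR(39),
                            ' THEN AMOUNT ELSE 0 END) AS ', S.PRODUCT)),
        ' FROM T GROUP BY DATE_FORMAT(DISBURSEMENTDATE, ''%M %Y'');')
 FROM (SELECT DISTINCT PRODUCT FROM T) S
);
PREPARE SQLSTMT FROM @SQL;
EXECUTE SQLSTMT;
DEALLOCATE PREPARE SQLSTMT;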