Question

Suppose my data looks like this:

acct_id | amount
--------|-------
10001   | 6.00
20000   | 5.00
32356   | 1.00
10001   | 2.00
45000   | 1.50
45000   | 10.00

My expected result counts only the acct_id values that appear more than once:

acct_id | count
--------|------
10001   | 2
45000   | 2

How can I get this in Cassandra?

Answer 1

How can I get this in Cassandra?

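If you are using Cassandra 2.2.x or 3.x, you can create a user-defined aggregate (UDA). Java UDFs are disabled by default, so enable_user_defined_functions must be set to true in cassandra.yaml first.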

-- State function: adds 1 to the counter stored for acctid in the state map
CREATE FUNCTION counByAccId(state map<int, int>, acctid int)
RETURNS NULL ON NULL INPUT
RETURNS map<int, int>
LANGUAGE java
AS '
if(state.containsKey(acctid)) {
   Integer currentCount = (Integer)state.get(acctid);
   state.put(acctid, currentCount + 1);
} else {
   state.put(acctid, 1);
}
return state;
';

-- Aggregate: starts from an empty map (INITCOND {}) and calls counByAccId once per selected row
CREATE AGGREGATE groupByAcctIdAndCount(int)
SFUNC counByAccId
STYPE map<int, int>
INITCOND {};

SELECT groupByAcctIdAndCount(acct_id) FROM myTable WHERE partition_key = xxx;
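To make the evaluation order concrete, here is a rough Java model of what the aggregate does: the state starts as INITCOND (an empty map), and the state function is called once per selected row with the current state and that row's acct_id. This is only an illustrative sketch that runs the same logic as the UDF body outside Cassandra; it does not use any Cassandra API, and the class name UdaFoldSketch is hypothetical. The input values are the six acct_id values from the example dataset in this answer.

```java
import java.util.Map;
import java.util.TreeMap;

public class UdaFoldSketch {

    // Same logic as the body of the counByAccId state function (SFUNC):
    // increment the counter for acctid, or start it at 1.
    static Map<Integer, Integer> counByAccId(Map<Integer, Integer> state, int acctid) {
        if (state.containsKey(acctid)) {
            state.put(acctid, state.get(acctid) + 1);
        } else {
            state.put(acctid, 1);
        }
        return state;
    }

    // Rough model of aggregate evaluation: start from INITCOND ({}) and
    // feed each row's acct_id through the state function in turn.
    static Map<Integer, Integer> groupByAcctIdAndCount(int[] acctIds) {
        // TreeMap keeps keys sorted, matching the key order in the cqlsh output
        Map<Integer, Integer> state = new TreeMap<>();
        for (int acctid : acctIds) {
            state = counByAccId(state, acctid);
        }
        return state;
    }

    public static void main(String[] args) {
        // acct_id values of the example dataset, in the order cqlsh lists them
        int[] acctIds = {45000, 10001, 20000, 10001, 45000, 32356};
        System.out.println(groupByAcctIdAndCount(acctIds)); // {10001=2, 20000=1, 32356=1, 45000=2}
    }
}
```

Row order does not affect the result, since each call only increments one counter.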

Example dataset:

select * from agg;

partition_key | acct_id | val
---------------+---------+-----
             5 |   45000 | 1.5
             1 |   10001 |   6
             2 |   20000 |   5
             4 |   10001 |   2
             6 |   45000 |  10
             3 |   32356 |   1

select groupByAcctIdAndCount(acct_id) FROM agg;

 music.groupbyacctidandcount(acct_id)
------------------------------------------
 {10001: 2, 20000: 1, 32356: 1, 45000: 2}
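The aggregate returns a count for every acct_id, while the question only wants the accounts that appear more than once. One option, sketched here as an assumption rather than part of the original answer, is to filter the returned map on the client side. How the map is read through a driver is not shown; the class and method names below are hypothetical, and Map.of requires Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;

public class DuplicateAccounts {

    // Keep only the accounts whose count is greater than 1,
    // i.e. the duplicated acct_id values the question asks for.
    static Map<Integer, Integer> onlyDuplicates(Map<Integer, Integer> counts) {
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Counts copied from the aggregate output above
        Map<Integer, Integer> counts = Map.of(10001, 2, 20000, 1, 32356, 1, 45000, 2);
        System.out.println(onlyDuplicates(counts)); // {10001=2, 45000=2}
    }
}
```

The same filter could instead be applied inside Cassandra by declaring a FINALFUNC on the aggregate, which keeps the non-duplicated entries off the wire; the client-side version is simply easier to try first.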

Warning: be sure to read my blog post on UDAs and the performance implications of using them to scan a full table: http://www.doanduyhai.com/blog/?p=2015
