Storing data from a Pig relation into Cassandra

Asked: 2014-05-19 15:18:30

Tags: cassandra apache-pig datastax-enterprise

I have the following Cassandra table:

CREATE TABLE segments (
  b text,
  s int,
  c int,
  PRIMARY KEY (b)
)

and the following Pig relation:

data: {b: chararray,s: long,c: long}

which I load from a file using PigStorage:

data = LOAD 'some_file' as (b:chararray,s:long,c:long);

I have been trying, without success, to store the Pig relation in the Cassandra table. I tried:

to_cassandra = FOREACH (GROUP data ALL) 
  GENERATE 
    TOTUPLE(TOTUPLE('b',data.b)),
    TOTUPLE('s',data.s),
    TOTUPLE('c',data.c);
STORE to_cassandra INTO 
  'cql://pv/segments?
    output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
  USING CqlStorage();

where the decoded output query is:

UPDATE pv.segments SET s=?,c=?

But I get the following:

[main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - 
  ERROR: java.lang.ClassCastException: 
    org.apache.pig.data.DefaultDataBag cannot be cast to org.apache.pig.data.DataByteArray

This is rather cryptic. Which field is the offending one, and how can I fix it?

Edit

I ran illustrate to_cassandra; and got:

-----------------------------------------------------------------------------------------------------
| data     | b:chararray                                                  | s:long     | c:long     | 
-----------------------------------------------------------------------------------------------------
|          | 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB | 1          | 1          | 
|          | 0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG | 1          | 1          | 
-----------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1-3     | group:chararray     | data:bag{:tuple(b:chararray,s:long,c:long)}                                                                                                  | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         | all                 | {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB, 1, 1), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG, 1, 1)} | 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| to_cassandra     | org.apache.pig.builtin.totuple_org.apache.pig.builtin.totuple_29_30:tuple(org.apache.pig.builtin.totuple_29:tuple(:chararray,:bag{:tuple(b:chararray)}))                         | org.apache.pig.builtin.totuple_31:tuple(:chararray,:bag{:tuple(s:long)})                     | org.apache.pig.builtin.totuple_32:tuple(:chararray,:bag{:tuple(c:long)})                     | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|                  | ((b, {(03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB), (0qadR3YpgVEwYsORBHFMfAh4OFk7IrROyCq7RDibchBpAKfSWAjOHDAyfzPG)}))                                          | (s, {(1), (1)})                                                                              | (c, {(1), (1)})                                                                              | 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 answer:

Answer 0 (score: 0)

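The problem is your grouping: GROUP data ALL collapses every row into a single bag, so each field you emit is a bag of values rather than the single value Cassandra expects. That is why the error complains about a DefaultDataBag. Your output should end up looking like this: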

((b, 03Wat7NfMi8QiE4IlHeTmbOEfLNkvlzfG5znff62KvSzpm09eTBWCxcdotuB)), (s, 1), (c, 1)

...to match your schema. Since the output schema maps directly onto your input, the purpose of the grouping is unclear.
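
A minimal sketch of the script without the GROUP, so that each input row yields one output tuple. It rests on two assumptions not stated in the question: that CqlStorage takes the partition key as (name, value) pairs followed by a tuple of bare values bound, in order, to the ? placeholders of output_query (so s and c are passed as values rather than as the (name, value) pairs illustrated above), and that the long fields need a cast to int to match the int columns of the table. Verify both against the CqlStorage version you run.

-- Load the input with the schema given in the question.
data = LOAD 'some_file' AS (b:chararray, s:long, c:long);

-- No GROUP: one output tuple per input row.
-- Field 1: the partition key as (name, value) pairs.
-- Field 2: values bound in order to the ? placeholders (s first, then c).
-- The (int) casts are an assumption, made to match the int columns.
to_cassandra = FOREACH data GENERATE
    TOTUPLE(TOTUPLE('b', b)),
    TOTUPLE((int) s, (int) c);

-- Inspect the shape before storing; no field should be a bag.
DESCRIBE to_cassandra;

STORE to_cassandra INTO
    'cql://pv/segments?output_query=UPDATE%20pv.segments%20SET%20s%3D%3F%2Cc%3D%3F'
    USING CqlStorage();

If DESCRIBE still reports a bag anywhere in to_cassandra, the ClassCastException from the question is likely to recur, since the storage function appears to expect scalar values in those positions.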