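Pig UDF throws ClassCastException when using the distributed cache

Time: 2014-04-03 07:31:11

Tags: apache-pig

This is my Pig UDF, which reads a lookup file through the distributed cache: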

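import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

public class Regex extends EvalFunc<String> {
    // Lookup table read from the distributed-cache file.
    static HashMap<String, String> map = new HashMap<String, String>();

    // Ships the HDFS file to each task and exposes it locally as ./id_lookup.
    @Override
    public List<String> getCacheFiles() {
        Path lookup_file = new Path(
                "hdfs://localhost.localdomain:8020/user/cloudera/top");
        List<String> list = new ArrayList<String>(1);
        list.add(lookup_file + "#id_lookup");
        return list;
    }

    // Reads "key#regex" lines from the local symlink into the map.
    public void VectorizeData() throws IOException {
        try (BufferedReader brd = new BufferedReader(new FileReader("./id_lookup"))) {
            String line;
            while ((line = brd.readLine()) != null) {
                String[] str = line.split("#");
                if (str.length >= 2) { // skip lines without a '#' separator
                    map.put(str[0], str[1]);
                }
            }
        }
    }

    private String Regex(Tuple input) throws ExecException {
        // This is the only cast to String in the class.
        String tweet = (String) input.get(0);
        for (Map.Entry<String, String> entry : map.entrySet()) {
            Pattern r = Pattern.compile(entry.getValue());
            Matcher m = r.matcher(tweet);
            // Call find() once; every call advances the matcher to the next match.
            if (m.find()) {
                return entry.getValue();
            }
        }

        return null;
    }

    @Override
    public String exec(Tuple input) throws IOException {
        // Load the lookup file on the first call only, not once per tuple.
        if (map.isEmpty()) {
            VectorizeData();
        }

        return Regex(input);
    }

}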
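
Independently of the exception, the lookup logic can be tried outside Pig. The following is a minimal sketch, not part of the UDF itself: `LookupTable`, `load`, and `firstMatch` are hypothetical names, and it assumes the cache file holds one `key#regex` pair per line. It parses the file once, compiles each pattern once, and calls `find()` a single time per pattern. Unlike the UDF, it returns the key rather than the regex string, which is a guess at the intended result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper (not a Pig class): parses "key#regex" lines into compiled patterns.
final class LookupTable {
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    // Reads every line once; lines without a '#' separator are skipped.
    static LookupTable load(Reader source) throws IOException {
        LookupTable table = new LookupTable();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("#", 2);
                if (parts.length == 2) {
                    table.patterns.put(parts[0], Pattern.compile(parts[1]));
                }
            }
        }
        return table;
    }

    // Returns the key of the first pattern found in text, or null if none matches.
    String firstMatch(String text) {
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            if (entry.getValue().matcher(text).find()) { // find() is called exactly once
                return entry.getKey();
            }
        }
        return null;
    }

    int size() {
        return patterns.size();
    }

    public static void main(String[] args) throws IOException {
        // Sample data made up for illustration.
        LookupTable table = load(new StringReader("sports#foot(ball)?\nfood#pizza|pasta\n"));
        System.out.println(table.firstMatch("I had pizza today")); // prints "food"
    }
}
```

In the UDF, the same loader could be fed `new FileReader("./id_lookup")` on the first call to `exec`.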

Running the Regex UDF produces the following error. It seems to be related to the HashMap:

java.lang.ClassCastException: java.util.HashMap cannot be cast to java.lang.String
    at UDF.Regex.Regex(Regex.java:47)
    at UDF.Regex.exec(Regex.java:70)
    at UDF.Regex.exec(Regex.java:1)

The HashMap reports a size of 3, so it is being populated. Please help me resolve this ClassCastException.
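
A note on where the cast could fail: the only cast to `String` in the posted class is on `input.get(0)`, and Pig passes a field of its map type to Java as a `java.util.Map`. One possible explanation, which is an assumption because the Pig script and its schema are not shown, is that the first field handed to the UDF is a map rather than a chararray. The sketch below uses plain Java with no Pig classes; `FieldCheck` and `asText` are hypothetical names. It reproduces the failure and shows a type check that reports the actual runtime type instead of failing on a blind cast.

```java
import java.util.HashMap;

// Hypothetical helper: stands in for reading field 0 of a Pig Tuple, which is typed Object.
final class FieldCheck {
    // Returns the field as text, or fails with a message naming the actual runtime type.
    static String asText(Object field) {
        if (field == null) {
            return null;
        }
        if (field instanceof String) {
            return (String) field;
        }
        throw new IllegalArgumentException(
                "expected a chararray (String) but got " + field.getClass().getName());
    }

    public static void main(String[] args) {
        // Assumption for illustration: the field holds a map, as a Pig map field would.
        Object field = new HashMap<String, Object>();
        try {
            String tweet = (String) field; // blind cast, as in the UDF
            System.out.println(tweet);
        } catch (ClassCastException e) {
            System.out.println("blind cast failed: " + e.getMessage());
        }
        try {
            asText(field);
        } catch (IllegalArgumentException e) {
            System.out.println("checked access failed: " + e.getMessage());
        }
    }
}
```

If this is the cause, checking the relation's schema with `DESCRIBE` in the Pig script and passing the chararray field (or the relevant map value) to the UDF would be the place to start.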

0 answers:

No answers yet.