N-Gram with ArrayList

Time: 2016-02-24 19:35:50

Tags: java n-gram collocation

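I am working on a project in which I analyze n-grams. My program has a method that creates bigrams and trigrams. However, it only pairs consecutive, adjacent words, and I would like it to produce every combination of words.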

For example,

 Original String - "chilli, start, day, suffer, raynaud, check, raynaudsuk, great, tip, loveyourglov, ram"
 Bigram - "chilli start, start day, day suffer, suffer raynaud, raynaud check, check raynaudsuk, raynaudsuk great, great tip, tip loveyourglov, loveyourglov ram"

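But I want it to produce every combination of the words in the String, each followed by the distance between the two words. For example: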

Expected Bigram - "chilli start,1, chilli day,2, chilli suffer,3, chilli raynaud,4, chilli check,5, chilli raynaudsuk,6, chilli great,7, chilli tip,8, chilli loveyourglov,9, chilli ram,10, start day,1, etc..."

How can I modify my method to generate bigrams like this?

public ArrayList<String> bigramList;
ArrayList<String> fullBagOfWords = new ArrayList<String>();

// Builds bigrams from adjacent words only: (w0 w1), (w1 w2), ...
public void bigramCreator() {
    int n = 2; // n-gram size
    bigramList = new ArrayList<String>();
    for (int i = 0; i <= fullBagOfWords.size() - n; i++) {
        String bigram = "";
        // Join the first n-1 words with trailing spaces, then append the last word.
        for (int j = 0; j < n - 1; j++) {
            bigram += fullBagOfWords.get(i + j) + " ";
        }
        bigram += fullBagOfWords.get(i + n - 1);
        bigramList.add(bigram);
    }
}

Thank you very much for any help.

1 Answer:

Answer 0 (score: 0)

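If I understand the task correctly, this should be quite simple: pair each word with every word that comes after it, and record the distance between the two positions.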

for (int i = 0; i < fullBagOfWords.size() - 1; i++) {
    for (int j = i + 1; j < fullBagOfWords.size(); j++) {
        bigramList.add(fullBagOfWords.get(i) + " " + fullBagOfWords.get(j) + ", " + (j - i));
    }
}
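
For anyone who wants to try the nested-loop idea in isolation, here is a minimal self-contained sketch. The class name PairBigrams and the method allPairs are hypothetical names chosen for illustration, not part of the original code, and the separator is written as "," with no space so that it matches the expected output in the question (the loop above uses ", ").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, used only to make the sketch runnable.
class PairBigrams {

    // Pairs every word with every later word. The number after the comma
    // is the distance in positions between the two words (1 = adjacent).
    static List<String> allPairs(List<String> words) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < words.size() - 1; i++) {
            for (int j = i + 1; j < words.size(); j++) {
                pairs.add(words.get(i) + " " + words.get(j) + "," + (j - i));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("chilli", "start", "day", "suffer");
        // Prints: chilli start,1, chilli day,2, chilli suffer,3, start day,1, start suffer,2, day suffer,1
        System.out.println(String.join(", ", allPairs(words)));
    }
}
```

Note that a list of n words yields n(n-1)/2 pairs, so the 11-word example in the question produces 55 entries rather than 10.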