Removing stop words loaded from a File in Java

Date: 2019-07-02 11:14:14

Tags: java arrays string char

I need to read some stop words from a txt file and remove them from a text. I use this method to read the stop words from the File, store them in a String array, and return it:

public String[] loadStopwords(File targetFile, String[] stopWords) throws IOException {

    File fileTo = new File(targetFile.toString());
    BufferedReader br;
    List<String> lines = new ArrayList<String>();

    try {
        br = new BufferedReader(new FileReader(fileTo));
        String st;
        while ((st = br.readLine()) != null) {
            lines.add(st);
        }
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }

    stopWords = lines.toArray(new String[]{});
    return stopWords;

}

Then I pass the stopWords array and the text to be updated:

public void removeStopWords(String targetText, String[] stopwords) {
    targetText = targetText.toLowerCase().trim();

    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    wordList.removeAll(stopWordsList);

}

But nothing is removed from wordList. Why?

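2 answers:

Answer 0: (score: 1)

Try storing the stop words in lowercase as well, since the text is lowercased before the comparison: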

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public String[] loadStopwords(String targetFile) throws IOException {
    File fileTo = new File(targetFile);
    List<String> lines = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(fileTo))) {
        String st;
        while ((st = br.readLine()) != null) {
            // Store each stop word in lowercase, without leading or trailing whitespace
            lines.add(st.toLowerCase().trim());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

    return lines.toArray(new String[0]);
}

public ArrayList<String> removeStopWords(String targetText, String[] stopwords) {
    // Lowercase the text as well so it matches the lowercased stop words
    targetText = targetText.toLowerCase().trim();

    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));

    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));

    // removeAll compares elements with equals(), which is case-sensitive for String
    wordList.removeAll(stopWordsList);

    return wordList;
}
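
To illustrate why the casing matters, here is a minimal, self-contained sketch. The class name CaseSensitivityDemo and the helper remove are hypothetical and exist only for this illustration; the point is that List.removeAll relies on String.equals, which is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CaseSensitivityDemo {
    // Hypothetical helper: drops every word that equals() one of the stop words.
    static List<String> remove(String text, String[] stopWords) {
        List<String> words = new ArrayList<>(Arrays.asList(text.split(" ")));
        words.removeAll(Arrays.asList(stopWords));
        return words;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        // "The" does not equal "the", so nothing is removed
        System.out.println(remove(text, new String[] {"The"})); // [the, quick, brown, fox]
        // With matching case the stop word is removed
        System.out.println(remove(text, new String[] {"the"})); // [quick, brown, fox]
    }
}
```

If the stop-word file contains capitalized entries while the text has been lowercased, the first case is what happens, which would explain an unchanged list.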

Answer 1: (score: 0)

Edoardo

That does work for me. However, a few remarks:

  1. You don't use the stopWords parameter in the loadStopwords method.
  2. You don't return wordList from the removeStopWords method.
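
The second remark can be sketched as follows. Java passes references by value, so reassigning targetText or filling a local list inside a void method leaves the caller's variables untouched. The class and method names (ReturnValueDemo, removeWithoutReturn, removeAndJoin) are made up for this illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReturnValueDemo {
    // Same shape as the question's method: the filtered list is local and discarded.
    static void removeWithoutReturn(String targetText, String[] stopWords) {
        targetText = targetText.toLowerCase().trim(); // rebinds the local parameter only
        List<String> wordList = new ArrayList<>(Arrays.asList(targetText.split(" ")));
        wordList.removeAll(Arrays.asList(stopWords));
    }

    // Hypothetical alternative: hand the result back to the caller.
    static String removeAndJoin(String targetText, String[] stopWords) {
        List<String> wordList =
                new ArrayList<>(Arrays.asList(targetText.toLowerCase().trim().split(" ")));
        wordList.removeAll(Arrays.asList(stopWords));
        return String.join(" ", wordList);
    }

    public static void main(String[] args) {
        String text = "This is a test";
        String[] stopWords = {"is", "a"};

        removeWithoutReturn(text, stopWords);
        System.out.println(text); // This is a test

        System.out.println(removeAndJoin(text, stopWords)); // this test
    }
}
```

Whether to return a joined String or the list itself is a design choice; the only requirement is that the caller receives the filtered result in some form.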

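Looking at your comment, I suspect the difference is the stop-word text file. I put each stop word on its own line, whereas you most likely have all the stop words on a single line and never split them.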
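
If that suspicion is right, each line read from the file is one long string such as "the a an of", which never equals a single word. A rough sketch of a loader that tolerates both layouts is shown here. It assumes the stop words are separated by whitespace (the actual file format is not shown in the question), and the names StopWordLineDemo and parseStopWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StopWordLineDemo {
    // Splits every line on whitespace, so one-word-per-line and
    // all-words-on-one-line files produce the same list.
    static List<String> parseStopWords(List<String> lines) {
        List<String> stopWords = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) { // skip the empty token produced by blank lines
                    stopWords.add(token);
                }
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        // Simulates a file whose stop words all sit on a single line
        List<String> oneLine = Arrays.asList("the a an of");
        System.out.println(oneLine.contains("the"));  // false
        System.out.println(parseStopWords(oneLine));  // [the, a, an, of]
    }
}
```

The lines read by BufferedReader.readLine() in loadStopwords could be passed through such a helper before being converted to an array.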