Question

Here is my code:

// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String args[]) {
        try {
            String stringSearch = "the";
            // Open the file test.txt as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));

            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;

            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");

            // Loop through each line, stashing the line into our line variable.
            while (( line = bf.readLine()) != null){
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);

                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }

            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}

I want to find the word "the" in the file test.txt. The problem is that once my program finds the first "the", it stops finding any more.

Also, when there is a word like "then", my program treats it as "the".

Answer 1

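Use a case-insensitive regex with word boundaries to find all instances and variants of "the".

indexOf("the") can't tell "the" apart from "then", because both start with "the". Likewise, "the" appears in the middle of "anathema".

To avoid this, use a regex and search for "the" with a word boundary (\b) on both sides. Use word boundaries rather than splitting on " " or just using indexOf(" the ") (spaces on either side), since those approaches won't find "the." or other instances next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.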

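import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile once, outside the loop: "\\b" is a word boundary, and
// CASE_INSENSITIVE lets the pattern match "The" as well as "the"
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

while ( (line = bf.readLine()) != null) {
    linecount++;

    Matcher m = p.matcher(line);

    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " +
                           m.start() + " on line " + linecount);
    }
}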
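To see the difference described above on a single line, here is a small self-contained sketch. The class name WordBoundaryDemo, the helper wordMatches, and the sample sentence are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Same pattern as in the answer: "the" as a whole word, in any case
    static final Pattern THE = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Collects the start index of every whole-word match in one line
    static List<Integer> wordMatches(String line) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = THE.matcher(line);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String line = "then the. The anathema";
        // Plain substring search stops at the "the" inside "then"
        System.out.println(line.indexOf("the"));    // 0
        // Space-padded search misses "the." (punctuation) and "The" (case)
        System.out.println(line.indexOf(" the "));  // -1
        // Word boundaries + CASE_INSENSITIVE find "the." and "The", not "then" or "anathema"
        System.out.println(wordMatches(line));      // [5, 10]
    }
}
```

Note that Java's \b treats letters, digits and underscore as word characters, so "the_end" would not count as a match for "the".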

Answer 2

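You shouldn't use indexOf, because it matches your search string anywhere it occurs as a substring, even inside other words. Since "then" contains the string "the", it counts as a match too.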

More about indexOf

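indexOf

public int indexOf(String str, int fromIndex)

Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which:

    k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k)

If no such value of k exists, then -1 is returned.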
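The fromIndex parameter in this signature is what lets indexOf go beyond the first hit on a line. Below is a minimal sketch (the names IndexOfAll and allIndexes are hypothetical) that restarts the search just past each match. It still matches "the" inside "then", which is the problem this answer addresses by comparing whole words instead.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfAll {
    // Returns the start index of every non-overlapping occurrence of target in line.
    // After each hit, indexOf(str, fromIndex) resumes just past that match.
    static List<Integer> allIndexes(String line, String target) {
        if (target.isEmpty()) {
            // An empty target would match at the same index forever
            throw new IllegalArgumentException("target must not be empty");
        }
        List<Integer> found = new ArrayList<>();
        int index = line.indexOf(target);
        while (index != -1) {
            found.add(index);
            index = line.indexOf(target, index + target.length());
        }
        return found;
    }

    public static void main(String[] args) {
        // Finds every hit, but it is still a substring search:
        // the match at 0 is the start of "then"
        System.out.println(allIndexes("then the end of the line", "the")); // [0, 5, 16]
    }
}
```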

Instead, you should split each line into words, loop over the words, and compare each one with "the".

// Split the line on single spaces and compare each piece with "the"
String[] words = line.split(" ");
for (String word : words) {
  if (word.equals("the")) {
    System.out.println("Found the word");
  }
}

The snippet above also loops through every "the" on the line, whereas a single indexOf call always returns only the first occurrence.

Answer 3

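Your current implementation will only find the first instance of 'the' on each line.

Consider splitting each line into words, iterating over the list of words, and comparing each word to 'the':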

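while (( line = bf.readLine()) != null)
{
    linecount++;
    String[] words = line.split(" ");
    // indexfound is not defined in this loop, so track the character
    // position of each word instead (split(" ") removes one space per gap)
    int position = 0;

    for (String word : words)
    {
        if (word.equals(stringSearch))
            System.out.println("Word was found at position " + position + " on line " + linecount);
        position += word.length() + 1;
    }
}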

Answer 4

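It doesn't sound like the point of the exercise is to make you proficient with regexes (I don't know, it might be... but it seems a bit basic for that), even though regexes really are the real-world solution to this kind of thing.

My advice is to focus on the basics: use indexes and substrings to test the string. Think about how you would account for the naturally case-sensitive nature of strings. Also, is your reader always closed (i.e., is there any way that bf.close() won't be executed)?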
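A minimal sketch of the approach this answer hints at, under a few assumptions of my own: a "word character" is taken to be a letter or digit (Character.isLetterOrDigit), case is ignored with String.regionMatches(true, ...), and the reader is opened in a try-with-resources statement (Java 7+) so it is closed even if readLine() throws. The class name SearchTheBasics and the helper reportMatches are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class SearchTheBasics {
    // Assumption: letters and digits count as word characters
    static boolean isWordChar(char ch) {
        return Character.isLetterOrDigit(ch);
    }

    // Prints each whole-word, case-insensitive occurrence of word in line
    // and returns how many were found; uses only index checks, no regex.
    static int reportMatches(String line, String word, int lineNumber) {
        int len = word.length();
        if (len == 0) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i + len <= line.length(); i++) {
            // regionMatches(true, ...) ignores case, so "The" matches "the"
            if (!line.regionMatches(true, i, word, 0, len)) {
                continue;
            }
            // Reject matches glued to other word characters, e.g. "then", "anathema"
            boolean startOk = i == 0 || !isWordChar(line.charAt(i - 1));
            boolean endOk = i + len == line.length() || !isWordChar(line.charAt(i + len));
            if (startOk && endOk) {
                System.out.println("Word was found at position " + i + " on line " + lineNumber);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // try-with-resources closes bf even if an exception is thrown mid-loop
        try (BufferedReader bf = new BufferedReader(new FileReader("test.txt"))) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                reportMatches(line, "the", linecount);
            }
        } catch (IOException e) {
            System.out.println("IO Error Occurred: " + e);
        }
    }
}
```

For the line "then the. The anathema", reportMatches reports positions 5 and 10 and skips "then" and "anathema". On Java 6 and earlier, which lack try-with-resources, the usual equivalent is calling bf.close() in a finally block.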

Answer 5

You'd be better off using regular expressions for this kind of search. As a quick-and-dirty workaround, you can change stringSearch from

String stringSearch = "the";

to

String stringSearch = " the ";

to look for "the" as a separate word, rather than as part of words like "then", in the .txt file.
