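Counting words in a string

Time: 2016-11-09 03:15:50

Tags: java arrays regex string

I am supposed to create a method that counts the words in a sentence whose length meets or exceeds int minLength. For example, if the given minimum length is 4, your program should only count words that are at least 4 letters long.

Words are separated by one or more spaces. Non-letter characters (spaces, punctuation, digits, etc.) may be present, but they do not count toward the length of a word.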

    public static int countWords(String original, int minLength) {
        original = original.replaceAll("[^A-Za-z\\s]", "").replaceAll("[0-9]", "");
        String[] words = original.split("\\s+");

        for(String word : words){ System.out.println(word); }

        int count = 0;
        for (int i = 0; i < words.length; i++) {
            if (words[i].length() >= minLength) {
                count++;
            } else if (words[i].length() < minLength || minLength == 0) {
                count = 0;
            }
        }
        System.out.println("Number of words in sentence: " + count);
        return count;
    }

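OK, so I changed my code, but now the counter is off. Say I input the following: "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean."

The output I get is... Spain is a beautiful country the beaches are warm sandy and spotlessly clean Number of words in sentence: 10

The word count is off by one; it should be 11. It looks like the last word in the sentence is not being counted. I don't know where the problem is, because all I did was change the replaceAll to include the escape characters.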

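4 Answers:

Answer 0: (score: 2)

You get an incorrect result because the else if branch sets count back to 0. So whenever a word with length < minLength comes up, your counter is reset. Remove the else if branch and your code should work.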
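As a minimal sketch of that fix, the question's method could look like the following once the resetting branch is removed. The class name `FixedWordCounter`, the `trim()` call, and the empty-token check are assumptions added here for illustration; they are not part of the original post.

```java
class FixedWordCounter {
    // Same approach as the question's method, minus the else-if that reset the counter.
    static int countWords(String original, int minLength) {
        // Delete everything that is neither a letter nor whitespace.
        String cleaned = original.replaceAll("[^A-Za-z\\s]", "");
        // trim() avoids an empty first token when the input starts with whitespace.
        String[] words = cleaned.trim().split("\\s+");
        int count = 0;
        for (String word : words) {
            // Skip the single empty token that split() returns for blank input.
            if (!word.isEmpty() && word.length() >= minLength) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
        System.out.println(countWords(s, 3)); // prints 11
    }
}
```

Short words such as "is" and "a" are now simply skipped instead of wiping out the running total.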

In addition, here are 2 other ways to write the same code, with comments explaining what happens at each step.

Option 1:

private static long countWords(final String sentence, final int minLength) {
  // Validate the input sentence is not null or empty.
  if (sentence == null || sentence.isEmpty()) {
    return 0;
  }

  long count = 0;
  // split the sentence by spaces to get array of words.
  final String[] words = sentence.split(" ");
  for (final String word : words) { // for each word
    // remove unwanted characters from the word.
    final String normalizedWord = word.trim().replaceAll("[^a-zA-Z0-9]", "");
    // if the length of word is greater than or equal to minLength provided, increment the counter.
    if (normalizedWord.length() >= minLength) {
      count++;
    }
  }

  return count;
}
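Option 2 of this answer is described as using Java 8 streams. A minimal sketch of how such a variant might look, assuming the same normalization as the loop above (digits are kept, everything else that is not a letter is deleted). The names `StreamWordCounter` and `countWordsStream` are hypothetical and do not come from the original answer.

```java
import java.util.Arrays;

class StreamWordCounter {
    static long countWordsStream(String sentence, int minLength) {
        // Validate that the input sentence is not null or empty.
        if (sentence == null || sentence.isEmpty()) {
            return 0;
        }
        return Arrays.stream(sentence.split(" "))
                // Remove unwanted characters from each token.
                .map(word -> word.replaceAll("[^a-zA-Z0-9]", ""))
                // Keep only tokens that still have at least minLength characters.
                .filter(word -> word.length() >= minLength)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countWordsStream("hello$hello", 4)); // prints 1
    }
}
```

`Stream.count()` returns a `long`, which matches the return type of the loop version.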

Input string: "Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean."

Min Length: 3. Output: 11
Min Length: 4. Output: 8
Min Length: 5. Output: 7

For the input string: "This works just like magic!"

Min Length: 4. Output: 5
Min Length: 5. Output: 2
Min Length: 6. Output: 0

Input string: "hello$hello"

Min Length: 4. Output: 1
Min Length: 5. Output: 1
Min Length: 6. Output: 1

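Answer 1: (score: 0)

1) Split on spaces

2) Trim to remove extra whitespace, and replace everything strange with "" (i.e. delete it)

3) Count the words whose length is greater than or equal to your minLength

Example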

public class TesterClass
{
    public static void main (String args [])
    {
        String original = ",,, hello$hello asdasda ddd 33d   3333d        a";
        int minLength = 3;
        // 1) split on single spaces; runs of spaces produce empty tokens
        String[] words = original.split(" ");
        int count = 0;

        for (String trimAndNoStrange : words)
        {
            // 2) trim, then delete everything that is not a letter (this also removes digits)
            String fixed = trimAndNoStrange.trim().replaceAll("[^A-Za-z]", "");
            // 3) count the tokens that still have at least minLength letters
            if (fixed.length() >= minLength)
            {
                count++;
            }
        }

        System.out.println("Number of words in sentence: " + count);
    }
}

Input/output example:

Input: ",,, hello$hello asdasda ddd 33d   3333d        a"

Input: minLength = 3;

Output: Number of words in sentence: 3

Answer 2: (score: 0)

Try updating your code to the following

original = original.replaceAll("[^A-Za-z\\s]", "").replaceAll("[0-9]", "");
  • Replace with an empty string instead of a space

  • Allow whitespace to remain (add \s to the regex)
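To illustrate why the replacement string matters, here is a small sketch comparing the two choices on a single word. It assumes, as this answer implies, that an earlier version of the code replaced unwanted characters with a space; the class and method names are hypothetical.

```java
class ReplaceDemo {
    // Replacing with a space splits a word at every removed character.
    static String cleanWithSpace(String s) {
        return s.replaceAll("[^A-Za-z\\s]", " ");
    }

    // Replacing with the empty string keeps the remaining letters together.
    static String cleanWithEmpty(String s) {
        return s.replaceAll("[^A-Za-z\\s]", "");
    }

    // Number of whitespace-separated tokens in a non-blank string.
    static int tokens(String cleaned) {
        return cleaned.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        String word = "beache's";
        System.out.println(tokens(cleanWithSpace(word))); // prints 2 ("beache" and "s")
        System.out.println(tokens(cleanWithEmpty(word))); // prints 1 ("beaches")
    }
}
```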

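Answer 3: (score: 0)

You should focus on what you actually want to do, rather than sneaking up on your goal from the opposite side. You want to count words, so do exactly that, instead of replacing and splitting.

One obstacle might be your special definition of a "word", but it is worth spending some time on finding an appropriate pattern; thinking through multiple replacement patterns plus a split pattern will cost you even more time.

Ignoring the length constraint, a word is anything that starts with a letter (digits and separators do not count for the final task anyway), followed by an arbitrary number of non-space characters: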

String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
int count=0;
for(Matcher m=Pattern.compile("[A-Za-z][^\\s]*").matcher(s); m.find();) {
    System.out.println(count+": "+m.group());
    count++;
}
System.out.println("total number of words: "+count);

will print:

0: Spain
1: is
2: a
3: beautiful
4: country;
5: the
6: beache's
7: are
8: warm,
9: sandy
10: and
11: spotlessly
12: clean.
total number of words: 13

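Incorporating the minimum length, without counting non-letter characters, is a bit trickier, but it can be solved by treating each letter as being followed by an arbitrary number of ignorable (i.e. non-letter, non-space) characters. We then just require a minimum number of occurrences of that combination. So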

String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
int count=0;
for(Matcher m=Pattern.compile("([A-Za-z][^A-Za-z\\s]*+){4,}").matcher(s); m.find();) {
    System.out.println(count+": "+m.group());
    count++;
}
System.out.println("total number of words >=4 letters: "+count);

prints

0: Spain
1: beautiful
2: country;
3: beache's
4: warm,
5: sandy
6: spotlessly
7: clean.
total number of words >=4 letters: 8

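In case you are wondering, the *+ quantifier is like *, but tells the regex engine not to backtrack within that part of the match, which is an optimization in this case. Simply put, if the ignorable characters are not followed by a letter, there will not be a letter within the ignorable characters either, so the engine should not spend time looking for one there.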
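The difference between the greedy `*` and the possessive `*+` is easiest to see on a pattern where backtracking is actually needed. The following sketch is not from the answer; it uses a deliberately artificial pattern and hypothetical names to show the behavior.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Greedy: a* first takes every 'a', then gives one back so the final 'a' can match.
    static boolean greedyMatches(String s) {
        return Pattern.matches("a*a", s);
    }

    // Possessive: a*+ takes every 'a' and never gives any back, so the final 'a' cannot match.
    static boolean possessiveMatches(String s) {
        return Pattern.matches("a*+a", s);
    }

    public static void main(String[] args) {
        System.out.println(greedyMatches("aaa"));     // prints true
        System.out.println(possessiveMatches("aaa")); // prints false
    }
}
```

In the word-counting pattern, the ignorable character class and the letter class do not overlap, so giving characters back could never help; that is why the possessive form is safe there.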

Putting this into method form:

public static int countWords(String original, int minLength) {
    if(minLength<1) throw new IllegalArgumentException();
    int count=0;
    for(Matcher m=Pattern.compile("([A-Za-z][^A-Za-z\\s]*+){"+minLength+",}")
                         .matcher(original); m.find();) {
        count++;
    }
    return count;
}

and using it

String s
    ="Spain is a beautiful country; the beache's are warm, sandy and spotlessly clean.";
for(int i=1; i<10; i++)
    System.out.println("with at least "+i+" letters: "+countWords(s, i));

yields

with at least 1 letters: 13
with at least 2 letters: 12
with at least 3 letters: 11
with at least 4 letters: 8
with at least 5 letters: 7
with at least 6 letters: 4
with at least 7 letters: 4
with at least 8 letters: 2
with at least 9 letters: 2