Counting the words with more than a given number of letters

Time: 2017-12-06 12:19:37

Tags: r

I am trying to find the words with more than 4 letters in each sentence of a text.

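I would like the counts as the result. For example, in my example the first sentence/string has 2 words longer than 4 letters, and the second also has 2.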

Using nchar, I only get the total number of characters in the whole string.

What is the correct way to do this?

1 Answer:

Answer 0: (score: 1)

library(dplyr)
library(purrr)

# vector of sentences
fullsetence <- as.character(c("A test setence with test length","A second test for length"))

# get vector of counts for words with more than 4 letters
fullsetence %>%
  strsplit(" ") %>%
  map(~sum(nchar(.) > 4)) %>%
  unlist()

# [1] 2 2


# create a dataframe with sentence and the corresponding counts
# use previous code as a function within "mutate" 
data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Counts = fullsetence %>%
                   strsplit(" ") %>%
                   map(~sum(nchar(.) > 4)) %>%
                   unlist() )

#                       fullsetence Counts
# 1 A test setence with test length      2
# 2        A second test for length      2

If you want the actual words with more than 4 letters, you can use the same approach:

fullsetence %>%
  strsplit(" ") %>%
  map(~ .[nchar(.) > 4])

data.frame(fullsetence, stringsAsFactors = FALSE) %>%
  mutate(Words = fullsetence %>%
                 strsplit(" ") %>%
                 map(~ .[nchar(.) > 4]))
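
For readers who would rather avoid the extra packages, the same split-then-count idea can be sketched in base R. This is only an illustrative variant, not part of the original answer: the helper names count_long_words and get_long_words and the min_len parameter are hypothetical, and it assumes words are separated by single spaces.

```r
# Hypothetical helper (not from the answer): count the words in each
# sentence that have more than `min_len` characters.
count_long_words <- function(sentences, min_len = 4) {
  # Split every sentence on a literal space; the result is a list of word vectors
  words <- strsplit(sentences, " ", fixed = TRUE)
  # sum() over a logical vector counts the TRUE values and returns an integer
  vapply(words, function(w) sum(nchar(w) > min_len), integer(1))
}

# Hypothetical helper: return the matching words themselves, one vector per sentence.
get_long_words <- function(sentences, min_len = 4) {
  words <- strsplit(sentences, " ", fixed = TRUE)
  lapply(words, function(w) w[nchar(w) > min_len])
}

fullsetence <- c("A test setence with test length", "A second test for length")

counts <- count_long_words(fullsetence)
long_words <- get_long_words(fullsetence)
```

vapply is used here instead of map followed by unlist so that the result is checked to be exactly one integer per sentence. Punctuation attached to a word would still count toward its length, so real text may need cleaning first.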