Question

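For this problem, I am trying to look at a column in my dataset data called firstdigits (it is the 22nd column), determine how many times each value occurs, and put that number into a new column called count (the 27th column). So if, say, the value 1 occurs 5 times in total in data$firstdigits, then wherever data$firstdigits = 1 I want data$count = 5.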

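The approach I came up with might work, but it is clunky, and since it has not finished running yet I do not know for sure. I am looking for a faster way to accomplish this.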

unique = as.data.frame(unique(data$firstdigits))
count = as.data.frame(0)
for (i in 1:nrow(unique)){
  count[i,1] = sum(data$firstdigits == unique[i,1])
}

data$count = 0
for(j in 1:nrow(data)){
  for(k in 1:nrow(unique)){
    if (data[j,22] == unique[k,1]){
      data[j,27] = count[k,1]  # assignment; `==` here would only compare and leave count at 0
    }
  }
}

Answer 1

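Maybe you can drop the nested loops entirely:

With a single loop, you can go through all the unique values in data$firstdigits and assign the number of occurrences to the matching rows of data$count: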

## create count column if necessary
# data$count <- 0

for (v in unique(data$firstdigits)){

  # number of occurrences x
  x <- sum(data$firstdigits == v)

  data$count[data$firstdigits == v] <- x

}
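
For comparison, here is a loop-free sketch that is not part of this answer: base R's ave() applies a function within each group of equal values, so using length() as that function gives every row the size of its group. The toy data frame and its values are made up for illustration.

```r
# Toy data frame; the values are made up for illustration.
data <- data.frame(firstdigits = c(1, 7, 1, 2, 1, 7))

# ave() splits the first argument by the grouping variable, applies FUN to
# each group, and writes the result back in the original row order.
data$count <- ave(seq_along(data$firstdigits), data$firstdigits, FUN = length)

data$count
# 3 2 3 1 3 2
```

The value 1 appears three times, 7 twice and 2 once, so each row carries the count of its own value.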

Answer 2

How about sqldf?
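
The answer does not say how sqldf would be used, so the following is only one possible sketch: count the rows per value in a subquery and join the counts back onto every row. It assumes the sqldf package is installed; the data frame name digits_df, the column name n and the values are made up for illustration.

```r
library(sqldf)  # assumes the sqldf package is installed

# Toy data; the name digits_df and the values are made up for illustration.
digits_df <- data.frame(firstdigits = c(1, 7, 1, 2, 1, 7))

# Count rows per value in a subquery, then join the counts back onto each row.
# Row order after a join is not guaranteed, so do not rely on it.
result <- sqldf("
  SELECT d.firstdigits, c.n
  FROM digits_df AS d
  JOIN (SELECT firstdigits, COUNT(*) AS n
        FROM digits_df
        GROUP BY firstdigits) AS c
    ON d.firstdigits = c.firstdigits
")

result
```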


Answer 3

As the commenters suggested, you can use table and dplyr. I will make up a data frame:

library(dplyr)

# use `=` so the column is actually named firstdigits
df <- data.frame( firstdigits = round(runif(100)*10) )
head(df)

  firstdigits
1           1
2           7
3           1
4           2
5           1
6           0

Use table to count the unique values:

tbl.df <- table( df$firstdigits )
tbl.df

0  1  2  3  4  5  6  7  8  9 10 
9 10 11  9 15  7  7 12  6  7  7

Then use dplyr::mutate to bind the counts as a new column:

df <- df %>% 
      # as.vector() drops the table class so count is a plain integer column
      mutate( count = as.vector(tbl.df[as.character(firstdigits)]) )

Note that I am indexing tbl.df with character values. tbl.df[0] is not a valid index, whereas tbl.df["0"] gives 9.
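
The difference between numeric and character indexing can be checked on a small example. This is a minimal base-R sketch with made-up values (the names tbl and count are chosen for illustration), which also shows the lookup step without dplyr.

```r
# Made-up values for illustration, including a 0.
firstdigits <- c(0, 1, 1, 2, 0, 1)
tbl <- table(firstdigits)

tbl[0]      # numeric 0 selects nothing: an empty table
tbl["0"]    # character "0" looks up the entry named "0"

# Base-R equivalent of the mutate() step: index by name, then drop the names.
count <- as.vector(tbl[as.character(firstdigits)])
count
# 2 3 3 1 2 3
```

Indexing by name works because table() labels each count with the value it belongs to, stored as a character string.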

Optimizing nested loops in R

3 answers: