Question

I'm looking for a simple way to count the number of distinct categories in a column of a data frame.

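For example, the iris data frame has 150 rows, and one of its columns is Species, which contains 3 different species. I'd like to run a bit of code and find out that this column holds 3 distinct species. I don't care how many rows correspond to each unique entry, only how many distinct values there are, yet per-value row counts are mostly what I found in my research.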

I was thinking of something like this:

df <- iris
choices <- count(unique(iris$Species))

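Does a solution like this exist? I looked at the posts below, but they either examine the whole data frame rather than a single column in it, or offer a solution more complicated than I was hoping for.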

count number of instances in data frame

Count number of occurrences of categorical variables in data frame (R)

How to count number of unique character vectors within a subset of data

Answer 1

The following should do the job:

choices <- length(unique(iris$Species))
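
One thing worth knowing when the column is a factor: length(unique()) counts the values that actually occur, while nlevels() counts the declared levels, and the two can differ after subsetting. A minimal sketch of that difference (the variable names are only for illustration):

```r
# Keep only two of the three species; the factor still declares three levels.
sub <- iris[iris$Species != "setosa", ]

n_present  <- length(unique(sub$Species))       # values that actually occur: 2
n_declared <- nlevels(sub$Species)              # declared factor levels: 3
n_dropped  <- nlevels(droplevels(sub$Species))  # after dropping unused levels: 2

# unique() treats NA as a value of its own, so it is counted too.
n_with_na <- length(unique(c("a", "b", NA)))    # 3
```

If the data may contain NA and you don't want it counted, something like length(unique(na.omit(x))) should work.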

Answer 2

If we are using dplyr, n_distinct gives the number of unique elements in a column:

n_distinct
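
As a rough sketch of how this could be applied to the question's column (assuming dplyr is installed; the variable names are only for illustration):

```r
library(dplyr)

# Number of distinct values in a single column.
n_species <- n_distinct(iris$Species)    # 3

# NA counts as a distinct value unless na.rm = TRUE is given.
x <- c(1, 1, NA)
n_all   <- n_distinct(x)                 # 2
n_no_na <- n_distinct(x, na.rm = TRUE)   # 1
```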

Answer 3

If you need to count the number of unique instances in each column of a data.frame, you can use sapply:

sapply(iris, function(x) length(unique(x)))
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##           35           23           43           22            3 

For one specific column, the code suggested by @Imran Ali (in the comments) is perfectly fine.

Answer 4

It's easier with data.table:

require(data.table)
uniqueN(iris$Species)
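
If the data is already a data.table, the same function can be used inside `[`. A small sketch, assuming data.table is installed (dt, n_species and per_column are names made up for this example):

```r
library(data.table)

dt <- as.data.table(iris)

# One column, counted inside the data.table.
n_species <- dt[, uniqueN(Species)]        # 3

# Every column at once; .SD stands for the full set of columns.
per_column <- dt[, lapply(.SD, uniqueN)]
```

per_column is a one-row data.table with one count per column.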

Answer 5

Another way to count the unique values of all columns in iris:

> df <- iris

> df$Species <- as.character(df$Species)

> aggregate(values ~ ind, unique(stack(df)), length)
           ind values
1 Petal.Length     43
2  Petal.Width     22
3 Sepal.Length     35
4  Sepal.Width     23
5      Species      3

Answer 6

Another simple way to count, using the Tidyverse packages:

iris %>% 
  count(Species)

     Species  n
1     setosa 50
2 versicolor 50
3  virginica 50
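
Note that count() returns the number of rows per species rather than the number of species. If only the number of distinct species is wanted, one possible follow-up is to count the rows of that result (a sketch; per_species and n_species are illustrative names):

```r
library(dplyr)

per_species <- iris %>% count(Species)   # one row per species, row counts in n
n_species   <- nrow(per_species)         # 3
```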

Answer 7

dplyr 1.0.0 introduced across(), which together with n_distinct() makes this task relatively simple:

library(dplyr)

# for a specific column
iris %>% 
  summarise(across(Species, n_distinct))
#   Species
# 1       3

# only for factors
iris %>% 
  summarise(across(where(is.factor), nlevels))
#   Species
# 1       3

# for all columns 
iris %>% 
  summarise(across(everything(), n_distinct))
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1           35          23           43          22       3

Count number of unique levels of a variable

7 answers: