Question

I have something that looks a bit like this:

block  description
1      enroll
1      enroll
1      motivated
1      motivated
1      motivated
2      openemail
2      openemail

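I want to create a new column containing a unique number for each unique value in the "description" column. The description column has many more unique values than shown here. I'd like to know whether R has a way to determine which values match each other and then generate a new number for each one, so that the result looks like this: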

block  description  question
1      enroll       1
1      enroll       1 
1      motivated    2
1      motivated    2
1      motivated    2
2      openemail    3
2      openemail    3

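I was planning to use mutate() to create the new column, but I'm not sure what the input should be. Ideally, there is a way to do this without typing out every unique value that could appear under "description".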

Edit: The combination of the solutions below that worked best for me is:

df$question <- as.integer(factor(df$description, levels = unique(df$description)))

Answer 1

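We can use as.integer(factor()) to get the desired result. Specifying the levels manually with unique() avoids the default alphabetical sorting that factor() applies when levels is not supplied, so the levels follow the order in which they appear in the data frame. Note that I reordered some of the rows to make it clear that a level's first appearance determines its index.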

library(tidyverse)
df <- read_table2(
"block  description
 1      enroll
 2      openemail
 1      enroll
 2      openemail
 1      motivated
 1      motivated
 1      motivated"
)
df %>%
  mutate(index = as.integer(factor(description, levels = unique(description))))
#> # A tibble: 7 x 3
#>   block description index
#>   <int> <chr>       <int>
#> 1     1 enroll          1
#> 2     2 openemail       2
#> 3     1 enroll          1
#> 4     2 openemail       2
#> 5     1 motivated       3
#> 6     1 motivated       3
#> 7     1 motivated       3

Created on 2018-04-13 by the reprex package (v0.2.0).
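
To make the effect of the levels argument concrete, here is a minimal base R sketch comparing the default (sorted) levels with first-appearance order. The vector x is made up for illustration and mirrors the reordered description column; match() against unique() is shown as an equivalent way to get first-appearance numbering.

```r
# Hypothetical vector, mirroring the reordered description column
x <- c("enroll", "openemail", "enroll", "openemail", "motivated")

# Default: levels are sorted, so "motivated" gets index 2
default_idx <- as.integer(factor(x))

# Levels in order of first appearance, so "openemail" gets index 2
appearance_idx <- as.integer(factor(x, levels = unique(x)))

# match() against unique() yields the same first-appearance numbering
match_idx <- match(x, unique(x))

print(default_idx)     # 1 3 1 3 2
print(appearance_idx)  # 1 2 1 2 3
print(match_idx)       # 1 2 1 2 3
```

Which numbering is preferable depends on whether the index should track alphabetical order or the order of the rows.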

Answer 2

Base R solution

> df$question <- as.numeric(as.factor(df$description))  # as.factor() makes this work for character columns too
> df
  block description question
1     1      enroll        1
2     1      enroll        1
3     1   motivated        2
4     1   motivated        2
5     1   motivated        2
6     2   openemail        3
7     2   openemail        3
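
One caveat worth sketching: numeric codes come from a factor, not from the strings themselves. Since R 4.0.0, data.frame() keeps strings as character by default, and calling as.numeric() directly on a character column gives NA. The sketch below rebuilds the example data under that assumption (stringsAsFactors = FALSE is set explicitly) and converts through factor() first.

```r
# Rebuild the example data with description as a character column
df <- data.frame(
  block = c(1, 1, 1, 1, 1, 2, 2),
  description = c("enroll", "enroll", "motivated", "motivated",
                  "motivated", "openemail", "openemail"),
  stringsAsFactors = FALSE
)

# Direct conversion of non-numeric strings yields NA (with a warning)
direct <- suppressWarnings(as.numeric(df$description))

# Going through factor() produces one integer code per unique value
df$question <- as.integer(factor(df$description))

print(direct)
print(df)
```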
