Question

I have a data-reshaping problem that I could use some help with.

 ID          X1         X2         X3         X4         X5
6001 Certificate  Associate Bachelor's   Master's   Doctoral
5001 Certificate  Associate Bachelor's           
3311 Certificate  Associate Bachelor's           
1981 Certificate  Associate Bachelor's   Master's
4001   Associate Bachelor's   Master's           
2003   Associate Bachelor's   Master's   Doctoral
2017 Certificate  Associate                      
1001   Associate Bachelor's   Master's           
5002  Bachelor's

I need to turn these into dummy variables:

  ID    Certificate     Associates      Bachelor         Master        Doctoral      
6001              1              1             1              1               1
5001              1              1             1              0               0 
2017              1              1             0              0               0

Any suggestions?

Answer 1

Try the reshape2 package. I'm assuming your dataset is named df:

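library(reshape2)
# First, melt your data to long format, using ID as the id variable
m.df <- melt(df, id.vars = "ID")
# Then `cast` it back to wide format, counting how often each value occurs per ID
dcast(m.df, ID ~ value, length)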
#     ID Var.2 Associate Bachelor's Certificate Doctoral Master's
# 1 1001     2         1          1           0        0        1
# 2 1981     1         1          1           1        0        1
# 3 2003     1         1          1           0        1        1
# 4 2017     3         1          0           1        0        0
# 5 3311     2         1          1           1        0        0
# 6 4001     2         1          1           0        0        1
# 7 5001     2         1          1           1        0        0
# 8 5002     4         0          1           0        0        0
# 9 6001     0         1          1           1        1        1
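
The Var.2 column in the output above appears to be the count of blank cells for each ID, so it is not one of the dummies the question asks for. A minimal sketch of one way to drop it, assuming the blanks are stored as empty strings (or NA) and using a small stand-in for df built from three rows of the question's data:

```r
library(reshape2)

# Stand-in for the question's data (assumption: blank cells are empty strings)
df <- data.frame(
  ID = c(6001, 5001, 2017),
  X1 = c("Certificate", "Certificate", "Certificate"),
  X2 = c("Associate", "Associate", "Associate"),
  X3 = c("Bachelor's", "Bachelor's", ""),
  X4 = c("Master's", "", ""),
  X5 = c("Doctoral", "", ""),
  stringsAsFactors = FALSE
)

# Long format: one row per (ID, column, value)
m.df <- melt(df, id.vars = "ID")

# Drop blank or missing cells so they do not become a column of their own
keep <- !is.na(m.df$value) & nzchar(trimws(m.df$value))
m.df <- m.df[keep, ]

# Wide format: count of each degree per ID (0 where the degree is absent)
dummies <- dcast(m.df, ID ~ value, length, value.var = "value")
dummies
```

Because length counts occurrences, a degree that appeared twice for the same ID would give 2 rather than 1; the sample data has no such repeats.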

I haven't tested this, but if you order the factor levels, that will probably control the order of the output columns.
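
A sketch of that idea, not taken from the original answer: convert value to a factor whose levels are listed in the desired order before casting. The stand-in df and the level order below are assumptions based on the question's sample data.

```r
library(reshape2)

# Stand-in for the question's data (assumption: blank cells are empty strings)
df <- data.frame(
  ID = c(6001, 5001, 2017),
  X1 = c("Certificate", "Certificate", "Certificate"),
  X2 = c("Associate", "Associate", "Associate"),
  X3 = c("Bachelor's", "Bachelor's", ""),
  X4 = c("Master's", "", ""),
  X5 = c("Doctoral", "", ""),
  stringsAsFactors = FALSE
)

m.df <- melt(df, id.vars = "ID")
m.df <- m.df[m.df$value != "", ]  # drop blank cells

# Desired column order, lowest to highest degree
degree_levels <- c("Certificate", "Associate", "Bachelor's",
                   "Master's", "Doctoral")
m.df$value <- factor(m.df$value, levels = degree_levels)

# Columns should now follow the factor levels instead of alphabetical order
ordered_dummies <- dcast(m.df, ID ~ value, length, value.var = "value")
ordered_dummies
```

Without the factor() step the value column is plain character, and the columns come out alphabetically, as in the output shown earlier.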
