Reformatting data

Time: 2017-05-30 13:47:07

Tags: r tidyr

I'm getting started with tidyr and dplyr. I have the following data frame:

                            email Assignment   Stage  Grade
1                     foo1@bar.com    course   final  86.28
2                     foo2@bar.com    course   first  68.87
3                     foo3@bar.com    course   resub  38.06
4                     foo3@bar.com    course   final  77.41
...

I would like to reshape this so that, based on the value of Stage (first, resub, or final), the single Grade column becomes three columns, one for each Stage value:

                            email Assignment   first  resub  final
1                     foo1@bar.com    course   100.0  100.0  100.0
2                     foo2@bar.com    course   100.0  100.0  100.0
3                     foo3@bar.com    course   100.0  100.0  100.0
4                     foo3@bar.com    course   100.0  100.0  100.0

(Because of cut and paste, the data obviously don't match.)

I'm confused. I think I need the separate() function, but how?

1 answer:

Answer 0: (score: 1)

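The spread() function from tidyr gives you the result you want. It takes a key column (Stage) whose values become the new column names, and a value column (Grade) that fills them.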

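library(tidyr)
library(dplyr)  # provides the %>% pipe

# Rebuild the sample data from the question
email <- c("foo1@bar.com", "foo2@bar.com", "foo3@bar.com", "foo3@bar.com")
Assignment <- rep("course", 4)
Stage <- c("final", "first", "resub", "final")
Grade <- c(86.28, 68.87, 38.06, 77.41)

df <- data.frame(email, Assignment, Stage, Grade, stringsAsFactors = FALSE)

# Turn each Stage value into its own column, filled with the matching Grade
df <- df %>%
  spread(Stage, Grade)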
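
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), so the same reshape can also be written as below. This is only a sketch and assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The name `wide` and the final column reordering are illustrative additions, not part of the original answer.

```r
library(tidyr)
library(dplyr)

# Same sample data as in the question
df <- data.frame(
  email      = c("foo1@bar.com", "foo2@bar.com", "foo3@bar.com", "foo3@bar.com"),
  Assignment = rep("course", 4),
  Stage      = c("final", "first", "resub", "final"),
  Grade      = c(86.28, 68.87, 38.06, 77.41),
  stringsAsFactors = FALSE
)

# names_from: column whose values become the new column names
# values_from: column whose values fill those new columns
wide <- df %>%
  pivot_wider(names_from = Stage, values_from = Grade) %>%
  # Optional: put the stages in the order the question asks for
  select(all_of(c("email", "Assignment", "first", "resub", "final")))
```

The result has one row per email/Assignment pair. A stage with no grade for that pair comes out as NA, so foo3@bar.com gets values for resub and final and NA for first.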