Converting observations into variables

Time: 2015-10-17 20:03:33

Tags: r

I have data in the following format:

|id|genre1 |genre2 |genre3 |
|1 |action |comedy |romance|
|2 |comedy |romance|       |
|3 |romance|       |       |

I would like to convert my data to the following format:

|id|action|comedy|romance|
|1 |1     |1     |1      |
|2 |0     |1     |1      |
|3 |0     |0     |1      |

What is the best way to do this?

3 answers:

Answer 0 (score: 2)

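Assuming the empty elements are empty strings (i.e., they contain no whitespace), you can first replace those elements with NA and then reshape the data with the reshape2 package.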

is.na(df) <- df == ""

library(reshape2)
dcast(melt(df, 1, na.rm = TRUE), id ~ value, length)
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1

Or, as a fun one-liner that leaves the original data unchanged:

dcast(melt(replace(df, df == "", NA), 1, na.rm = TRUE), id ~ value, length)
#   id action comedy romance
# 1  1      1      1       1
# 2  2      0      1       1
# 3  3      0      0       1

The original data used:

df <- structure(list(id = 1:3, genre1 = c("action", "comedy", "romance"
), genre2 = c("comedy", "romance", ""), genre3 = c("romance", 
"", "")), .Names = c("id", "genre1", "genre2", "genre3"), class = "data.frame", row.names = c(NA, 
-3L))
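
To see what the NA-replacement step of this answer does on its own, here is a minimal base R sketch. It rebuilds the same data with data.frame(); the names blank and df_na are only illustrative and do not appear in the answer.

```r
# Same data as in the answer, rebuilt with data.frame()
df <- data.frame(
  id     = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", ""),
  genre3 = c("romance", "", ""),
  stringsAsFactors = FALSE
)

# Logical matrix that is TRUE wherever a cell holds an empty string
blank <- df == ""

# replace() returns a modified copy with NA in those cells; df is not changed
df_na <- replace(df, blank, NA)

sum(blank)             # number of empty cells found
colSums(is.na(df_na))  # NA count per column after the replacement
```

Because melt() is then called with na.rm = TRUE, those NA cells are dropped before dcast() counts the genres per id.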

Answer 1 (score: 1)

You can do this by reshaping the data with dplyr and tidyr.

library(dplyr)
library(tidyr)

df %>%
  gather(number, genre, genre1:genre3) %>%
  filter(genre != "") %>%
  select(-number) %>%
  mutate(one = 1) %>%
  spread(genre, one, fill = 0)
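
In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same pipeline using those functions, assuming a recent tidyr is installed and that the empty cells are empty strings; the name result is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with empty strings for the missing genres
df <- data.frame(
  id     = 1:3,
  genre1 = c("action", "comedy", "romance"),
  genre2 = c("comedy", "romance", ""),
  genre3 = c("romance", "", ""),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Stack the three genre columns into one long "genre" column
  pivot_longer(genre1:genre3, names_to = "number", values_to = "genre") %>%
  # Drop the empty cells and the column that recorded the source column name
  filter(genre != "") %>%
  select(-number) %>%
  # Mark each (id, genre) pair that is present
  mutate(one = 1) %>%
  # One column per genre; absent pairs are filled with 0
  pivot_wider(names_from = genre, values_from = one, values_fill = list(one = 0))

result
```

The values_fill argument is given as a named list here because that form is accepted by both early and recent tidyr 1.x releases.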

Answer 2 (score: 1)

Using base R, you can use reshape and table:

mydf <-data.frame(id=1:3,
genre1=c("action","comedy","romance"),
genre2=c("comedy","romance",NA),
genre3=c("romance",NA,NA))

colnames(mydf)[2:4] <- paste0("genre.",colnames(mydf)[2:4])
m_data <- reshape(mydf,direction="long", varying=2:4)
with(m_data, table(id, genre))

   genre
id  action comedy romance
  1      1      1       1
  2      0      1       1
  3      0      0       1
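
The result above is a table object rather than a data frame. If a data frame with an id column is needed, as in the format requested in the question, one possible follow-up is sketched below. It repeats the steps of this answer and then converts the table with as.data.frame.matrix(); the names tab and wide are only illustrative.

```r
# Same data and reshaping steps as in the answer
mydf <- data.frame(id = 1:3,
                   genre1 = c("action", "comedy", "romance"),
                   genre2 = c("comedy", "romance", NA),
                   genre3 = c("romance", NA, NA))

colnames(mydf)[2:4] <- paste0("genre.", colnames(mydf)[2:4])
m_data <- reshape(mydf, direction = "long", varying = 2:4)

# Contingency table of id by genre; NA values are left out by table()
tab <- with(m_data, table(id, genre))

# Turn the table into a plain data frame and restore id as a column
wide <- cbind(id = as.integer(rownames(tab)), as.data.frame.matrix(tab))
rownames(wide) <- NULL

wide
```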