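How to compute correlations by group

Time: 2018-08-07 16:29:43

Tags: r

I'm trying to run a for loop to compute a correlation for each level of a factor variable. Each of the 32 teams in my dataset has 16 rows of data. I want to correlate Year with Points_Game for each team. I can do it one team at a time, but I'd like to get better at writing loops.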

correlate <- data %>%
  select(Team, Year, Points_Game) %>% 
  filter(Team == "ARI") %>% 
  select(Year, Points_Game)

cor(correlate)

I created a teams object like this:

teams <- levels(data$Team)

Any help using [i] to iterate over all 32 teams and get the correlation between Year and Points_Game for each team would be appreciated!

2 answers:

Answer 0 (score: 1)

library(dplyr)

# dummy data: 32 teams, 10 seasons each (one row per team and year)
data <- data.frame(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

# find correlation of Year and Points_Game for each team
# r - correlation coefficient
correlate <- data %>%
                group_by(Team) %>% 
                summarise(r = cor(Year, Points_Game))
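
Since the question asked specifically about a loop indexed with [i], the same per-team calculation can also be written as an explicit for loop. This is only a minimal base R sketch under assumptions: the dummy data below is hypothetical (10 seasons per team rather than the asker's 16 rows), Team is converted to a factor so that levels() returns the team names, and r_by_team is an illustrative name that does not appear in the question.

```r
# Hypothetical dummy data: 32 teams x 10 seasons (not the asker's real data)
set.seed(1)
data <- data.frame(
  Team = factor(rep(paste0("T", 1:32), each = 10)),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

# Same object the asker built; requires Team to be a factor
teams <- levels(data$Team)

# Pre-allocate a named numeric vector with one slot per team
r_by_team <- setNames(numeric(length(teams)), teams)

for (i in seq_along(teams)) {
  # Keep only the rows for the i-th team, then correlate the two columns
  one_team <- data[data$Team == teams[i], ]
  r_by_team[i] <- cor(one_team$Year, one_team$Points_Game)
}

head(r_by_team)
```

The grouped summarise() call gives the same numbers with less bookkeeping, so the loop is mostly useful for seeing what happens at each step.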

Answer 1 (score: 0)

The data.table way:

library(data.table)

# dummy data (same columns as @Aleksandr's): 32 teams, 10 seasons each
dat <- data.table(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

# find correlation of Year and Points_Game for each Team
result <- dat[ , .(r = cor(Year, Points_Game)), by = Team]
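
One way to gain confidence in the grouped result is to spot-check a single team against the filter-then-cor approach from the question. The sketch below assumes hypothetical dummy data (10 seasons per team) and uses r_t1 as an illustrative name; it is a sanity check, not part of the original answer.

```r
library(data.table)

# Hypothetical dummy data: 32 teams x 10 seasons
set.seed(1)
dat <- data.table(
  Team = rep(paste0("T", 1:32), each = 10),
  Year = rep(2000:2009, times = 32),
  Points_Game = rnorm(320, 100, 10)
)

# Grouped correlation, one row per team
result <- dat[, .(r = cor(Year, Points_Game)), by = Team]

# Single-team check, mirroring the one-at-a-time approach in the question
r_t1 <- dat[Team == "T1", cor(Year, Points_Game)]
all.equal(result[Team == "T1", r], r_t1)

# Teams ordered from the most positive to the most negative correlation
result[order(-r)]
```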