Correlation matrix

Time: 2014-09-01 11:37:12

Tags: r correlation

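I have a data frame with daily returns for about 20 companies over a 7-year period. I want to calculate Ri * Rj across all companies for each date.

I'm not sure how to upload my dataset here, so I've listed a few sample entries in the general format. (The data frame is sorted by date using the orderBy function.)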

Company.Name     Date      Closing.price      Prev.Closing      r        
ABB           2002-08-12       24.16               24.78       0.02
ABAN          2002-08-12       172.5               179.5       0.12
ASHOK         2002-08-12       39.12               36.42       0.15
..
..
ABB           2002-08-13       25.6                24.16       0.21
ABAN          2002-08-13       175.7               172.5       0.02

I tried the function below, but I get the following error:

 Error in FUN(left, right) : non-numeric argument to binary operator 


correlation <- function(x)
{
  y <- matrix(NA, nrow=length(x), ncol=length(x))
  for(i in 1:length(x))
  {
     if(i!=i-1) {j=i}
    for(j in 1:(length(x)-1))  
    {
      y[i,j] <- x[i]*x[j]
    }
  }
}



 correlations <- aggregate(Companies_NSE$r, list(Companies_NSE$Date),
         FUN= correlation(x))

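1 answer:

Answer 0 (score: 2)

You could try the following, where combn(x, 2) lists every pair of returns within a date and colProds multiplies each pair: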

 library(matrixStats)
 with(df, aggregate(r, list(Date=Date), FUN= function(x) colProds(combn(x,2))))
 #        Date                      x
 #1 2002-08-12 0.0024, 0.0030, 0.0180
 #2 2002-08-13                 0.0042
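
If it helps to see which two companies each product belongs to, the same pairing can be sketched in base R. This is only an illustration: `pair_products` is a hypothetical helper name, the sample data is retyped from the question, and it assumes every date has at least two companies.

```r
# Sample data retyped from the question (only the columns used here).
df <- data.frame(
  Company.Name = c("ABB", "ABAN", "ASHOK", "ABB", "ABAN"),
  Date = c("2002-08-12", "2002-08-12", "2002-08-12",
           "2002-08-13", "2002-08-13"),
  r = c(0.02, 0.12, 0.15, 0.21, 0.02),
  stringsAsFactors = FALSE
)

# pair_products() is a hypothetical helper, not part of any package.
# It takes the rows for a single date and returns one row per company pair.
pair_products <- function(d) {
  # Each column of idx holds the row indices of one unordered pair.
  idx <- combn(seq_len(nrow(d)), 2)
  data.frame(
    Date = d$Date[1],
    Company.i = d$Company.Name[idx[1, ]],
    Company.j = d$Company.Name[idx[2, ]],
    Prod = d$r[idx[1, ]] * d$r[idx[2, ]],
    stringsAsFactors = FALSE
  )
}

# Split by date, build the pairs for each date, and stack the results.
res <- do.call(rbind, lapply(split(df, df$Date), pair_products))
rownames(res) <- NULL
res
```

The result is in long format, one row per date and company pair, which tends to be easier to merge or filter than a list column.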

Or, using data.table (with matrixStats still loaded for colProds):

 library(data.table)
 setDT(df)[, list(Prod=colProds(combn(r,2))), by=Date]
 #         Date   Prod
 #1: 2002-08-12 0.0024
 #2: 2002-08-12 0.0030
 #3: 2002-08-12 0.0180
 #4: 2002-08-13 0.0042
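
If what you are after is the full Ri * Rj matrix for each date rather than the list of unique pairs, one possible sketch uses outer() from base R. The sample data is retyped from the question, and `prod_mats` is just an illustrative name. Note that the diagonal holds the squared returns Ri * Ri.

```r
# Sample data retyped from the question (only the columns used here).
df <- data.frame(
  Company.Name = c("ABB", "ABAN", "ASHOK", "ABB", "ABAN"),
  Date = c("2002-08-12", "2002-08-12", "2002-08-12",
           "2002-08-13", "2002-08-13"),
  r = c(0.02, 0.12, 0.15, 0.21, 0.02),
  stringsAsFactors = FALSE
)

# One square matrix per date; entry [i, j] is r_i * r_j for that date.
prod_mats <- lapply(split(df, df$Date), function(d) {
  m <- outer(d$r, d$r)
  dimnames(m) <- list(d$Company.Name, d$Company.Name)
  m
})

# Look up the matrix for a single date by its name.
prod_mats[["2002-08-12"]]
```

Each matrix is symmetric, so the upper triangle contains the same products as the combn() approach.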

Data

df <- structure(list(Company.Name = c("ABB", "ABAN", "ASHOK", "ABB", 
"ABAN"), Date = c("2002-08-12", "2002-08-12", "2002-08-12", "2002-08-13", 
"2002-08-13"), Closing.price = c(24.16, 172.5, 39.12, 25.6, 175.7
), Prev.Closing = c(24.78, 179.5, 36.42, 24.16, 172.5), r = c(0.02, 
0.12, 0.15, 0.21, 0.02)), .Names = c("Company.Name", "Date", 
"Closing.price", "Prev.Closing", "r"), class = "data.frame", row.names = c(NA, 
-5L))