chart.Correlation with continuous and categorical variables

Time: 2017-08-09 09:09:56

Tags: r variables categories correlation

I would like to know whether there is any correlation between my variables. Here is the structure of the dataset:

'data.frame':   189 obs. of  20 variables:
 $ age            : num  24 31 32 35 36 26 31 24 35 36 ...
 $ diplM2         : Factor w/ 3 levels "0","1","2": 3 2 1 3 2 2 3 2 2 1 ...
 $ TimeDelcat     : Factor w/ 4 levels "0","1","2","3": 1 1 3 3 3 4 2 1 4 4 ...
 $ SeasonDel      : Factor w/ 4 levels "1","2","3","4": 1 2 4 3 4 3 4 3 2 3 ...
 $ BMIM2          : num  23.4 25.7 17 26.6 24.6 21.6 21 22.3 20.8 20.7 ...
 $ WgtB2          : int  3740 3615 3705 3485 3420 2775 3365 3770 3075 3000 ...
 $ sex            : Factor w/ 2 levels "1","2": 2 2 1 2 2 2 1 1 1 1 ...
 $ smoke          : Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 1 1 3 ...
 $ nRBC           : num  0.1621 0.0604 0.1935 0.0527 0.1118 ...
 $ CD4T           : num  0.1427 0.2143 0.1432 0.0686 0.0979 ...
 $ CD8T           : num  0.1574 0.1549 0.1243 0.0804 0.0782 ...
 $ NK             : num  0.02817 0 0.04368 0.00641 0.02398 ...
 $ Bcell          : num  0.1033 0.1124 0.1468 0.0551 0.0696 ...
 $ Mono           : num  0.0633 0.0641 0.0773 0.0531 0.0656 ...
 $ Gran           : num  0.428 0.442 0.329 0.716 0.6 ...
 $ chip           : Factor w/ 92 levels "200251580021",..: 12 24 23 2 27 22 6 22 17 22 ...
 $ pos            : Factor w/ 12 levels "R01C01","R01C02",..: 11 12 1 6 9 2 12 1 7 11 ...
 $ trim1PM25ifdmv4: num  9.45 13.81 15.59 7.13 15.43 ...
 $ trim2PM25ifdmv4: num  13.27 15.53 10.69 13.56 9.27 ...
 $ trim3PM25ifdmv4: num  16.72 16.21 12.17 6.47 10.66 ...

As you can see, there are both continuous and categorical variables. When I run

chart.Correlation(variables, histrogram=T,method = c("pearson") )

I get this error:

[1] 4 8

How can I solve this? Thanks.

1 answer:

Answer 0 (score: 1)

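I believe you only want correlations between the numeric variables. The code below does that, and it outputs each pairwise correlation only once (x1*x2 but not x2*x1).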

library(reshape2)  
data <- data.frame(x1=rnorm(10),
            x2=rnorm(10),
            x3=rnorm(10),
            x4=c("a","b","c","d","e","f","g","h","i","j"),
            x5=c("ab","sp","sp","dd","hg","hj","qw","dh","ko","jk"))  

data
       x1         x2         x3     x4 x5
1  -1.2169793  0.5397598  0.4981513  a ab
2  -0.7032631 -2.1262837 -1.0377371  b sp
3   0.8766831 -0.2326975 -0.1219613  c sp
4   0.3405332  2.4766225 -1.1960618  d dd
5   0.1889945  0.3444534  1.9659062  e hg
6   0.8086956  0.4654644 -1.2526696  f hj
7  -0.6850181 -1.7657241  0.5156620  g qw
8   0.8518034  0.9484547  1.4784063  h dh
9   0.5191793  1.2246566  1.3867829  i ko
10  0.4568953 -0.6881464  0.3548839  j jk

#finding correlation for all numeric columns (is.numeric also keeps integer columns)
corr=cor(data[sapply(data,is.numeric)])
#convert the correlation table to long format (columns Var1, Var2, value)
res=melt(corr)
##keeping only one side of the correlations:
##label each pair with its sorted names so x1*x2 and x2*x1 get the same label
res$type=apply(res,1,function(x) paste(sort(c(as.character(x[1]),as.character(x[2]))),collapse="*"))
#drop the duplicated pairs
res=unique(res[,c("type","value")])

res
 type      value
x1*x1 1.00000000
x1*x2 0.44024939
x1*x3 0.04936654
x2*x2 1.00000000
x2*x3 0.08859169
x3*x3 1.00000000
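
The same idea can be sketched in base R for a data frame that mixes numeric, integer and factor columns, as in the question. This is only an illustration under stated assumptions: the data frame `df` and its column names are hypothetical, and the upper-triangle step is a swapped-in alternative to the `melt` approach above, not the code from the answer.

```r
# Hypothetical mixed data frame (illustrative values and names, not the real dataset)
df <- data.frame(
  age   = c(24, 31, 32, 35, 36, 26),
  wgt   = c(3740L, 3615L, 3705L, 3485L, 3420L, 2775L),  # integer column
  bmi   = c(23.4, 25.7, 17.0, 26.6, 24.6, 21.6),
  sex   = factor(c(2, 2, 1, 2, 2, 2)),
  smoke = factor(c(0, 0, 0, 1, 0, 0))
)

# Keep numeric and integer columns; factor columns are dropped
is_num   <- vapply(df, is.numeric, logical(1))
num_only <- df[, is_num, drop = FALSE]

# Pearson correlation matrix of the remaining columns
corr <- cor(num_only, method = "pearson")

# One row per unique pair, taken from the upper triangle (diagonal excluded)
idx <- which(upper.tri(corr), arr.ind = TRUE)
pair_cor <- data.frame(
  var1  = rownames(corr)[idx[, "row"]],
  var2  = colnames(corr)[idx[, "col"]],
  value = corr[idx]
)
print(pair_cor)
```

Assuming the PerformanceAnalytics package is installed, a numeric-only data frame such as `num_only` can then be passed to `chart.Correlation(num_only, histogram = TRUE, method = "pearson")`.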