从逻辑回归列表中提取系数的所有标准误差

时间:2012-10-24 08:00:15

标签: r logistic-regression coefficients standard-error

我想从逻辑回归模型列表中提取标准错误。

这是逻辑回归函数,以这种方式设计,因此我可以同时运行多个分析:

glmfunk <- function(x) glm(  ldata$DFREE ~  x , family=binomial)

我在数据帧ldata中的变量子集上运行它:

glmkort <- lapply(ldata[,c(2,3,5,6,7,8)],glmfunk)

我可以像这样提取系数:

sapply(glmkørt, "[[", "coefficients")

但是如何提取系数的标准误差?我似乎无法在str(glmkort)中找到它?

这是AGE的str(glmkort),我正在寻找标准错误:

str(glmkort)
List of 6
 $ AGE    :List of 30
  ..$ coefficients     : Named num [1:2] -1.17201 -0.00199
  .. ..- attr(*, "names")= chr [1:2] "(Intercept)" "x"
  ..$ residuals        : Named num [1:40] -1.29 -1.29 -1.29 -1.29 4.39 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ fitted.values    : Named num [1:40] 0.223 0.225 0.225 0.225 0.228 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ effects          : Named num [1:40] 3.2662 -0.0282 -0.4595 -0.4464 2.042 ...
  .. ..- attr(*, "names")= chr [1:40] "(Intercept)" "x" "" "" ...
  ..$ R                : num [1:2, 1:2] -2.64 0 -86.01 14.18
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. .. ..$ : chr [1:2] "(Intercept)" "x"
  ..$ rank             : int 2
  ..$ qr               :List of 5
  .. ..$ qr   : num [1:40, 1:2] -2.641 0.158 0.158 0.158 0.159 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:40] "1" "2" "3" "4" ...
  .. .. .. ..$ : chr [1:2] "(Intercept)" "x"
  .. ..$ rank : int 2
  .. ..$ qraux: num [1:2] 1.16 1.01
  .. ..$ pivot: int [1:2] 1 2
  .. ..$ tol  : num 1e-11
  .. ..- attr(*, "class")= chr "qr"
  ..$ family           :List of 12
  .. ..$ family    : chr "binomial"
  .. ..$ link      : chr "logit"
  .. ..$ linkfun   :function (mu)  
  .. ..$ linkinv   :function (eta)  
  .. ..$ variance  :function (mu)  
  .. ..$ dev.resids:function (y, mu, wt)  
  .. ..$ aic       :function (y, n, mu, wt, dev)  
  .. ..$ mu.eta    :function (eta)  
  .. ..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         y[weights == 0] <- 0         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)         m <- weights * y         if (any(abs(m - round(m)) > 0.001))              warning("non-integer #successes in a binomial glm!")     }     else if (NCOL(y) == 2) {         if (any(abs(y - round(y)) > 0.001))              warning("non-integer counts in a binomial glm!")         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the binomial family, y must be a vector of 0 and 1's\n", "or a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
  .. ..$ validmu   :function (mu)  
  .. ..$ valideta  :function (eta)  
  .. ..$ simulate  :function (object, nsim)  
  .. ..- attr(*, "class")= chr "family"
  ..$ linear.predictors: Named num [1:40] -1.25 -1.24 -1.24 -1.24 -1.22 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ deviance         : num 42.7
  ..$ aic              : num 46.7
  ..$ null.deviance    : num 42.7
  ..$ iter             : int 4
  ..$ weights          : Named num [1:40] 0.173 0.174 0.174 0.174 0.176 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ prior.weights    : Named num [1:40] 1 1 1 1 1 1 1 1 1 1 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ df.residual      : int 38
  ..$ df.null          : int 39
  ..$ y                : Named num [1:40] 0 0 0 0 1 0 1 0 0 0 ...
  .. ..- attr(*, "names")= chr [1:40] "1" "2" "3" "4" ...
  ..$ converged        : logi TRUE
  ..$ boundary         : logi FALSE
  ..$ model            :'data.frame':   40 obs. of  2 variables:
  .. ..$ ldata$DFREE: int [1:40] 0 0 0 0 1 0 1 0 0 0 ...
  .. ..$ x          : int [1:40] 39 33 33 32 24 30 39 27 40 36 ...
  .. ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
  .. .. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
  .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
  .. .. .. .. .. ..$ : chr "x"
  .. .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. .. ..- attr(*, "order")= int 1
  .. .. .. ..- attr(*, "intercept")= int 1
  .. .. .. ..- attr(*, "response")= int 1
  .. .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  .. .. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
  .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
  ..$ call             : language glm(formula = ldata$DFREE ~ x, family = binomial)
  ..$ formula          :Class 'formula' length 3 ldata$DFREE ~ x
  .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  ..$ terms            :Classes 'terms', 'formula' length 3 ldata$DFREE ~ x
  .. .. ..- attr(*, "variables")= language list(ldata$DFREE, x)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "ldata$DFREE" "x"
  .. .. .. .. ..$ : chr "x"
  .. .. ..- attr(*, "term.labels")= chr "x"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: 0x017a5674> 
  .. .. ..- attr(*, "predvars")= language list(ldata$DFREE, x)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "ldata$DFREE" "x"
  ..$ data             :<environment: 0x017a5674> 
  ..$ offset           : NULL
  ..$ control          :List of 3
  .. ..$ epsilon: num 1e-08
  .. ..$ maxit  : num 25
  .. ..$ trace  : logi FALSE
  ..$ method           : chr "glm.fit"
  ..$ contrasts        : NULL
  ..$ xlevels          : Named list()
  ..- attr(*, "class")= chr [1:2] "glm" "lm"
 $ BECK   :List of 30

这是我用于示例的数据。我将其缩短为了这个问题的绝对:

> ldata
   ID AGE   BECK IVHX NDRUGTX RACE TREAT SITE DFREE
1   1  39  9.000    3       1    0     1    0     0
2   2  33 34.000    2       8    0     1    0     0
3   3  33 10.000    3       3    0     1    0     0
4   4  32 20.000    3       1    0     0    0     0
5   5  24  5.000    1       5    1     1    0     1
6   6  30 32.550    3       1    0     1    0     0
7   7  39 19.000    3      34    0     1    0     1
8   8  27 10.000    3       2    0     1    0     0
9   9  40 29.000    3       3    0     1    0     0
10 10  36 25.000    3       7    0     1    0     0
11 11  38 18.900    3       8    0     1    0     0
12 12  29 16.000    1       1    0     1    0     0
13 13  32 36.000    3       2    1     1    0     1
14 14  41 19.000    3       8    0     1    0     0
15 15  31 18.000    3       1    0     1    0     0
16 16  27 12.000    3       3    0     1    0     0
17 17  28 34.000    3       6    0     1    0     0
18 18  28 23.000    2       1    0     1    0     0
19 19  36 26.000    1      15    1     1    0     1
20 20  32 18.900    3       5    0     1    0     1
21 21  33 15.000    1       1    0     0    0     1
22 22  28 25.200    3       8    0     0    0     0
23 23  29  6.632    2       0    0     0    0     0
24 24  35  2.100    3       9    0     0    0     0
25 25  45 26.000    3       6    0     0    0     0
26 26  35 39.789    3       5    0     0    0     0
27 27  24 20.000    1       3    0     0    0     0
28 28  36 16.000    3       7    0     0    0     0
29 29  39 22.000    3       9    0     0    0     1
30 30  36  9.947    2      10    0     0    0     0
31 31  37  9.450    3       1    0     0    0     0
32 32  30 39.000    3       1    0     0    0     0
33 33  44 41.000    3       5    0     0    0     0
34 34  28 31.000    1       6    1     0    0     1
35 35  25 20.000    1       3    1     0    0     0
36 36  30  8.000    3       7    0     1    0     0
37 37  24  9.000    1       1    0     0    0     0
38 38  27 20.000    1       1    0     0    0     0
39 39  30  8.000    1       2    1     0    0     1
40 40  34  8.000    3       0    0     1    0     0

1 个答案:

答案 0 :(得分:7)

使用?glm

中的示例
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())

## copy twice to a list to illustrate
lmod <- list(mod1 = glm.D93, mod2 = glm.D93)

然后我们可以将它们计算为summary(),或者在调用summary()后提取它们。前者效率更高,因为你只计算你想要的东西。后者不依赖于如何得出标准误差。

直接计算标准错误

可以从模型的方差 - 协方差矩阵计算标准误差。该矩阵的对角线包含系数的方差,标准误差只是这些方差的平方根。 vcov()提取器函数为我们获取方差 - 协方差矩阵,我们将对角线与sqrt(diag())平方根:

> lapply(lmod, function(x) sqrt(diag(vcov(x))))
$mod1
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000 

$mod2
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000

从对summary()

的调用中提取它们

或者我们可以让summary()计算标准错误(以及批次更多),然后使用lapply()sapply()来应用提取的匿名函数coef(summary(x))并取第二列(存储标准错误)。

lapply(lmod, function(x) coef(summary(x))[,2])

哪个给出了

> lapply(lmod, function(x) coef(summary(x))[,2])
$mod1
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000 

$mod2
(Intercept)    outcome2    outcome3  treatment2  treatment3 
  0.1708987   0.2021708   0.1927423   0.2000000   0.2000000

sapply()会给出:

> sapply(lmod, function(x) coef(summary(x))[,2])
                 mod1      mod2
(Intercept) 0.1708987 0.1708987
outcome2    0.2021708 0.2021708
outcome3    0.1927423 0.1927423
treatment2  0.2000000 0.2000000
treatment3  0.2000000 0.2000000

根据您的想法,您可以通过一次调用提取系数和标准错误:

> lapply(lmod, function(x) coef(summary(x))[,1:2])
$mod1
                 Estimate Std. Error
(Intercept)  3.044522e+00  0.1708987
outcome2    -4.542553e-01  0.2021708
outcome3    -2.929871e-01  0.1927423
treatment2   1.337909e-15  0.2000000
treatment3   1.421085e-15  0.2000000

$mod2
                 Estimate Std. Error
(Intercept)  3.044522e+00  0.1708987
outcome2    -4.542553e-01  0.2021708
outcome3    -2.929871e-01  0.1927423
treatment2   1.337909e-15  0.2000000
treatment3   1.421085e-15  0.2000000

但你可能会单独选择它们吗?