Question

我使用geepack为R来估算geeglm()的逻辑边际模型。但我得到垃圾估计。它们大约16个数量级太大。然而，p值似乎与我的预期相似。这意味着响应基本上成为阶梯函数。见附图

以下是生成图表的代码：

require(geepack)
data = read.csv(url("http://folk.uio.no/mariujon/data.csv"))
fit = geeglm(moden ~ 1 + power, id = defacto, data=data, corstr = "exchangeable", family=binomial)
summary(fit)
plot(moden ~ power, data=data)
x = 0:2500
y = predict(fit, newdata=data.frame(power = x), type="response" )
lines(x,y)

这是回归表：

Call:
geeglm(formula = moden ~ 1 + power, family = binomial, data = data, 
    id = defacto, corstr = "exchangeable")

 Coefficients:
             Estimate   Std.err  Wald Pr(>|W|)    
(Intercept) -7.38e+15  1.47e+15  25.1  5.4e-07 ***
power        2.05e+13  1.60e+12 164.4  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Estimated Scale Parameters:
            Estimate  Std.err
(Intercept) 1.03e+15 1.65e+37

Correlation: Structure = exchangeable  Link = identity 

Estimated Correlation Parameters:
      Estimate  Std.err
alpha    0.196 3.15e+21
Number of clusters:   3   Maximum cluster size: 381

希望得到一些帮助。谢谢！

亲切的问候，

的Marius

Answer 1

我将给出三个程序，每个程序都是边缘化的随机拦截模型（MRIM）。这些MRIM具有边际逻辑解释的系数，并且其幅度小于GEE：

| Model | (Intercept) |  power |  LogL  |
|-------|-------------|--------|--------| 
| `L_N` |       -1.050| 0.00267|  -270.1|
| `LLB` |       -0.668| 0.00343|  -273.8|
| `LPN` |       -1.178| 0.00569|  -266.4|

与没有考虑任何相关性的glm进行比较，供参考：

| Model | (Intercept) |  power |  LogL  |
|-------|-------------|--------|--------| 
|  strt |       -0.207| 0.00216|  -317.1|

边缘化随机拦截模型（MRIM）值得探索，因为您需要具有可交换相关结构的边缘模型用于聚类数据，这是MRIM所展示的结构类型。

R script with comments中的代码（尤其是GITHUB repo）和文献的PDF文件。我详细介绍了下面的代码和文献。

MRIM的概念自1999年以来一直存在，关于此的一些背景知识在GITHUB repo。我建议首先阅读 Swihart et al 2014 ，因为它会审查其他论文。

按时间顺序 -

L_N Heagerty（1999）：该方法适用于具有正态分布随机截距的随机拦截逻辑模型。诀窍在于随机截距模型中的预测器使用边际系数进行非线性参数化，以便得到的边际模型具有边际逻辑解释。它的代码是lnMLE R包（不在CRAN上，但在Patrick Heagerty的网站上here）。这种方法在代码中用L_N表示，表示边际上的logit（L），条件尺度上没有解释（_）和正常（N）分布式随机截距。
LLB Wang＆amp;路易斯（2003）：该方法适用于具有桥分布式随机拦截的随机拦截逻辑模型。与Heagerty 1999不同，其中技巧是随机拦截模型的非线性预测器，技巧是特殊的随机效应分布（桥分布），允许随机截距模型和结果边际模型具有逻辑解释。其代码使用gnlmix4MMM.R（在repo中）实现，该rmutil使用repeated和LLB R包。这种方法在代码中用LPN表示，表示边际上的logit（L），条件尺度上的logit（L）和桥（B）分布式拦截。
gnlmix4MMM.R Caffo和Griswold（2006）：该方法适用于具有正态分布随机截距的随机拦截 probit 模型，而Heagerty 1999使用 logit 随机拦截模型。这种替换使得计算更容易，并且仍然产生边际logit模型。其代码使用rmutil（在repo中）实现，该repeated使用LPN和gnlmix4MMM.R R包。这种方法在代码中用rmutil表示，表示边际上的logit（L），条件尺度上的概率（P）和正常（N）分布的截距。
Griswold et al (2013)：另一篇评论/实用介绍。
Swihart et al 2014 ：这是Heagerty 1999和Wang＆amp; Sons的评论文章。路易斯2003以及其他人并推广了MRIM方法。最有趣的概括之一是允许边际和条件模型中的逻辑CDF（等效地，logit链接）代替近似于逻辑CDF的稳定分布。其代码使用repeated（在repo中）实现，该SSS使用#code from OP Question: edit `data` to `d` require(geepack) d = read.csv(url("http://folk.uio.no/mariujon/data.csv")) fit = geeglm(moden ~ 1 + power, id = defacto, data=d, corstr = "exchangeable", family=binomial) summary(fit) plot(moden ~ power, data=d) x = 0:2500 y = predict(fit, newdata=data.frame(power = x), type="response" ) lines(x,y) #get some starting values from glm(): strt <- coef(glm(moden ~ power, family = binomial, data=d)) strt #I'm so sorry but these methods use attach() attach(d)和L_N R包。我在R script with comments中表示# marginally specifies a logit link and has a nonlinear conditional model # the following code will not run if lnMLE is not successfully installed. # See https://faculty.washington.edu/heagerty/Software/LDA/MLV/ library(lnMLE) L_N <- logit.normal.mle(meanmodel = moden ~ power, logSigma= ~1, id=defacto, model="marginal", data=d, beta=strt, r=10) print.logit.normal.mle(L_N)表示边际上的稳定（S），条件尺度上的稳定（S）和稳定（S）分布的截距。它包含在R脚本中，但在本文中没有详细说明。

制备

LLB

`LPN` Heagerty（1999）

library("gnlm")
library("repeated")
source("gnlmix4MMM.R") ## see ?gnlmix; in GITHUB repo 
y <- cbind(d$moden,(1-d$moden))

准备`LLB`和`LLB <- gnlmix4MMM(y = y, distribution = "binomial", mixture = "normal", random = "rand", nest = defacto, mu = ~ 1/(1+exp(-(a0 + a1power)sqrt(1+3/pi/piexp(pmix)) - sqrt(1+3/pi/piexp(pmix))log(sin(pipnorm(rand/sqrt(exp(pmix)))/sqrt(1+3/pi/piexp(pmix)))/sin(pi(1-pnorm(rand/sqrt(exp(pmix))))/sqrt(1+3/pi/pi*exp(pmix)))))), pmu = c(strt, log(1)), pmix = log(1)) print("code: 1 -best 2-ok 3,4,5 - problem") LLB$code print("coefficients") LLB$coeff print("se") LLB$se`

LPN

`LPN <- gnlmix4MMM(y = y, distribution = "binomial", mixture = "normal", random = "rand", nest = defacto, mu = ~pnorm(qnorm(1/(1+exp(-a0 - a1power)))sqrt(1+exp(pmix)) + rand), pmu = c(strt, log(1)), pmix = log(1)) print("code: 1 -best 2-ok 3,4,5 - problem") LPN$code print("coefficients") LPN$coeff print("se") LPN$se` Wang和Louis（2003）

rbind("L_N"=L_N$beta, "LLB" = LLB$coefficients[1:2], "LPN"=LPN$coefficients[1:2])

`rbind("L_N"=L_N$logL, "LLB" = -LLB$maxlike, "LPN"=-LPN$maxlike)` Caffo和Griswold（2006）

Dim s As String = Regex.Unescape("c:\\folder1\\file1.txt\nc:\\folder1\\file2.txt\nc:\\folder1\\file3.txt")

来自3种方法的

系数：

{{1}}

3个模型的最大对数似然性：

{{1}}

R geepack：使用GEE进行不合理的大规模估算

1 个答案:

制备

`LPN` Heagerty（1999）

`rbind("L_N"=L_N$logL, "LLB" = -LLB$maxlike, "LPN"=-LPN$maxlike)` Caffo和Griswold（2006）

系数：

3个模型的最大对数似然性：

R geepack：使用GEE进行不合理的大规模估算

1 个答案:

制备

LPN Heagerty（1999）

rbind("L_N"=L_N$logL, "LLB" = -LLB$maxlike, "LPN"=-LPN$maxlike) Caffo和Griswold（2006）

系数：

3个模型的最大对数似然性：

`LPN` Heagerty（1999）

`rbind("L_N"=L_N$logL, "LLB" = -LLB$maxlike, "LPN"=-LPN$maxlike)` Caffo和Griswold（2006）