How do I calculate means from a linear model in R?

Time: 2016-10-29 14:16:58

Tags: r statistics regression linear-regression lm

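I'm working with a data set on conservation and its effect on biomass, in which 50 plots of one hectare each were sampled at random from a 10,000-hectare area in northern England.

For each plot, the following variables were recorded:

• biomass: an estimate of the biomass of vegetation, in kg per square metre.

• alt: the mean altitude of the plot above sea level.

• cons: a categorical variable, coded as 1 if the plot is part of a conservation area, and 2 otherwise.

• soil: a categorical variable crudely classifying soil type as 1 for chalk, 2 for clay and 3 for loam.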

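I'm currently struggling with two things:

How do I calculate, from my fitted model (model 1), the mean difference in biomass between clay (soil 2) and loam (soil 3) soils, along with a 95% confidence interval for that mean predicted difference?

How do I calculate the mean predicted biomass for a plot inside a conservation area that is predominantly clay, at an altitude of 300 m?

Here is the summary of the linear model I'm using.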

Call:
lm(formula = biomass ~ alt + soil + cons, data = conservation)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.183105 -0.052926  0.005593  0.061844  0.194402 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.2928629  0.0357850  64.073  < 2e-16 ***
alt         -0.0029068  0.0001302 -22.318  < 2e-16 ***
soil2       -0.0862220  0.0342955  -2.514   0.0156 *  
soil3       -0.2309939  0.0354480  -6.516 5.33e-08 ***
cons2        0.0488634  0.0292075   1.673   0.1013    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.09428 on 45 degrees of freedom
Multiple R-squared:  0.9459,    Adjusted R-squared:  0.9411 
F-statistic: 196.7 on 4 and 45 DF,  p-value: < 2.2e-16

Here is the data:

dput(conservation)

structure(list(biomass = c(2.01, 2.06, 1.7, 2.07, 1.88, 2.11, 
0.98, 2.14, 1.75, 1.81, 2.15, 1.68, 2.23, 2.04, 1.67, 1.77, 1.74, 
1.53, 1.79, 2.15, 1.39, 2.19, 2.14, 2.29, 1.91, 1.73, 2.21, 1.96, 
2.07, 2.01, 2.2, 2.24, 1.33, 1.05, 1.36, 1.72, 1.44, 1.52, 2.09, 
1.42, 1.64, 0.92, 1.65, 1.37, 0.77, 1.57, 2.25, 2.23, 2.03, 1.18
), alt = c(116L, 21L, 130L, 65L, 117L, 82L, 359L, 5L, 86L, 91L, 
64L, 178L, 79L, 70L, 209L, 110L, 161L, 248L, 146L, 23L, 237L, 
84L, 40L, 7L, 161L, 122L, 25L, 146L, 67L, 118L, 42L, 57L, 277L, 
338L, 331L, 153L, 239L, 237L, 67L, 171L, 206L, 371L, 107L, 236L, 
482L, 240L, 56L, 42L, 68L, 436L), cons = structure(c(2L, 2L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 
1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L
), .Label = c("1", "2"), class = "factor"), soil = structure(c(2L, 
3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 2L, 
3L, 2L, 2L, 3L, 1L, 3L, 2L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
3L, 2L, 1L, 3L, 2L, 1L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 1L, 1L, 
2L), .Label = c("1", "2", "3"), class = "factor"), alt.factor = 
structure(c(1L, 
1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
2L), .Label = c("below median", "above median"), class = "factor")), 
.Names = c("biomass", 
"alt", "cons", "soil", "alt.factor"), row.names = c(NA, -50L), class = 
"data.frame")

1 answer:

Answer 0 (score: 2)

  

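How do I calculate, from my fitted model (model 1), the mean difference in biomass between clay (soil 2) and loam (soil 3) soils, along with a 95% confidence interval for that mean predicted difference?

Strictly speaking, this is a special case of what we call "linear hypothesis testing". But I don't think that is the intention of your assignment, so I will not take that approach. If you are interested, read "Get p-value for group mean difference without refitting linear model with a new reference level".

What I will do here is simply use a different factor level as the contrast and refit your model. At the moment you have "soil1" as the contrast (the reference level); I will reset it to "soil2". Have a look at "How to set contrasts for my variable in regression analysis with R?" for a general treatment.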
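As a small illustration of what changing the reference level means, the sketch below prints the treatment contrast matrix that `contr.treatment(n = 3, base = 2)` produces. The three-level factor `soil` here is a hypothetical stand-in, not the `conservation` data, and `relevel()` is shown only as an alternative way to change the baseline, not as the approach this answer uses.

```r
# Hypothetical three-level factor standing in for the soil variable.
soil <- factor(c("1", "2", "3"))

# Default treatment coding: level 1 is the baseline (its row is all zeros).
contr.treatment(n = 3)

# Treatment coding with level 2 as the baseline.
# Row 2 is now all zeros; the columns correspond to levels 1 and 3.
cm <- contr.treatment(n = 3, base = 2)
cm

# Alternative (an assumption, not part of the answer's code):
# relevel() moves the chosen level to the front so it becomes the baseline.
soil2_ref <- relevel(soil, ref = "2")
levels(soil2_ref)
```

With level 2 as the baseline, each remaining coefficient is read as a difference from clay, which is why the refitted model reports "soil1" and "soil3" rather than "soil2" and "soil3".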

fit <- lm(biomass ~ alt + soil + cons, data = conservation,
          contrasts = list(soil = contr.treatment(n = 3, base = 2)))

#Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
#(Intercept)  2.2066409  0.0400572  55.087  < 2e-16 ***
#alt         -0.0029068  0.0001302 -22.318  < 2e-16 ***
#soil1        0.0862220  0.0342955   2.514   0.0156 *  
#soil3       -0.1447719  0.0325295  -4.450 5.59e-05 ***
#cons2        0.0488634  0.0292075   1.673   0.1013    
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.09428 on 45 degrees of freedom
#Multiple R-squared:  0.9459,    Adjusted R-squared:  0.9411 
#F-statistic: 196.7 on 4 and 45 DF,  p-value: < 2.2e-16

Now the coefficient for "soil3" gives the difference in group mean between "soil3" and the contrast "soil2". Obtaining a confidence interval for this coefficient from its standard error and the model's residual degrees of freedom is straightforward, but again, that may be more technical than you need. Consider using confint:

confint(fit, "soil3", level = 0.95)
#           2.5 %     97.5 %
#soil3 -0.2102896 -0.0792541

How do I calculate the mean predicted biomass for a plot inside a conservation area that is predominantly clay, at an altitude of 300 m?

To predict the response biomass, we can use predict:

predict(fit, newdata = list(alt = 300, soil = "2", cons = "1"))
#       1 
#1.334606 

So the predicted mean is about 1.33.
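For readers who want to see where the confint and predict results come from, here is a minimal sketch that reproduces both by hand. It uses only the rounded figures printed in the refitted model summary (with "soil2" as the baseline) rather than the fitted object itself, so it should agree with the reported values only to roughly the printed precision. The variable names are illustrative.

```r
# Values copied from the refitted model summary (soil2 as baseline).
est <- -0.1447719   # soil3 coefficient: loam minus clay
se  <-  0.0325295   # its standard error
df  <- 45           # residual degrees of freedom

# 95% t-interval: estimate +/- t quantile * standard error.
ci <- est + c(-1, 1) * qt(0.975, df) * se
ci

# Predicted mean for alt = 300, soil = "2", cons = "1".
# Both factors sit at their baseline level in the refit, so only the
# intercept and the altitude term contribute.
b0    <-  2.2066409   # intercept
b_alt <- -0.0029068   # alt coefficient
pred  <- b0 + b_alt * 300
pred
```

The interval is what confint computes for a single coefficient of an lm fit. The prediction is the same linear combination of coefficients that predict evaluates for the given newdata.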