Question

So I have data like this:

##      V2   V3     V4   V5   V6   V7    V8
## 2  27.0 41.3 2948.0 26.2 51.7 42.7  89.8
## 3  22.9 66.7 4644.0  3.0 45.7 41.8 121.3
## 4  26.3 58.1 3665.0  3.0 50.8 38.5 115.2
## 5  29.1 39.9 2878.0 18.3 51.5 38.8 100.3
## 6  28.1 62.6 4493.0  7.0 50.8 39.7 123.0
## 7  26.2 63.9 3855.0  3.0 50.7 31.1 124.8

I want to fit a multiple linear regression:

model1 = lm(cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + cigarette.data$V7, data = cigarette.data)

But this gives me:

## 
## Call:
## lm(formula = cigarette.data$V8 ~ cigarette.data$V2 + cigarette.data$V3 + 
##     cigarette.data$V4 + cigarette.data$V5 + cigarette.data$V6 + 
##     cigarette.data$V7, data = cigarette.data)
## 
## Residuals:
## ALL 51 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (186 not defined because of singularities)
##                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                   19         NA      NA       NA
## cigarette.data$V223.1         20         NA      NA       NA
## cigarette.data$V223.9         23         NA      NA       NA
## cigarette.data$V224.8        -16         NA      NA       NA
## cigarette.data$V225.0         21         NA      NA       NA
## cigarette.data$V225.1         25         NA      NA       NA
## cigarette.data$V225.9         -9         NA      NA       NA
## cigarette.data$V226.2          8         NA      NA       NA

This doesn't look right. What is going on?

Answer 1

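The problem is that the model you are fitting has more predictors than samples (i.e., rows). Your example contains 6 samples, so 5 variables (+ intercept = 6 coefficients) already predict V8 perfectly, and a sixth variable cannot be estimated at all: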

cigarette.data <- structure(list(V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2), V3 = c(41.3, 
66.7, 58.1, 39.9, 62.6, 63.9), V4 = c(2948, 4644, 3665, 2878, 
4493, 3855), V5 = c(26.2, 3, 3, 18.3, 7, 3), V6 = c(51.7, 45.7, 
50.8, 51.5, 50.8, 50.7), V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 
31.1), V8 = c(89.784450178314, 121.359442280557, 115.031032135658, 
100.201279353697, 123.401631728502, 124.750887806)), .Names = c("V2", 
"V3", "V4", "V5", "V6", "V7", "V8"), row.names = c(NA, -6L), class = "data.frame")

fit <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
summary(fit)


Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)

Residuals:
ALL 6 residuals are 0: no residual degrees of freedom!

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.89203         NA      NA       NA
V2           5.66196         NA      NA       NA
V3           2.16574         NA      NA       NA
V4          -0.01412         NA      NA       NA
V5           0.03093         NA      NA       NA
V6          -4.07376         NA      NA       NA
V7                NA         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 5 and 0 DF,  p-value: NA

Your model should contain fewer variables or more samples (see the example below):

fit <- lm(V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)
summary(fit)

Call:
lm(formula = V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)

Residuals:
      1       2       3       4       5       6 
-1.1873  0.9570 -2.9738  1.9870 -0.7142  1.9312 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.846025  57.297709   0.311    0.808
V2           1.848628   1.240164   1.491    0.376
V3           0.802375   0.879204   0.913    0.529
V4           0.001821   0.008315   0.219    0.863
V5          -0.583697   0.601185  -0.971    0.509

Residual standard error: 4.4 on 1 degrees of freedom
Multiple R-squared:  0.981, Adjusted R-squared:  0.9052 
F-statistic: 12.94 on 4 and 1 DF,  p-value: 0.2052
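
One way to catch this before reading the full summary is to compare the number of rows with the number of coefficients the model tries to estimate. The following is only a minimal sketch using the same six rows as above; the names `full_fit`, `small_fit`, `n_rows`, `df_full`, `df_small` and `n_aliased` are made up for illustration.

```r
# Same six rows as in the answer above.
cigarette.data <- data.frame(
  V2 = c(27, 22.9, 26.3, 29.1, 28.1, 26.2),
  V3 = c(41.3, 66.7, 58.1, 39.9, 62.6, 63.9),
  V4 = c(2948, 4644, 3665, 2878, 4493, 3855),
  V5 = c(26.2, 3, 3, 18.3, 7, 3),
  V6 = c(51.7, 45.7, 50.8, 51.5, 50.8, 50.7),
  V7 = c(42.7, 41.8, 38.5, 38.8, 39.7, 31.1),
  V8 = c(89.784450178314, 121.359442280557, 115.031032135658,
         100.201279353697, 123.401631728502, 124.750887806)
)

# Six predictors plus an intercept: 7 coefficients for 6 rows.
full_fit  <- lm(V8 ~ V2 + V3 + V4 + V5 + V6 + V7, data = cigarette.data)
# Four predictors plus an intercept: 5 coefficients for 6 rows.
small_fit <- lm(V8 ~ V2 + V3 + V4 + V5, data = cigarette.data)

n_rows <- nrow(cigarette.data)

# Residual degrees of freedom = rows minus estimable coefficients.
df_full  <- df.residual(full_fit)
df_small <- df.residual(small_fit)

# Coefficients that lm() could not estimate are reported as NA.
n_aliased <- sum(is.na(coef(full_fit)))

cat("rows:", n_rows, "\n")
cat("residual df (full model): ", df_full, "\n")
cat("residual df (small model):", df_small, "\n")
cat("undefined coefficients (full model):", n_aliased, "\n")
```

A residual degrees of freedom value of 0 corresponds to the "no residual degrees of freedom!" message in the first summary, and the value for the smaller model corresponds to the "1 degrees of freedom" line in the second.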

Answer 2

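One of the records in the data frame must contain a null value or 0.0. Try imputing those records or removing them from the data frame before fitting the model.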
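
A rough sketch of how such records could be located and handled is shown here. The data frame `example.data` is hypothetical and is not the asker's data, and mean imputation is just one simple choice among many. The sketch also prints the column classes, since coefficient names such as `cigarette.data$V223.1` in the question's output have the form R uses for the levels of a factor or character column.

```r
# Hypothetical data frame for illustration only; not the asker's data.
example.data <- data.frame(
  V2 = c(27.0, 22.9, NA, 29.1, 28.1),
  V3 = c(41.3, 66.7, 58.1, 0.0, 62.6),
  V8 = c(89.8, 121.3, 115.2, 100.3, 123.0)
)

# Column types: lm() expands factor or character columns into dummy variables.
col_classes <- sapply(example.data, class)

# Count missing values and exact zeros per column.
na_counts   <- colSums(is.na(example.data))
zero_counts <- colSums(example.data == 0, na.rm = TRUE)

# Option 1: drop every row that contains a missing value.
complete.data <- na.omit(example.data)

# Option 2: replace missing V2 values with the column mean (crude, for illustration).
imputed.data <- example.data
imputed.data$V2[is.na(imputed.data$V2)] <- mean(imputed.data$V2, na.rm = TRUE)

print(col_classes)
print(na_counts)
print(zero_counts)
```

Note that a zero is an ordinary numeric value as far as `lm()` is concerned, so whether a 0.0 should be treated as missing depends on what the column measures.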

Fitting a multiple linear regression in R
