Analyzing nonlinear data with R

Time: 2014-11-16 06:06:55

Tags: r regression nonlinear-functions

I have the following data, in which there appears to be a curvilinear relationship between xx and yy:

head(ddf)
  xx yy
1  1 10
2  2  9
3  3 11
4  4  9
5  5  7
6  6  6

ddf = structure(list(xx = 1:23, yy = c(10L, 9L, 11L, 9L, 7L, 6L, 9L, 
8L, 5L, 4L, 6L, 6L, 5L, 4L, 6L, 8L, 4L, 6L, 8L, 11L, 8L, 10L, 
9L)), .Names = c("xx", "yy"), class = "data.frame", row.names = c(NA, 
-23L))

with(ddf, plot(xx,yy))

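I would like to analyze it and obtain the following:

  1. Find the nonlinear relationship between xx and yy
  2. Get its equation
  3. Get its P value
  4. If possible, get R (correlation coefficient) (nonlinear)
  5. Plot this curve

I know about nls, which gives me an equation, but I have to supply the formula myself, and it may not be the right one. Also, I cannot get the curve, R, or the P value from it: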

    > nls(yy~a*(xx^b), data=ddf)
    Nonlinear regression model
      model: yy ~ a * (xx^b)
       data: ddf
          a       b 
     9.5337 -0.1184 
     residual sum-of-squares: 95.85
    
    Number of iterations to convergence: 8 
    Achieved convergence tolerance: 3.407e-06
    Warning message:
    In nls(yy ~ a * (xx^b), data = ddf) :
      No starting values specified for some parameters.
    Initializing ‘a’, ‘b’ to '1.'.
    Consider specifying 'start' or using a selfStart model
    

I also know that ggplot's stat_smooth can plot a curve, but it does not give me the formula, R, or the P value either.
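
The warning in the nls output above appears because no start values were given, so nls starts both a and b at 1. A minimal sketch of one common way to choose start values for a power law, assuming every yy is positive (true for this data): taking logs turns yy = a*xx^b into the straight line log(yy) = log(a) + b*log(xx), so an ordinary lm fit on the log scale supplies starting guesses. The names loglin, start_vals and fit_pow are illustrative only.

```r
# Example data from the question
ddf <- data.frame(
  xx = 1:23,
  yy = c(10, 9, 11, 9, 7, 6, 9, 8, 5, 4, 6, 6, 5, 4, 6, 8, 4, 6, 8, 11, 8, 10, 9)
)

# Linearize yy = a * xx^b as log(yy) = log(a) + b * log(xx)
# (only valid because every yy is positive here)
loglin <- lm(log(yy) ~ log(xx), data = ddf)

# Back-transform the intercept to get a start value for a;
# the log-scale slope is used directly as the start value for b
start_vals <- list(a = exp(unname(coef(loglin)[1])),
                   b = unname(coef(loglin)[2]))

# Refit on the original scale with explicit start values,
# so nls no longer warns about missing starting values
fit_pow <- nls(yy ~ a * xx^b, data = ddf, start = start_vals)
coef(fit_pow)
```

The log-scale fit is only used to get into the right neighbourhood; the final estimates still come from least squares on the original scale.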

1 Answer:

Answer 0 (score: 5)

You can predict values over a new, finer range of xx and plot them. Going through the results you want:

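# 1. Find the nonlinear relation between xx and yy
#    (explicit start values avoid the "No starting values" warning)
fit <- nls(yy ~ a*xx^b, data=ddf, start=list(a=10, b=0))
# 2. Get its equation: yy = a * xx^b with the estimated a and b
coef(fit)
# 3. Get its P value (t-tests for a and b in the coefficient table)
summary(fit)
# 4. If possible get R (correlation coefficient) (nonlinear):
#    correlation between fitted and observed values
cor(predict(fit), ddf$yy)
# 5. Plot this curve
newdat <- data.frame(xx=seq(min(ddf$xx), max(ddf$xx), length.out=100))
newdat$yy <- predict(fit, newdat)
plot(yy ~ xx, ddf)
lines(yy ~ xx, newdat, col=2)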
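
For nls, summary(fit) reports a separate t-test for a and for b rather than one P value for the whole curve. A minimal sketch of an approximate overall test, assuming the usual nested-model F-test is acceptable here: setting b = 0 reduces yy = a*xx^b to a constant, so the power curve can be compared with an intercept-only lm. For nonlinear models this F-test is only approximate; rss0, rss1, df1, df2, F_stat and p_value are illustrative names.

```r
ddf <- data.frame(
  xx = 1:23,
  yy = c(10, 9, 11, 9, 7, 6, 9, 8, 5, 4, 6, 6, 5, 4, 6, 8, 4, 6, 8, 11, 8, 10, 9)
)

fit  <- nls(yy ~ a * xx^b, data = ddf, start = list(a = 10, b = 0))
fit0 <- lm(yy ~ 1, data = ddf)   # constant model, i.e. the case b = 0

rss1 <- deviance(fit)            # residual sum of squares, power model
rss0 <- deviance(fit0)           # residual sum of squares, constant model
df1  <- length(coef(fit)) - length(coef(fit0))   # extra parameters
df2  <- nrow(ddf) - length(coef(fit))            # residual degrees of freedom

# Nested-model F statistic and its upper-tail P value
F_stat  <- ((rss0 - rss1) / df1) / (rss1 / df2)
p_value <- pf(F_stat, df1, df2, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```

The t-test for b in summary(fit) addresses the same null hypothesis (b = 0) in Wald form, so the two P values should usually tell a similar story.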

Here is another option, using a polynomial:

# 1. Find the nonlinear relation between xx and yy (quadratic polynomial)
fit <- lm(yy ~ poly(xx, degree=2, raw=TRUE), data=ddf)
# 2. Get its equation: coefficients of the intercept, xx and xx^2 terms
coef(fit)
# 3. Get its P value (overall F-test on the last line of the summary)
summary(fit)
# 4. If possible get R (correlation coefficient) (nonlinear)
cor(predict(fit), ddf$yy)
# 5. Plot this curve
newdat <- data.frame(xx=seq(min(ddf$xx), max(ddf$xx), length.out=100))
newdat$yy <- predict(fit, newdat)
plot(yy ~ xx, ddf)
lines(yy ~ xx, newdat, col=2)
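
For the lm fit, summary(fit) already prints an overall F-test P value and R². A minimal sketch of extracting those numbers and writing the fitted equation as text; b, eqn, f, p_overall and r2 are illustrative names. For a least-squares fit with an intercept, the squared correlation from step 4 equals the reported R², which the last line checks.

```r
ddf <- data.frame(
  xx = 1:23,
  yy = c(10, 9, 11, 9, 7, 6, 9, 8, 5, 4, 6, 6, 5, 4, 6, 8, 4, 6, 8, 11, 8, 10, 9)
)
fit <- lm(yy ~ poly(xx, degree = 2, raw = TRUE), data = ddf)
s   <- summary(fit)

# Equation: raw = TRUE makes the coefficients plain intercept, xx and xx^2 terms
b   <- unname(coef(fit))
eqn <- sprintf("yy = %.4f + %.4f*xx + %.5f*xx^2", b[1], b[2], b[3])
print(eqn)

# Overall P value of the F-test that summary() prints on its last line
f <- s$fstatistic
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# R-squared, and its link to the correlation computed in step 4
r2 <- s$r.squared
all.equal(cor(fitted(fit), ddf$yy)^2, r2)
```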

Finally, a GAM version:

# 1. Find the nonlinear relation between xx and yy
library(mgcv)
fit <- gam(yy ~ s(xx), data=ddf)
# 2. Get its equation (these are spline basis coefficients,
#    not a compact closed-form equation)
coef(fit)
# 3. Get its P value (approximate P value for the smooth term s(xx))
summary(fit)
# 4. If possible get R (correlation coefficient) (nonlinear)
cor(predict(fit), ddf$yy)
# 5. Plot this curve
newdat <- data.frame(xx=seq(min(ddf$xx), max(ddf$xx), length.out=100))
newdat$yy <- predict(fit, newdat)
plot(yy ~ xx, ddf)
lines(yy ~ xx, newdat, col=2)

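As you can see from the GAM's coefficients, this is a larger model and much harder to express as a formula. However, it gives you a lot of flexibility in the shape of the curve, and if a linear relationship is the best one, the fit should reduce to a linear model (i.e., through a smaller number of "knots").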
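
Whether the smooth has collapsed toward a straight line can be checked directly. A minimal sketch, assuming the mgcv package (a recommended package shipped with R) and the default s(xx) smooth used above: summary(fit)$s.table holds the effective degrees of freedom (edf) and the approximate P value for s(xx). An edf close to 1 means the smooth is essentially linear, while larger values mean more curvature; summary(fit)$r.sq is the adjusted R². The name st is illustrative.

```r
library(mgcv)

ddf <- data.frame(
  xx = 1:23,
  yy = c(10, 9, 11, 9, 7, 6, 9, 8, 5, 4, 6, 6, 5, 4, 6, 8, 4, 6, 8, 11, 8, 10, 9)
)
fit <- gam(yy ~ s(xx), data = ddf)

st <- summary(fit)$s.table       # one row per smooth term
st[, c("edf", "p-value")]        # effective df and approximate P value of s(xx)
summary(fit)$r.sq                # adjusted R-squared
```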