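How do I estimate goodness of fit using scipy.odr?

Asked: 2014-01-28 01:45:21

Tags: scipy regression orthogonal

I am using scipy.odr to fit data with weights, but I don't know how to obtain a measure of goodness of fit or an R squared. Does anyone have suggestions for how to obtain such a measure from the output stored by the function?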

3 answers:

Answer 0: (score: 6)

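The res_var attribute of the Output object is the so-called reduced chi-square value of the fit, a popular choice of goodness-of-fit statistic. It is somewhat problematic for nonlinear fits, though. You can also look at the residuals directly (out.delta for the X residuals and out.eps for the Y residuals). Implementing a cross-validation or bootstrap method to assess goodness of fit, as suggested in the linked paper, is left as an exercise for the reader.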
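To make the attributes mentioned above concrete, here is a minimal sketch that runs a weighted straight-line fit and prints res_var together with summaries of out.delta and out.eps. The synthetic data, the noise levels and the name linear_func are assumptions chosen only for illustration; they are not part of the question.

```python
import numpy as np
from scipy import odr


def linear_func(beta, x):
    # Straight line y = m*x + b with beta = [m, b]
    return beta[0] * x + beta[1]


# Synthetic data (illustrative assumption): noise in both x and y
rng = np.random.default_rng(0)
sigma_x, sigma_y = 0.1, 0.15
x_true = np.linspace(0, 10, 100)
x = rng.normal(x_true, sigma_x)
y = rng.normal(2 * x_true + 1, sigma_y)

# Scalar weights 1/sigma**2 are applied to every data point
data = odr.Data(x, y, wd=1.0 / sigma_x**2, we=1.0 / sigma_y**2)
out = odr.ODR(data, odr.Model(linear_func), beta0=[1.0, 0.0]).run()

print("reduced chi-square (res_var):", out.res_var)
print(f"X residuals (delta): mean={out.delta.mean():.3g}, std={out.delta.std():.3g}")
print(f"Y residuals (eps):   mean={out.eps.mean():.3g}, std={out.eps.std():.3g}")
```

If the assumed uncertainties are realistic, res_var should come out in the neighbourhood of 1; values far above or below that suggest a poor model or mis-estimated errors.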

Answer 1: (score: 2)

This should work for you:

import numpy as np
from scipy import stats, odr


def linear_func(B, x):
    """
    From https://docs.scipy.org/doc/scipy/reference/odr.html
    Linear function y = m*x + b
    """
    # B is a vector of the parameters.
    # x is an array of the current x values.
    # x is in the same format as the x passed to Data or RealData.
    #
    # Return an array in the same format as y passed to Data or RealData.
    return B[0] * x + B[1]


np.random.seed(0)
sigma_x = .1
sigma_y = .15
N = 100
x_star = np.linspace(0, 10, N)
x = np.random.normal(x_star, sigma_x, N)
# the true underlying function is y = 2*x_star + 1
y = np.random.normal(2*x_star + 1, sigma_y, N)

linear = odr.Model(linear_func)
dat = odr.Data(x, y, wd=1./sigma_x**2, we=1./sigma_y**2)
this_odr = odr.ODR(dat, linear, beta0=[1., 0.])
odr_out = this_odr.run()
# degrees of freedom are n_samples - n_parameters
df = N - 2  # equivalently, df = odr_out.iwork[10]
t_stat = odr_out.beta[0] / odr_out.sd_beta[0]  # t statistic for the slope parameter
p_val = stats.t.sf(np.abs(t_stat), df) * 2
print('Recovered equation: y={:3.2f}x + {:3.2f}, t={:3.2f}, p={:.2e}'.format(odr_out.beta[0], odr_out.beta[1], t_stat, p_val))

Recovered equation: y=2.00x + 1.01, t=239.63, p=1.76e-137
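The same ingredients (beta, sd_beta and the degrees of freedom) can also be turned into approximate confidence intervals. The sketch below assumes, as the t-test above does, that each estimate divided by its standard error follows a t distribution with n - p degrees of freedom; the data and the 95% level are illustrative assumptions.

```python
import numpy as np
from scipy import stats, odr


def linear_func(B, x):
    # Straight line y = m*x + b with B = [m, b]
    return B[0] * x + B[1]


# Synthetic data (illustrative assumption)
rng = np.random.default_rng(0)
sigma_x, sigma_y = 0.1, 0.15
x_true = np.linspace(0, 10, 100)
x = rng.normal(x_true, sigma_x)
y = rng.normal(2 * x_true + 1, sigma_y)

data = odr.Data(x, y, wd=1.0 / sigma_x**2, we=1.0 / sigma_y**2)
out = odr.ODR(data, odr.Model(linear_func), beta0=[1.0, 0.0]).run()

# Degrees of freedom: number of samples minus number of fitted parameters
df = len(x) - len(out.beta)
# Two-sided 95% critical value of the t distribution
t_crit = stats.t.ppf(0.975, df)
ci_low = out.beta - t_crit * out.sd_beta
ci_high = out.beta + t_crit * out.sd_beta

for name, b, lo, hi in zip(("slope", "intercept"), out.beta, ci_low, ci_high):
    print(f"{name}: {b:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```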

Answer 2: (score: 0)

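As R. Ken mentioned, chi-square, or the variance of the residuals, is one of the more commonly used tests of goodness of fit. ODR stores the sum of squared residuals in out.sum_square, and you can verify for yourself that out.res_var = out.sum_square/degrees_freedom, which corresponds to what is usually called the reduced chi-square: the chi-square test result divided by its expected value.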
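A quick way to check that relation is sketched below. It assumes the degrees of freedom are simply n - p (number of points minus number of fitted parameters), and uses made-up data; both are assumptions for illustration.

```python
import numpy as np
from scipy import odr


def lin_model(beta, x):
    # Straight line y = m*x + q with beta = [m, q]
    return beta[0] * x + beta[1]


# Synthetic data (illustrative assumption)
rng = np.random.default_rng(1)
sigma_x, sigma_y = 0.1, 0.15
x_true = np.linspace(0, 10, 100)
x = rng.normal(x_true, sigma_x)
y = rng.normal(2 * x_true + 1, sigma_y)

data = odr.Data(x, y, wd=1.0 / sigma_x**2, we=1.0 / sigma_y**2)
out = odr.ODR(data, odr.Model(lin_model), beta0=[1.0, 1.0]).run()

# Assumed degrees of freedom: n - p
df = len(x) - len(out.beta)
reduced_chi2 = out.sum_square / df

print("sum_square / df :", reduced_chi2)
print("res_var         :", out.res_var)
# The total is split into an x part (delta) and a y part (eps)
print("delta + eps     :", out.sum_square_delta + out.sum_square_eps)
print("sum_square      :", out.sum_square)
```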

As for the other very popular goodness-of-fit estimator in linear regression, R squared and its adjusted version, we can define the functions

import numpy as np

def R_squared(observed, predicted, uncertainty=1):
    """ Returns R square measure of goodness of fit for predicted model. """
    weight = 1./uncertainty
    return 1. - (np.var((observed - predicted)*weight) / np.var(observed*weight))

def adjusted_R(x, y, model, popt, unc=1):
    """
    Returns adjusted R squared test for optimal parameters popt calculated
    according to W-MN formula, other forms have different coefficients:
    Wherry/McNemar : (n - 1)/(n - p - 1)
    Wherry : (n - 1)/(n - p)
    Lord : (n + p - 1)/(n - p - 1)
    Stein : (n - 1)/(n - p - 1) * (n - 2)/(n - p - 2) * (n + 1)/n

    """
    # Assuming you have a model with ODR argument order f(beta, x)
    # otherwise if model is of the form f(x, a, b, c..) you could use
    # R = R_squared(y, model(x, *popt), uncertainty=unc)
    R = R_squared(y, model(popt, x), uncertainty=unc)
    n, p = len(y), len(popt)
    coefficient = (n - 1)/(n - p - 1)
    adj = 1 - (1 - R) * coefficient
    return adj, R

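In the output of the ODR run you can find the optimal values of the model parameters in out.beta, and at that point we have everything we need to compute R squared.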

from scipy import odr

def lin_model(beta, x):
    """
    Linear function y = m*x + q
    slope m, constant term/y-intercept q
    """
    return beta[0] * x + beta[1]

linear = odr.Model(lin_model)
# x, y, sigma_x and sigma_y are assumed to be defined already (for example, as in Answer 1)
data = odr.RealData(x, y, sx=sigma_x, sy=sigma_y)
init = odr.ODR(data, linear, beta0=[1, 1])
out = init.run()

adjusted_Rsq, Rsq = adjusted_R(x, y, lin_model, popt=out.beta)
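For anyone who wants to run this end to end, here is a self-contained sketch of the same idea. The data are synthetic, r_squared is a compact restatement of the R_squared function above, and weighting the residuals by sigma_y is my own choice for the example rather than something the answer prescribes. Note that an R squared computed this way only looks at the vertical residuals at the observed x values, so it ignores the x uncertainties that ODR itself takes into account.

```python
import numpy as np
from scipy import odr


def lin_model(beta, x):
    # Straight line y = m*x + q with beta = [m, q]
    return beta[0] * x + beta[1]


def r_squared(observed, predicted, uncertainty=1.0):
    # Variance-based R squared, optionally weighted by 1/uncertainty
    weight = 1.0 / uncertainty
    return 1.0 - np.var((observed - predicted) * weight) / np.var(observed * weight)


# Synthetic data (illustrative assumption)
rng = np.random.default_rng(2)
sigma_x, sigma_y = 0.1, 0.15
x_true = np.linspace(0, 10, 100)
x = rng.normal(x_true, sigma_x)
y = rng.normal(2 * x_true + 1, sigma_y)

data = odr.RealData(x, y, sx=sigma_x, sy=sigma_y)
out = odr.ODR(data, odr.Model(lin_model), beta0=[1.0, 1.0]).run()

n, p = len(y), len(out.beta)
rsq = r_squared(y, lin_model(out.beta, x), uncertainty=sigma_y)
# Wherry/McNemar adjustment, as in adjusted_R above
adj_rsq = 1.0 - (1.0 - rsq) * (n - 1) / (n - p - 1)

print(f"R squared          = {rsq:.5f}")
print(f"adjusted R squared = {adj_rsq:.5f}")
```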