Polynomial regression is not working

Time: 2018-08-16 10:26:11

Tags: python regression

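I have a bivariate polynomial regression problem.
I got this example from the internet but can't seem to make it work for my data. As the figure shows, the predictions are terrible, and I can't figure out why. I tried the same code with more training data, but that gave even worse predictions. I assume the model handles scaling automatically. All of the independent variables are important, so I can't imagine that any of them needs to be eliminated.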

Poor predictions:

[image: poor predictions]

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.linear_model import LinearRegression
# Fitting polynomial regression to the dataset
from sklearn.preprocessing import PolynomialFeatures

#X is the independent variable (bivariate in this case)

X = np.array([[1.00000000e-01, 7.56500000e+00],
 [1.00000000e-01, 8.11100000e+00],
 [1.00000000e-01,8.69700000e+00],
 [1.00000000e-01,9.32600000e+00],
 [1.00000000e-01,1.00000000e+01],
 [1.00000000e-01,1.07200000e+01],
 [1.00000000e-01,1.15000000e+01],
 [1.00000000e-01,1.23300000e+01],
 [1.00000000e-01,1.32200000e+01],
 [1.00000000e-01,1.41700000e+01],
 [1.00000000e-01,1.52000000e+01],
 [1.00000000e-01,1.63000000e+01],
 [1.00000000e-01,1.74800000e+01],
 [1.00000000e-01,1.87400000e+01],
 [1.00000000e-01,2.00900000e+01],
 [1.00000000e-01,2.15400000e+01],
 [1.00000000e-01,2.31000000e+01],
 [1.00000000e-01,2.47700000e+01],
 [1.00000000e-01,2.65600000e+01],
 [1.00000000e-01,2.84800000e+01],
 [1.00000000e-01,3.05400000e+01],
 [1.00000000e-01,3.27500000e+01],
 [1.00000000e-01 ,  3.51100000e+01],
 [1.00000000e-01 ,  3.76500000e+01],
 [1.00000000e-01 ,  4.03700000e+01],
 [1.00000000e-01 ,  4.32900000e+01],
 [1.00000000e-01 ,  4.64200000e+01],
 [1.00000000e-01 ,  4.97700000e+01],
 [1.00000000e-01 ,  5.33700000e+01],
 [1.00000000e-01 ,  5.72200000e+01],
 [1.00000000e-01 ,  6.13600000e+01],
 [1.00000000e-01 ,  6.57900000e+01],
 [1.00000000e-01 ,  7.05500000e+01],
 [1.00000000e-01 ,  7.56500000e+01],
 [1.00000000e-01 ,  8.11100000e+01],
 [1.00000000e-01 ,  8.69700000e+01],
 [1.00000000e-01 ,  9.32600000e+01],
 [1.00000000e-01 ,  1.00000000e+02],
 [1.00000000e-01 ,  1.07200000e+02],
 [1.00000000e-01 ,  1.15000000e+02],
 [1.00000000e-01 ,  1.23300000e+02],
 [1.00000000e-01 ,  1.32200000e+02],
 [1.00000000e-01 ,  1.41700000e+02],
 [1.00000000e-01 ,  1.52000000e+02],
 [1.00000000e-01 ,  1.63000000e+02],
 [1.00000000e-01 ,  1.74800000e+02],
 [1.00000000e-01 ,  1.87400000e+02],
 [1.00000000e-01 ,  2.00900000e+02],
 [1.00000000e-01 ,  2.15400000e+02],
 [1.00000000e-01 ,  2.31000000e+02],
 [1.00000000e-01 ,  2.47700000e+02],
 [1.00000000e-01 ,  2.65600000e+02],
 [1.00000000e-01 ,  2.84800000e+02],
 [1.00000000e-01 ,  3.05400000e+02],
 [1.00000000e-01 ,  3.27500000e+02],
 [1.00000000e-01 ,  3.51100000e+02],
 [1.00000000e-01 ,  3.76500000e+02],
 [1.00000000e-01 ,  4.03700000e+02],
 [1.00000000e-01 ,  4.32900000e+02],
 [1.00000000e-01 ,  4.64200000e+02],
 [1.00000000e-01 ,  4.97700000e+02],
 [1.00000000e-01 ,  5.33700000e+02],
 [1.00000000e-01 ,  5.72200000e+02],
 [1.00000000e-01 ,  6.13600000e+02],
 [1.00000000e-01 ,  6.57900000e+02],
 [1.00000000e-01 ,  7.05500000e+02],
 [1.00000000e-01 ,  7.56500000e+02],
 [1.00000000e-01 ,  8.11100000e+02],
 [1.00000000e-01 ,  8.69700000e+02],
 [1.00000000e-01 ,  9.32600000e+02],
 [1.00000000e-01 ,  1.00000000e+03],
 [2.00000000e-01 ,  7.56500000e+00],
 [2.00000000e-01 ,  8.11100000e+00],
 [2.00000000e-01 ,  8.69700000e+00],
 [2.00000000e-01 ,  9.32600000e+00],
 [2.00000000e-01 ,  1.00000000e+01],
 [2.00000000e-01 ,  1.07200000e+01],
 [2.00000000e-01 ,  1.15000000e+01],
 [2.00000000e-01 ,  1.23300000e+01],
 [2.00000000e-01 ,  1.32200000e+01],
 [2.00000000e-01 ,  1.41700000e+01],
 [2.00000000e-01 ,  1.52000000e+01],
 [2.00000000e-01 ,  1.63000000e+01],
 [2.00000000e-01 ,  1.74800000e+01],
 [2.00000000e-01 ,  1.87400000e+01],
 [2.00000000e-01 ,  2.00900000e+01],
 [2.00000000e-01 ,  2.15400000e+01],
 [2.00000000e-01 ,  2.31000000e+01],
 [2.00000000e-01 ,  2.47700000e+01],
 [2.00000000e-01 ,  2.65600000e+01],
 [2.00000000e-01 ,  2.84800000e+01],
 [2.00000000e-01 ,  3.05400000e+01],
 [2.00000000e-01 ,  3.27500000e+01],
 [2.00000000e-01 ,  3.51100000e+01],
 [2.00000000e-01 ,  3.76500000e+01],
 [2.00000000e-01 ,  4.03700000e+01],
 [2.00000000e-01 ,  4.32900000e+01],
 [2.00000000e-01 ,  4.64200000e+01],
 [2.00000000e-01 ,  4.97700000e+01],
 [2.00000000e-01 ,  5.33700000e+01],
 [2.00000000e-01 ,  5.72200000e+01],
 [2.00000000e-01 ,  6.13600000e+01],
 [2.00000000e-01 ,  6.57900000e+01],
 [2.00000000e-01 ,  7.05500000e+01],
 [2.00000000e-01 ,  7.56500000e+01],
 [2.00000000e-01 ,  8.11100000e+01],
 [2.00000000e-01 ,  8.69700000e+01],
 [2.00000000e-01 ,  9.32600000e+01],
 [2.00000000e-01 ,  1.00000000e+02],
 [2.00000000e-01 ,  1.07200000e+02],
 [2.00000000e-01 ,  1.15000000e+02],
 [2.00000000e-01 ,  1.23300000e+02],
 [2.00000000e-01 ,  1.32200000e+02],
 [2.00000000e-01 ,  1.41700000e+02],
 [2.00000000e-01 ,  1.52000000e+02],
 [2.00000000e-01 ,  1.63000000e+02],
 [2.00000000e-01 ,  1.74800000e+02],
 [2.00000000e-01 ,  1.87400000e+02],
 [2.00000000e-01 ,  2.00900000e+02],
 [2.00000000e-01 ,  2.15400000e+02],
 [2.00000000e-01 ,  2.31000000e+02],
 [2.00000000e-01 ,  2.47700000e+02],
 [2.00000000e-01 ,  2.65600000e+02],
 [2.00000000e-01 ,  2.84800000e+02],
 [2.00000000e-01 ,  3.05400000e+02],
 [2.00000000e-01 ,  3.27500000e+02],
 [2.00000000e-01 ,  3.51100000e+02],
 [2.00000000e-01 ,  3.76500000e+02],
 [2.00000000e-01 ,  4.03700000e+02],
 [2.00000000e-01 ,  4.32900000e+02],
 [2.00000000e-01 ,  4.64200000e+02],
 [2.00000000e-01 ,  4.97700000e+02],
 [2.00000000e-01 ,  5.33700000e+02],
 [2.00000000e-01 ,  5.72200000e+02],
 [2.00000000e-01 ,  6.13600000e+02],
 [2.00000000e-01 ,  6.57900000e+02],
 [2.00000000e-01 ,  7.05500000e+02],
 [2.00000000e-01 ,  7.56500000e+02],
 [2.00000000e-01 ,  8.11100000e+02],
 [2.00000000e-01 ,  8.69700000e+02],
 [2.00000000e-01 ,  9.32600000e+02],
 [2.00000000e-01 ,  1.00000000e+03],
 [2.30000000e+00 ,  7.56500000e+00],
 [2.30000000e+00 ,  8.11100000e+00],
 [2.30000000e+00 ,  8.69700000e+00],
 [2.30000000e+00 ,  9.32600000e+00],
 [2.30000000e+00 ,  1.00000000e+01],
 [2.30000000e+00 ,  1.07200000e+01],
 [2.30000000e+00 ,  1.15000000e+01],
 [2.30000000e+00 ,  1.23300000e+01],
 [2.30000000e+00 ,  1.32200000e+01],
 [2.30000000e+00 ,  1.41700000e+01],
 [2.30000000e+00 ,  1.52000000e+01],
 [2.30000000e+00 ,  1.63000000e+01],
 [2.30000000e+00 ,  1.74800000e+01],
 [2.30000000e+00 ,  1.87400000e+01],
 [2.30000000e+00 ,  2.00900000e+01],
 [2.30000000e+00 ,  2.15400000e+01],
 [2.30000000e+00 ,  2.31000000e+01],
 [2.30000000e+00 ,  2.47700000e+01],
 [2.30000000e+00 ,  2.65600000e+01],
 [2.30000000e+00 ,  2.84800000e+01],
 [2.30000000e+00 ,  3.05400000e+01],
 [2.30000000e+00 ,  3.27500000e+01],
 [2.30000000e+00 ,  3.51100000e+01],
 [2.30000000e+00 ,  3.76500000e+01],
 [2.30000000e+00 ,  4.03700000e+01],
 [2.30000000e+00 ,  4.32900000e+01],
 [2.30000000e+00 ,  4.64200000e+01],
 [2.30000000e+00 ,  4.97700000e+01],
 [2.30000000e+00 ,  5.33700000e+01],
 [2.30000000e+00 ,  5.72200000e+01],
 [2.30000000e+00 ,  6.13600000e+01],
 [2.30000000e+00 ,  6.57900000e+01],
 [2.30000000e+00 ,  7.05500000e+01],
 [2.30000000e+00 ,  7.56500000e+01],
 [2.30000000e+00 ,  8.11100000e+01],
 [2.30000000e+00 ,  8.69700000e+01],
 [2.30000000e+00 ,  9.32600000e+01],
 [2.30000000e+00 ,  1.00000000e+02],
 [2.30000000e+00 ,  1.07200000e+02],
 [2.30000000e+00 ,  1.15000000e+02],
 [2.30000000e+00 ,  1.23300000e+02],
 [2.30000000e+00 ,  1.32200000e+02],
 [2.30000000e+00 ,  1.41700000e+02],
 [2.30000000e+00 ,  1.52000000e+02],
 [2.30000000e+00 ,  1.63000000e+02],
 [2.30000000e+00 ,  1.74800000e+02],
 [2.30000000e+00 ,  1.87400000e+02],
 [2.30000000e+00 ,  2.00900000e+02],
 [2.30000000e+00 ,  2.15400000e+02],
 [2.30000000e+00 ,  2.31000000e+02],
 [2.30000000e+00 ,  2.47700000e+02],
 [2.30000000e+00 ,  2.65600000e+02],
 [2.30000000e+00 ,  2.84800000e+02],
 [2.30000000e+00 ,  3.05400000e+02],
 [2.30000000e+00 ,  3.27500000e+02],
 [2.30000000e+00 ,  3.51100000e+02],
 [2.30000000e+00 ,  3.76500000e+02],
 [2.30000000e+00 ,  4.03700000e+02],
 [2.30000000e+00 ,  4.32900000e+02],
 [2.30000000e+00 ,  4.64200000e+02],
 [2.30000000e+00 ,  4.97700000e+02],
 [2.30000000e+00 ,  5.33700000e+02],
 [2.30000000e+00 ,  5.72200000e+02],
 [2.30000000e+00 ,  6.13600000e+02],
 [2.30000000e+00 ,  6.57900000e+02],
 [2.30000000e+00 ,  7.05500000e+02],
 [2.30000000e+00 ,  7.56500000e+02],
 [2.30000000e+00 ,  8.11100000e+02],
 [2.30000000e+00 ,  8.69700000e+02],
 [2.30000000e+00 ,  9.32600000e+02],
 [2.30000000e+00 ,  1.00000000e+03],
 [2.40000000e+00 ,  7.56500000e+00],
 [2.40000000e+00 ,  8.11100000e+00],
 [2.40000000e+00 ,  8.69700000e+00],
 [2.40000000e+00 ,  9.32600000e+00],
 [2.40000000e+00 ,  1.00000000e+01],
 [2.40000000e+00 ,  1.07200000e+01],
 [2.40000000e+00 ,  1.15000000e+01],
 [2.40000000e+00 ,  1.23300000e+01],
 [2.40000000e+00 ,  1.32200000e+01],
 [2.40000000e+00 ,  1.41700000e+01],
 [2.40000000e+00 ,  1.52000000e+01],
 [2.40000000e+00 ,  1.63000000e+01],
 [2.40000000e+00 ,  1.74800000e+01],
 [2.40000000e+00 ,  1.87400000e+01],
 [2.40000000e+00 ,  2.00900000e+01],
 [2.40000000e+00 ,  2.15400000e+01],
 [2.40000000e+00 ,  2.31000000e+01],
 [2.40000000e+00 ,  2.47700000e+01],
 [2.40000000e+00 ,  2.65600000e+01],
 [2.40000000e+00 ,  2.84800000e+01],
 [2.40000000e+00 ,  3.05400000e+01],
 [2.40000000e+00 ,  3.27500000e+01],
 [2.40000000e+00 ,  3.51100000e+01],
 [2.40000000e+00 ,  3.76500000e+01],
 [2.40000000e+00 ,  4.03700000e+01],
 [2.40000000e+00 ,  4.32900000e+01],
 [2.40000000e+00 ,  4.64200000e+01],
 [2.40000000e+00 ,  4.97700000e+01],
 [2.40000000e+00 ,  5.33700000e+01],
 [2.40000000e+00 ,  5.72200000e+01],
 [2.40000000e+00 ,  6.13600000e+01],
 [2.40000000e+00 ,  6.57900000e+01],
 [2.40000000e+00 ,  7.05500000e+01],
 [2.40000000e+00 ,  7.56500000e+01],
 [2.40000000e+00 ,  8.11100000e+01],
 [2.40000000e+00 ,  8.69700000e+01],
 [2.40000000e+00 ,  9.32600000e+01],
 [2.40000000e+00 ,  1.00000000e+02],
 [2.40000000e+00 ,  1.07200000e+02],
 [2.40000000e+00 ,  1.15000000e+02],
 [2.40000000e+00 ,  1.23300000e+02],
 [2.40000000e+00 ,  1.32200000e+02],
 [2.40000000e+00 ,  1.41700000e+02],
 [2.40000000e+00 ,  1.52000000e+02],
 [2.40000000e+00 ,  1.63000000e+02],
 [2.40000000e+00 ,  1.74800000e+02],
 [2.40000000e+00 ,  1.87400000e+02],
 [2.40000000e+00 ,  2.00900000e+02],
 [2.40000000e+00 ,  2.15400000e+02],
 [2.40000000e+00 ,  2.31000000e+02],
 [2.40000000e+00 ,  2.47700000e+02],
 [2.40000000e+00 ,  2.65600000e+02],
 [2.40000000e+00 ,  2.84800000e+02],
 [2.40000000e+00 ,  3.05400000e+02],
 [2.40000000e+00 ,  3.27500000e+02],
 [2.40000000e+00 ,  3.51100000e+02],
 [2.40000000e+00 ,  3.76500000e+02],
 [2.40000000e+00 ,  4.03700000e+02],
 [2.40000000e+00 ,  4.32900000e+02],
 [2.40000000e+00 ,  4.64200000e+02],
 [2.40000000e+00 ,  4.97700000e+02],
 [2.40000000e+00  , 5.33700000e+02],
 [2.40000000e+00  , 5.72200000e+02],
 [2.40000000e+00  , 6.13600000e+02],
 [2.40000000e+00  , 6.57900000e+02],
 [2.40000000e+00  , 7.05500000e+02],
 [2.40000000e+00  , 7.56500000e+02],
 [2.40000000e+00  , 8.11100000e+02],
 [2.40000000e+00  , 8.69700000e+02],
 [2.40000000e+00  , 9.32600000e+02],
 [2.40000000e+00  , 1.00000000e+03]])
#X = np.loadtxt("X.txt")

#vector is the dependent data
#vector = np.loadtxt("mob_vector.txt")

vector = [  2.12800000e+24  , 2.12100000e+24  , 2.11800000e+24  , 2.12000000e+24,
   2.12400000e+24 ,  2.12900000e+24 ,  2.13400000e+24 ,  2.14000000e+24,
   2.14600000e+24 ,  2.15100000e+24 ,  2.15600000e+24 ,  2.16100000e+24,
   2.16500000e+24 ,  2.16900000e+24 ,  2.17300000e+24 ,  2.17700000e+24,
   2.18100000e+24 ,  2.18600000e+24 ,  2.19000000e+24  , 2.19400000e+24,
   2.19900000e+24 ,  2.20400000e+24 ,  2.21000000e+24 ,  2.21600000e+24,
   2.22300000e+24 ,  2.23000000e+24 ,  2.23800000e+24 ,  2.24700000e+24,
   2.25600000e+24 ,  2.26700000e+24 ,  2.27800000e+24 ,  2.29100000e+24,
   2.30500000e+24 ,  2.32000000e+24 ,  2.33400000e+24 ,  2.35200000e+24,
   2.37000000e+24 ,  2.39000000e+24 ,  2.41100000e+24 ,  2.43400000e+24,
   2.45700000e+24 ,  2.48200000e+24 ,  2.50900000e+24 , 2.53600000e+24,
   2.56400000e+24 ,  2.59200000e+24 ,  2.62000000e+24  , 2.65000000e+24,
   2.68000000e+24 ,  2.70600000e+24 ,  2.73200000e+24  , 2.75700000e+24,
   2.77900000e+24 ,  2.79900000e+24 ,  2.81400000e+24 ,  2.82800000e+24,
   2.83700000e+24 ,  2.84300000e+24 ,  2.84500000e+24 ,  2.84300000e+24,
   2.83700000e+24 ,  2.82700000e+24 ,  2.81400000e+24 ,  2.79900000e+24,
   2.78000000e+24 ,  2.75800000e+24 ,  2.73500000e+24 ,  2.71100000e+24,
   2.68600000e+24 ,  2.66100000e+24 ,  2.63400000e+24 ,  2.11200000e+24,
   2.09800000e+24 ,  2.09100000e+24 ,  2.08800000e+24 ,  2.08900000e+24,
   2.09200000e+24 ,  2.09500000e+24 ,  2.10000000e+24 ,  2.10400000e+24,
   2.10900000e+24 ,  2.11300000e+24 ,  2.11700000e+24 ,  2.12100000e+24,
   2.12400000e+24  , 2.12800000e+24  , 2.13200000e+24 ,  2.13600000e+24,
   2.13900000e+24 ,  2.14300000e+24 ,  2.14800000e+24 ,  2.15200000e+24,
   2.15700000e+24 ,  2.16100000e+24 ,  2.16700000e+24  , 2.17300000e+24,
   2.18000000e+24 ,  2.18800000e+24 ,  2.19600000e+24 ,  2.20500000e+24,
   2.21400000e+24  , 2.22400000e+24 ,  2.23500000e+24 ,  2.24700000e+24,
   2.26000000e+24 ,  2.27600000e+24  , 2.29100000e+24 ,  2.30800000e+24,
   2.32600000e+24 ,  2.34500000e+24 ,  2.36600000e+24 ,  2.38700000e+24,
   2.40900000e+24 ,  2.43200000e+24 ,  2.45600000e+24  , 2.48100000e+24,
   2.50700000e+24  , 2.53300000e+24 ,  2.55800000e+24  , 2.58400000e+24,
   2.60800000e+24 ,  2.63200000e+24 ,  2.65300000e+24  , 2.67300000e+24,
   2.68900000e+24 ,  2.70500000e+24 ,  2.71500000e+24 ,  2.72300000e+24,
   2.72700000e+24 ,  2.72700000e+24 ,  2.72400000e+24 ,  2.71600000e+24,
   2.70600000e+24 ,  2.69200000e+24 ,  2.67500000e+24 ,  2.65500000e+24,
   2.63300000e+24 ,  2.60900000e+24 ,  2.58300000e+24  , 2.55700000e+24,
   2.52900000e+24  , 2.50100000e+24 ,  2.21400000e+24 ,  2.09600000e+24,
   1.98500000e+24 ,  1.88800000e+24 ,  1.80700000e+24 ,  1.74000000e+24,
   1.68800000e+24 ,  1.64800000e+24 ,  1.61700000e+24 ,  1.59300000e+24,
   1.57500000e+24 ,  1.56100000e+24 ,  1.55100000e+24 ,  1.54200000e+24,
   1.53600000e+24 ,  1.53000000e+24 ,  1.52600000e+24 ,  1.52300000e+24,
   1.52000000e+24 ,  1.51700000e+24 ,  1.51500000e+24 ,  1.51400000e+24,
   1.51300000e+24 , 1.51200000e+24  , 1.51100000e+24  , 1.51100000e+24,
   1.51000000e+24 ,  1.51100000e+24 ,  1.51100000e+24 ,  1.51200000e+24,
   1.51200000e+24 ,  1.51300000e+24  , 1.51400000e+24  , 1.51500000e+24,
   1.51700000e+24 ,  1.51800000e+24  , 1.52000000e+24  , 1.52300000e+24,
   1.52500000e+24  , 1.52800000e+24 ,  1.53000000e+24 ,  1.53300000e+24,
   1.53700000e+24 ,  1.54000000e+24 ,  1.54300000e+24 ,  1.54600000e+24,
   1.54900000e+24 ,  1.55200000e+24 ,  1.55500000e+24 ,  1.55700000e+24,
   1.55900000e+24 ,  1.56000000e+24 ,  1.56100000e+24 ,  1.56100000e+24,
   1.55900000e+24 ,  1.55700000e+24 ,  1.55400000e+24 ,  1.55000000e+24,
   1.54400000e+24 ,  1.53700000e+24 ,  1.52800000e+24 ,  1.51800000e+24,
   1.50600000e+24 ,  1.49200000e+24  , 1.47700000e+24 ,  1.46000000e+24,
   1.44200000e+24 ,  1.42300000e+24 ,  1.40200000e+24 ,  1.38100000e+24,
   1.35800000e+24 ,  2.21200000e+24 ,  2.09700000e+24 ,  1.98800000e+24,
   1.88900000e+24 ,  1.80500000e+24 ,  1.73600000e+24 ,  1.68000000e+24,
   1.63700000e+24 ,  1.60400000e+24 ,  1.57900000e+24 ,  1.56000000e+24,
   1.54500000e+24  , 1.53400000e+24  , 1.52500000e+24 ,  1.51800000e+24,
   1.51200000e+24 ,  1.50700000e+24  , 1.50400000e+24  , 1.50000000e+24,
   1.49800000e+24  , 1.49600000e+24 ,  1.49400000e+24 ,  1.49300000e+24,
   1.49200000e+24 ,  1.49100000e+24 ,  1.49000000e+24 ,  1.49000000e+24,
   1.49000000e+24 ,  1.49000000e+24 ,  1.49000000e+24  , 1.49000000e+24,
   1.49100000e+24 ,  1.49200000e+24 ,  1.49400000e+24  , 1.49500000e+24,
   1.49600000e+24 ,  1.49800000e+24  , 1.49900000e+24 ,  1.50200000e+24,
   1.50400000e+24 ,  1.50700000e+24  , 1.50900000e+24 ,  1.51100000e+24,
   1.51400000e+24 ,  1.51700000e+24 ,  1.52000000e+24 ,  1.52200000e+24,
   1.52500000e+24  , 1.52700000e+24 ,  1.52900000e+24 ,  1.53100000e+24,
   1.53200000e+24 ,  1.53300000e+24  , 1.53200000e+24 ,  1.53100000e+24,
   1.52900000e+24  , 1.52500000e+24 ,  1.52100000e+24 ,  1.51500000e+24,
   1.50800000e+24 ,  1.49900000e+24  , 1.48900000e+24 ,  1.47700000e+24,
   1.46400000e+24 ,  1.44900000e+24  , 1.43300000e+24 ,  1.41500000e+24,
   1.39600000e+24  , 1.37600000e+24  , 1.35500000e+24 ,  1.33300000e+24]


#e_field = np.loadtxt("e_field_vector.txt")
#e_field are the x axis values and one of the independent variables.
e_field = [    7.565   ,  8.111  ,   8.697  ,   9.326   , 10.    ,   10.72  ,  11.5,
    12.33  ,   13.22  ,   14.17  ,   15.2   ,   16.3  ,    17.48  ,   18.74,
    20.09  ,  21.54  ,   23.1   ,   24.77  ,   26.56  ,   28.48  ,   30.54,
    32.75  ,   35.11  ,   37.65 ,    40.37  ,   43.29  ,   46.42  ,   49.77,
    53.37  ,   57.22  ,   61.36  ,   65.79  ,   70.55 ,    75.65  ,   81.11,
    86.97  ,   93.26  ,  100.  ,    107.2   ,  115.   ,   123.3  ,   132.2,
   141.7   ,  152.   ,   163.   ,   174.8   ,  187.4  ,   200.9  ,   215.4  ,   231.,
   247.7  ,   265.6  ,   284.8  ,   305.4  ,   327.5  ,   351.1  ,   376.5,
   403.7   ,  432.9  ,   464.2  ,   497.7  ,   533.7  ,   572.2  ,   613.6,
   657.9   ,  705.5   ,  756.5  ,   811.1  ,   869.7  ,   932.6  ,  1000.   ]

for x in range(71):
    #predict is an independent variable for which we'd like to predict the value

    P = e_field[x]

    predict= [1.6, P]
    predict=np.reshape(predict,(1,-1))

    #generate a model of polynomial features
    poly = PolynomialFeatures(degree=2)

    #transform the x data for proper fitting (for single variable type it returns,[1,x,x**2])
    X_ = poly.fit_transform(X)

    #transform the prediction to fit the model type
    predict_ = poly.fit_transform(predict)

    #here we can remove polynomial orders we don't want
    #for instance I'm removing the `x` component
    X_ = np.delete(X_,(1),axis=1)
    predict_ = np.delete(predict_,(1),axis=1)

    #generate the regression object
    clf = LinearRegression()
    #perform the actual regression
    clf.fit(X_, vector)

    #print("X_ = ",X_)
    #print("predict_ = ",predict_)
    #print("Prediction = ",clf.predict(predict_))

    plt.scatter(X[:,1],vector, color = "red", marker = ".")

    plt.scatter(e_field[x], clf.predict(predict_),color="blue", marker=".")

real_mob = [  2.10600000e+24  , 2.02200000e+24 ,  1.95500000e+24 ,  1.90300000e+24,
   1.86200000e+24 ,  1.83200000e+24  , 1.81000000e+24  , 1.79400000e+24,
   1.78200000e+24 ,  1.77300000e+24 ,  1.76700000e+24  , 1.76200000e+24,
   1.75900000e+24  , 1.75600000e+24 ,  1.75500000e+24 ,  1.75400000e+24,
   1.75300000e+24 ,  1.75300000e+24 ,  1.75300000e+24 ,  1.75300000e+24,
   1.75400000e+24 ,  1.75500000e+24 ,  1.75600000e+24 ,  1.75700000e+24,
   1.75800000e+24 ,  1.76000000e+24 ,  1.76300000e+24 ,  1.76600000e+24,
   1.76800000e+24 ,  1.77100000e+24 ,  1.77400000e+24 ,  1.77800000e+24,
   1.78200000e+24 ,  1.78700000e+24 ,  1.79300000e+24 ,  1.79800000e+24,
   1.80400000e+24 ,  1.81000000e+24 ,  1.81700000e+24 ,  1.82400000e+24,
   1.83200000e+24 ,  1.84000000e+24 ,  1.84800000e+24 ,  1.85600000e+24,
   1.86500000e+24 ,  1.87400000e+24 ,  1.88200000e+24 ,  1.89100000e+24,
   1.89900000e+24 ,  1.90600000e+24 ,  1.91300000e+24 ,  1.91900000e+24,
   1.92500000e+24 ,  1.92900000e+24 ,  1.93000000e+24 ,  1.93100000e+24,
   1.93000000e+24 ,  1.92700000e+24 ,  1.92100000e+24 ,  1.91400000e+24,
   1.90400000e+24 ,  1.89200000e+24 ,  1.87800000e+24 ,  1.86200000e+24,
   1.84400000e+24 ,  1.82300000e+24 ,  1.80100000e+24 ,  1.77700000e+24,
   1.75200000e+24 ,  1.72600000e+24 ,  1.69900000e+24]


plt.plot(e_field[0:71],real_mob,color = "green", label="Real Data")

plt.scatter(e_field[x],clf.predict(predict_) ,color="blue", label="Prediction", marker=".")
plt.scatter(X[:,1],vector,color="red", label="Training Data",marker=".")

plt.xlim(min(e_field)-1,max(e_field)+100)
#plt.yscale("log")
plt.xscale("log")
plt.title("ML prediction and real mobility data")
plt.xlabel("Reduced Electric Field (E/N)")
plt.ylabel("Mobility 1/m/V/s")
plt.legend()

3 answers:

Answer 0 (score: 1)

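When I made a 3D scatter plot of X versus vector, it looked as though no simple polynomial would fit the data in your post well. See the figure below.

[image: scatterplot]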

Answer 1 (score: 1)

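As the other answer mentioned, you have more than one turning point in each dimension, so you would need at least a cubic regression to match that property.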
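A small sketch of why the degree matters, using a synthetic curve rather than the data from the question: `x**3 - 3*x` has two turning points, and a quadratic cannot follow it while a cubic can. The curve, the grid, and the helper name `rms_residual` are assumptions chosen for illustration only.

```python
import numpy as np

# Synthetic curve with two turning points (a local maximum and a local minimum).
x = np.linspace(-2.0, 2.0, 41)
y = x**3 - 3.0 * x

def rms_residual(degree):
    """Least-squares fit of the given degree; return the RMS misfit on the samples."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))

rms_quadratic = rms_residual(2)  # a parabola has only one turning point
rms_cubic = rms_residual(3)      # a cubic can have two

print("quadratic RMS residual:", rms_quadratic)
print("cubic RMS residual:", rms_cubic)
```

The quadratic leaves a large residual no matter how many samples are added, while the cubic reproduces the curve to rounding error. The same reasoning would suggest that `degree=2` in the question is too low for a curve that dips, rises, and falls again.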

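Your data is also very sparse. Unless there is a good reason why a unique polynomial should model the underlying phenomenon perfectly, I would say your predictions are actually quite good. A priori, a regression method has no knowledge of where the data lies or of how to fill in the gaps in the data.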

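For example, if I tell you that f(0) = 0 and f(1) = 1, which quadratic would you use to fill the gap and find f(0.5)? You don't have enough information to determine that reliably. Your data has a similar problem: in one dimension you have plenty of data points, but in the other you essentially (up to a small error) have only two X coordinates with data, and you are trying to interpolate between them. The data is too sparse to do that reliably.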
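To make the f(0) = 0, f(1) = 1 example concrete: every quadratic of the form f(x) = a*x**2 + (1 - a)*x passes through both points, and its value at 0.5 is 0.5 - a/4, so it depends entirely on the free parameter a. The helper name `quadratic_through_endpoints` below is made up for this illustration.

```python
def quadratic_through_endpoints(a):
    """Return f(x) = a*x**2 + (1 - a)*x, which satisfies f(0) = 0 and f(1) = 1 for any a."""
    def f(x):
        return a * x**2 + (1.0 - a) * x
    return f

# Three different quadratics, all consistent with the two known points.
for a in (-2.0, 0.0, 2.0):
    f = quadratic_through_endpoints(a)
    print("a =", a, "f(0) =", f(0.0), "f(1) =", f(1.0), "f(0.5) =", f(0.5))
```

All three curves agree at 0 and 1 but disagree at 0.5, which is the situation a regression is in when it has to interpolate between two widely separated X coordinates.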

Answer 2 (score: 1)

I got a figure similar to @James Phillips's:

[image: 3D scatter plot]

You then have two 2D projections:

[images: the two 2D projections]

Plotting code (3D scatter):

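from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib
fig = pyplot.figure()
# add_subplot attaches the axes to the figure; recent matplotlib no longer does this for Axes3D(fig)
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0], X[:,1], vector)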

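My reading is that you have 4 clean groups of data (or you could treat them as 2), and for each group you could try a higher-order strategy (as discussed in the other comments). Nothing is gained by running a linear regression on all 4 at once, because in the second plot of the data your estimated line ends up stuck, on average, between the groups, where there are no points at all. Basically, you cannot resolve the second plot efficiently with a simple linear regression, and the other dimension does not help here.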

Depending on where the data comes from (specifically, the nature of the grouping), you could do the following:

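  • Manually split the samples into 4 groups by X[:,0] and run 4 regressions
  • If you believe the 4 groups will keep equal numbers of points, another direction is quantile regression
  • Preprocess the dataset with some clustering procedure if you feel the grouping may be unstable in the future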