更新

Question

我有一个数组，其中包含两个不同数量（alpha和eigRange）函数的错误值。

我这样填充我的数组：

   for j in range(n): 
        for i in range(alphaLen):
            alpha = alpha_list[i]
            c = train.eig(xt_, yt_,m-j, m,alpha, "cpu")
            costListTrain[j, i] = cost.err(xt_, xt_, yt_, c)

   normedValues=costListTrain/np.max(costListTrain.ravel())

，其中

n = 20
alpha_list = [0.0001,0.0003,0.0008,0.001,0.003,0.006,0.01,0.03,0.05]

我的costListTrain数组包含一些差异非常小的值，例如：

2.809458902485728 2.809458905776425 2.809458913576337 2.809459011062461 2.030326752376704 2.030329906064879 2.030337351188699 2.030428976282031 1.919840839066182 1.919846470077076 1.919859731440199 1.920021453630778 1.858436351617677 1.858444223016128 1.858462730482461 1.858687054377165 1.475871326997542 1.475901926855846 1.475973476249240 1.476822830933632 1.475775410801635 1.475806023102173 1.475877601316863 1.476727286424228 1.475774284270633 1.475804896751524 1.475876475382906 1.476726165223209 1.463578292548192 1.463611627166494 1.463689466240788 1.464609083309240 1.462859608038034 1.462893157900139 1.462971489632478 1.463896516033939 1.461912706143012 1.461954067956570 1.462047793798572 1.463079574605320 1.450581041157659 1.452770209885761 1.454835202839513 1.459676311335618 1.450581041157643 1.452770209885764 1.454835202839484 1.459676311335624 1.450581041157651 1.452770209885735 1.454835202839484 1.459676311335610 1.450581041157597 1.452770209885784 1.454835202839503 1.459676311335620 1.450581041157575 1.452770209885757 1.454835202839496 1.459676311335619 1.450581041157716 1.452770209885711 1.454835202839499 1.459676311335613 1.450581041157667 1.452770209885744 1.454835202839509 1.459676311335625 1.450581041157649 1.452770209885750 1.454835202839476 1.459676311335617 1.450581041157655 1.452770209885708 1.454835202839442 1.459676311335622 1.450581041157571 1.452770209885700 1.454835202839498 1.459676311335622

正如你可以在这里看到的价值非常接近！

我试图以x，y轴上有两个量的方式绘制这些数据，误差值用点颜色表示。

这就是我绘制数据的方式：

    alpha_list = np.log(alpha_list)        
    eigenvalues, alphaa  = np.meshgrid(eigRange, alpha_list) 

    vMin = np.min(costListTrain)
    vMax = np.max(costListTrain)

    plt.scatter(x, y, s=70, c=normedValues, vmin=vMin, vmax=vMax, alpha=0.50)

但结果不正确。

我试图通过将所有值除以max来规范化我的错误值，但它不起作用！
我能使其工作的唯一方法（不正确）是以两种不同的方式规范化我的数据。一个基于每一列（这意味着因子1是常数，因子2改变），另一个基于行（意味着因子2是常数而因子1是变化的）。但它并没有真正意义，因为我需要一个图来显示误差值上两个量之间的权衡。

更新

这就是我的意思。在对应于特征值的每一行上基于最大值标准化值：

maxsEigBasedTrain= np.amax(costListTrain.T,1)[:,np.newaxis]    
maxsEigBasedTest= np.amax(costListTest.T,1)[:,np.newaxis]

normEigCostTrain=costListTrain.T/maxsEigBasedTrain
normEigCostTest=costListTest.T/maxsEigBasedTest

根据与alphas对应的每列的最大值对标准化值进行标准化：

maxsAlphaBasedTrain= np.amax(costListTrain,1)[:,np.newaxis]
maxsAlphaBasedTest= np.amax(costListTest,1)[:,np.newaxis]

normAlphaCostTrain=costListTrain/maxsAlphaBasedTrain
normAlphaCostTest=costListTest/maxsAlphaBasedTest

情节1：

enter image description here

哪里没有。 eigenvalue = 10和alpha更改（应对应于图1的第10列）：

enter image description here

其中alpha = 0.0001和特征值改变（应对应于plot1的第一行）

enter image description here

但是你可以看到结果与情节1不同！

更新：

只是为了澄清更多内容，这就是我读取数据的方式：

from sklearn.datasets.samples_generator import make_regression

rng = np.random.RandomState(0)
diabetes = datasets.load_diabetes()

X_diabetes, y_diabetes = diabetes.data, diabetes.target
X_diabetes=np.c_[np.ones(len(X_diabetes)),X_diabetes]
ind = np.arange(X_diabetes.shape[0])
rng.shuffle(ind)
#===============================================================================
# Split Data 
#===============================================================================
import math
cross= math.ceil(0.7*len(X_diabetes))
ind_train = ind[:cross]
X_train, y_train = X_diabetes[ind_train], y_diabetes[ind_train]

ind_val=ind[cross:]
X_val,y_val=  X_diabetes[ind_val], y_diabetes[ind_val]

我还上传了.csv个文件HERE

log.csv包含绘图1标准化前的原始值

情节1的

normalizedLog.csv

情节2的

eigenConst.csv

情节3的

alphaConst.csv

Answer 1

我想我找到了答案。首先，我的代码中存在一个问题。我期待“特征值的数量”对应于行但在我的for循环中它们填充列。目前的答案是：

for i in range(alphaLen):
    for j in range(n): 
        alpha=alpha_list[i]
        c=train.eig(xt_, yt_,m-j,m,alpha,"cpu")
        costListTrain[i,j]=cost.err(xt_,xt_,yt_,c)
        costListTest[i,j]=cost.err(xt_,xv_,yv_,c)

在向朋友和同事提问后，我得到了这个答案：

我会假设默认的imshow和其他绘图命令你可能想要使用，对你的值做同样大小的间隔绘图。如果你可以设置为对数你应该没事。理想情况下，同样“填充的垃圾箱”可以证明最有效，我想。

用于绘图我只是从错误中减去min value并添加一个小数字，最后记录日志。

 temp=costListTrain- costListTrain.min()
 temp+=0.00000001
 extent = [0, 20,alpha_list[0], alpha_list[-1]]

 plt.imshow(np.log(temp),interpolation="nearest",cmap=plt.get_cmap('spectral'), extent =  extent, origin="lower")
 plt.colorbar()

结果是：

enter image description here

绘制高精度数据

更新

更新：

1 个答案: