Question

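I have an array of images that I want to feed to TensorFlow. I want to center the images around the mean and normalize the standard deviation. I followed this answer, but I can't seem to get the mean to zero. I'm still learning numpy, so maybe I'm missing something simple.

My current code is: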

import numpy as np

# Load pickled data
import pickle

# TODO: Fill this in based on where you saved the training and testing data

training_file = 'train.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)

X_train, y_train = train['features'], train['labels']

# Let us inspect whether the data is centered.
for ch in range(3):
    print("for channel %s mean or clahe data: %s" %(
            ch, X_train[:,ch].mean()))

X_norm = np.copy(X_train)
for ch in range(3):
    X_norm[:, ch] = (X_norm[:, ch] - X_norm[:,ch].mean())/ X_norm[:, ch].std()

# Let us inspect our new mean.
for ch in range(3):
    print("for channel %s new mean for CLAHE data: %s new std: %s" % (
            ch, X_norm[:,ch].mean(), X_norm[:,ch].std()))

The dataset used is available from here.

Output:

for channel 0 mean or clahe data: 88.9090870931
for channel 1 mean or clahe data: 88.2472258708
for channel 2 mean or clahe data: 87.5765175619
for channel 0 new mean for CLAHE data: 8.77830238806 new std: 45.7207148838
for channel 1 new mean for CLAHE data: 8.79695563094 new std: 45.7780456089
for channel 2 new mean for CLAHE data: 8.71418658131 new std: 45.5661789057

The result I want is a mean of zero and a standard deviation of 1 for each channel.

Answer 1

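The main problem is that the array has dtype uint8 (integers 0..255). It cannot really be centered or normalized without changing the array's type. Like this: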

X_norm = np.array(X_train, dtype=np.float64, copy=True)

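Now the entries are floating-point numbers, so centering and scaling work correctly. However, you may run out of memory (the array is large), so while experimenting I used only a small part of the data: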

X_norm = np.array(X_train[:100], dtype=np.float64, copy=True)
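
To see why the dtype matters, here is a minimal sketch on a toy array. The values and the names a, a_int and a_float are made up for illustration and are not part of the dataset. It shows that assigning a float result into a uint8 array silently drops the fractional part, which is what happens in the loop from the question. Using float32 instead of float64 is only a suggestion for saving memory, on the assumption that single precision is enough for your purposes.

```python
import numpy as np

# Toy uint8 array standing in for pixel data (hypothetical values).
a = np.array([10, 20, 30], dtype=np.uint8)

# Assigning a float result back into a uint8 array casts it to uint8,
# so the fractional part is silently dropped.
a_int = a.copy()
a_int[:] = a / 4            # a / 4 is a float array: 2.5, 5.0, 7.5
print(a_int.dtype, a_int)   # the dtype is still uint8

# Converting first gives an array that can hold fractional values.
# float32 takes half the memory of float64.
a_float = a.astype(np.float32)
a_float[:] = a / 4
print(a_float.dtype, a_float)
```

The same truncation happens to X_norm in the question, because np.copy keeps the uint8 dtype of X_train.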

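Your code has another problem: the [:, ch] selector does not do what you think it does. It slices along the second axis (axis=1), not the last axis. What you meant is [..., ch], where the ellipsis stands for "as many colons as needed". See NumPy indexing.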
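
The difference between the two selectors is easy to check by comparing shapes. This is a small sketch on a hypothetical batch, assuming the images are stored channels-last as (N, height, width, channels); the shape (5, 32, 32, 3) is invented for the example.

```python
import numpy as np

# Hypothetical batch: 5 images, 32x32 pixels, 3 channels in the last axis.
X = np.zeros((5, 32, 32, 3), dtype=np.uint8)

# [:, 0] indexes axis 1, so it picks the first pixel row of every image
# and keeps all three channels.
print(X[:, 0].shape)

# [..., 0] indexes the last axis, so it picks channel 0 of every pixel.
print(X[..., 0].shape)
```

With [:, ch] the loop over range(3) only touches the first three pixel rows of each image, which is why the overall statistics barely change.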

Useful for debugging: print(X_norm.dtype), print(X_norm[:, 0].shape)
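
Putting both fixes together, the normalization loop could look roughly like the sketch below. It runs on random data rather than on train.p: X_train_demo is a made-up stand-in, and the channels-last layout (N, 32, 32, 3) is an assumption about the real data.

```python
import numpy as np

# Random stand-in for X_train (hypothetical): 100 images, 32x32, 3 channels.
rng = np.random.default_rng(0)
X_train_demo = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Convert to float first so that the result of the division can be stored.
X_norm = X_train_demo.astype(np.float64)

# Center and scale each channel, selecting it along the last axis.
for ch in range(X_norm.shape[-1]):
    channel = X_norm[..., ch]
    X_norm[..., ch] = (channel - channel.mean()) / channel.std()

# Inspect the new mean and standard deviation per channel.
for ch in range(X_norm.shape[-1]):
    print("for channel %s new mean: %.3e new std: %.3f" % (
        ch, X_norm[..., ch].mean(), X_norm[..., ch].std()))
```

The means come out as tiny values close to zero rather than exactly zero, because of floating-point rounding.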
