
Date: 2017-01-14 16:34:23

Tags: python numpy image-processing

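I have an array of images that I want to feed to TensorFlow. I want to center the images around the mean and normalize the standard deviation. I followed this answer, but I can't seem to get the mean to zero. I'm still learning numpy, so maybe I'm missing something simple.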

My current code is:

import numpy as np

# Load pickled data
import pickle

# TODO: Fill this in based on where you saved the training and testing data

training_file = 'train.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)

X_train, y_train = train['features'], train['labels']

# Let us inspect whether the data is centered.
for ch in range(3):
    print("for channel %s mean or clahe data: %s" %(
            ch, X_train[:,ch].mean()))

X_norm = np.copy(X_train)
for ch in range(3):
    X_norm[:, ch] = (X_norm[:, ch] - X_norm[:,ch].mean())/ X_norm[:, ch].std()

# Let us inspect our new mean.
for ch in range(3):
    print("for channel %s new mean for CLAHE data: %s new std: %s" % (
            ch, X_norm[:,ch].mean(), X_norm[:,ch].std()))

The dataset I am using can be downloaded from here.

Output:

for channel 0 mean or clahe data: 88.9090870931
for channel 1 mean or clahe data: 88.2472258708
for channel 2 mean or clahe data: 87.5765175619
for channel 0 new mean for CLAHE data: 8.77830238806 new std: 45.7207148838
for channel 1 new mean for CLAHE data: 8.79695563094 new std: 45.7780456089
for channel 2 new mean for CLAHE data: 8.71418658131 new std: 45.5661789057

The result I want is a mean of zero and a standard deviation of 1 for each channel.

1 answer:

Answer 0 (score: 2)

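The main problem is that the array has dtype uint8 (integers 0..255). It can't really be centered or normalized without changing the dtype, because the standardized values are fractional and partly negative. Convert it to floating point first, like this: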

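# np.float was an alias for Python's float and has been removed from recent NumPy releases; np.float64 is the same type.
X_norm = np.array(X_train, dtype=np.float64, copy=True)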
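To see why the integer dtype gets in the way, here is a minimal sketch on synthetic data. The array `X_small` and its shape are made up for illustration and are not the dataset from the question. The standardized values are computed in floating point, but writing them back into a uint8 array casts them to integers again, so the centering is lost. The exact values produced when negative floats are cast to uint8 depend on the platform, so the sketch only prints them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 10 RGB images of 8x8 pixels, dtype uint8.
X_small = rng.integers(0, 256, size=(10, 8, 8, 3), dtype=np.uint8)

# The arithmetic itself is carried out in float64.
standardized = (X_small - X_small.mean()) / X_small.std()

# Writing the result into a uint8 array casts it back to integers.
as_uint8 = np.copy(X_small)
with np.errstate(invalid="ignore"):  # silence the out-of-range cast warning
    as_uint8[...] = standardized

print("float result: dtype=%s mean=%.3g std=%.3g"
      % (standardized.dtype, standardized.mean(), standardized.std()))
print("stored in uint8: dtype=%s mean=%.3g std=%.3g"
      % (as_uint8.dtype, as_uint8.mean(), as_uint8.std()))
```

The first line should report a mean close to 0 and a standard deviation close to 1; the second will not.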

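Now the entries are floats, so centering and scaling work correctly. However, you may run out of memory (the array is large), so while experimenting I used only a small part of the data: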

# np.float64 replaces the removed np.float alias.
X_norm = np.array(X_train[:100], dtype=np.float64, copy=True)
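As a rough illustration of the memory concern, the sketch below compares the size of the same array in different dtypes. The shape is a hypothetical placeholder, and using float32 is a suggestion of this sketch rather than part of the original answer. It halves the footprint compared with float64 and is usually precise enough for image data.

```python
import numpy as np

# Hypothetical stand-in for a slice of X_train; the real array is much larger.
X_small = np.zeros((100, 32, 32, 3), dtype=np.uint8)

as_f64 = X_small.astype(np.float64)  # 8 bytes per entry
as_f32 = X_small.astype(np.float32)  # 4 bytes per entry

for name, arr in [("uint8", X_small), ("float32", as_f32), ("float64", as_f64)]:
    print("%-8s %d bytes" % (name, arr.nbytes))
```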

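There is another problem with your code: the [:, ch] selector doesn't do what you think it does. It slices along the second axis (axis=1), not the last axis. What you meant is [..., ch], where the ellipsis stands for "as many colons as needed". See NumPy indexing.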
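The difference between the two selectors is easiest to see from the resulting shapes. The sketch below assumes the images are stored as (number of images, height, width, channels). That layout is an assumption, and the array `X` is a made-up placeholder.

```python
import numpy as np

# Hypothetical batch laid out as (N, height, width, channels).
X = np.zeros((5, 32, 32, 3))

by_second_axis = X[:, 0]   # index 0 along axis 1: the first pixel row of every image
by_last_axis = X[..., 0]   # index 0 along the last axis: channel 0 of every image

print(by_second_axis.shape)  # (5, 32, 3)
print(by_last_axis.shape)    # (5, 32, 32)
```

With this layout, [:, ch] for ch in range(3) only touches the first three pixel rows of each image, which is why the statistics in the question did not come out per channel.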

Useful for debugging:

print(X_norm.dtype)
print(X_norm[:, 0].shape)
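Putting both fixes together, a per-channel standardization might look like the sketch below. It runs on random data in place of `train.p`; `X_train_demo` and its shape are hypothetical, and the channel axis is assumed to be the last one. The vectorized variant at the end is an alternative offered here, not something from the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_train: 100 RGB images, channels on the last axis.
X_train_demo = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)

# Fix 1: convert to floating point before doing any arithmetic in place.
X_norm = X_train_demo.astype(np.float64)

# Fix 2: select each channel with [..., ch] instead of [:, ch].
for ch in range(X_norm.shape[-1]):
    channel = X_norm[..., ch]
    X_norm[..., ch] = (channel - channel.mean()) / channel.std()

for ch in range(X_norm.shape[-1]):
    print("channel %s mean: %.3g std: %.3g"
          % (ch, X_norm[..., ch].mean(), X_norm[..., ch].std()))

# Equivalent vectorized form: reduce over every axis except the channel axis.
X_float = X_train_demo.astype(np.float64)
mean = X_float.mean(axis=(0, 1, 2), keepdims=True)
std = X_float.std(axis=(0, 1, 2), keepdims=True)
X_norm_vec = (X_float - mean) / std
```

Each channel should now report a mean very close to 0 and a standard deviation very close to 1, up to floating-point rounding.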