Keras input shape error: "expected input to have shape (1014,) but got array with shape (1,)"

Time: 2018-03-01 10:52:26

Tags: tensorflow keras keras-layer keras-2

1. Overview

  

I am using Conv1D for a text classification task. I defined the first input layer as Input(shape=(self.input_size,), name='sent_input', dtype='int64') # here input_size is 1014

     

I am passing a NumPy ndarray of shape (143614,) as X and a NumPy ndarray of shape (143614,) as Y to the Keras fit function, with batch_size 128.

     

But it raises a strange error: ValueError: Error when checking input: expected sent_input to have shape (1014,) but got array with shape (1,)

2. Overview of my input (how I generate the input):

  

I am building a character-based CNN text classification algorithm, following this repository. My original dataframe looks like this:

| id | text       | class_target (0 or 1) |
|----|------------|-----------------------|
| 54 | some text1 | 0                     |
| 55 | some text2 | 1                     |

Data processing

import numpy as np
import pandas as pd


class Data(object):
    """
    Class to handle loading and processing of raw datasets.
    """
    def __init__(self, data_source,
                 alphabet="abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{}",
                 input_size=1014, batch_size=128, no_of_classes=1):
        """
        Initialization of a Data object.
        Args:
            data_source (str): Raw data file path
            alphabet (str): Alphabet of characters to index
            input_size (int): Size of input features
            batch_size (int): Batch size
            no_of_classes (int): Number of classes in data
        """
        self.alphabet = alphabet
        self.alphabet_size = len(self.alphabet)
        self.dict = {}  # Maps each character to an integer
        self.no_of_classes = no_of_classes
        # Index 0 is reserved for padding and unknown characters
        for idx, char in enumerate(self.alphabet):
            self.dict[char] = idx + 1
        self.length = input_size
        self.batch_size = batch_size
        self.data_source = data_source

    def load_data(self):
        self.data = pd.read_csv(self.data_source)

        print("Data loaded from " + self.data_source)

    def get_all_data(self):
        # Encode every row of the 'text' column, then return inputs and labels
        return self.data['text'].apply(lambda x : self.str_to_indexes(x)).values,self.data['class_target'].values

    def str_to_indexes(self, s):
        """
        Convert a string to character indexes based on character dictionary.

        Args:
            s (str): String to be converted to indexes
        Returns:
            str2idx (np.ndarray): Indexes of characters in s
        """
        s = s.lower()
        max_length = min(len(s), self.length)
        str2idx = np.zeros(self.length, dtype='int64')
        # Walk the string backwards: the last character goes to position 0
        for i in range(1, max_length + 1):
            c = s[-i]
            if c in self.dict:
                str2idx[i - 1] = self.dict[c]
        return str2idx
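To make the quantization step concrete, here is a minimal standalone sketch of the same logic as str_to_indexes. The three-letter alphabet, the length of 6, and the input string are toy values chosen for illustration and are not from my real configuration.

```python
import numpy as np

# Hypothetical toy alphabet (the real one has 69 characters)
ALPHABET = "abc"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # 0 = padding/unknown


def str_to_indexes(s, length):
    """Encode s backwards into a fixed-length int64 vector."""
    s = s.lower()
    n = min(len(s), length)
    out = np.zeros(length, dtype="int64")
    for i in range(1, n + 1):
        c = s[-i]  # last character first
        if c in CHAR_TO_INDEX:
            out[i - 1] = CHAR_TO_INDEX[c]
    return out


encoded = str_to_indexes("Cab?", length=6)
print(encoded, encoded.shape)
```

Each string therefore becomes one fixed-length vector, so a single encoded row has shape (length,), which is (1014,) with the real settings.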

Before calling the Keras fit method, I call these functions:

training_data = Data(data_source=data_config.training_data_source,
                     alphabet=data_config.alphabet,
                     input_size=data_config.input_size,
                     batch_size=128,
                     no_of_classes=data_config.num_of_classes)
training_data.load_data()
training_inputs, training_labels = training_data.get_all_data()
# get_all_data() already returns NumPy arrays, so no further .values call is needed

This is how I get the input. Its shape is (143614,).
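A possible reason the shape is (143614,) rather than (143614, 1014) can be sketched with NumPy alone. The assumption here is that applying a function that returns an array to each row of a pandas Series, then taking .values, gives a 1-D array of dtype object whose elements are the per-row arrays. The snippet simulates that state without pandas, using a toy sample count of 4, and shows that np.stack turns it into a 2-D integer array.

```python
import numpy as np

n_samples, length = 4, 1014  # toy sample count; 1014 matches input_size

# One fixed-length index vector per text row
rows = [np.zeros(length, dtype="int64") for _ in range(n_samples)]

# Simulated result of Series.apply(...).values: a 1-D object array of arrays
nested = np.empty(n_samples, dtype=object)
for i, row in enumerate(rows):
    nested[i] = row

# Stacking the rows yields a proper 2-D integer array
stacked = np.stack(list(nested))

print(nested.shape, nested.dtype)
print(stacked.shape, stacked.dtype)
```

If this assumption holds for the real data, checking training_inputs.dtype and training_inputs[0].shape would confirm it.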

3. Here is the model overview

inputs = Input(shape=(self.input_size,), name='sent_input', dtype='int64') # here input_size is 1014
# Embedding layers
x = Embedding(self.alphabet_size + 1, self.embedding_size, input_length=self.input_size)(inputs)
# Convolution layers
for cl in self.conv_layers:
    x = Convolution1D(cl[0], cl[1])(x)
    x = ThresholdedReLU(self.threshold)(x)
    if cl[2] is not None:
        x = MaxPooling1D(cl[2])(x)
x = Flatten()(x)
# Fully connected layers
for fl in self.fully_connected_layers:
    x = Dense(fl)(x)
    x = ThresholdedReLU(self.threshold)(x)
    x = Dropout(self.dropout_p)(x)
# Output layer
predictions = Dense(self.num_of_classes, activation='sigmoid')(x) # here num_of_classes is 1
# Build and compile model
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=self.optimizer, loss=self.loss)

4. Here is my fit call

model.fit(training_inputs, training_labels,
                   validation_data=(validation_inputs, validation_labels),
                   epochs=epochs,
                   batch_size=batch_size, # Here batch_size is 128
                   verbose=2,
                   callbacks=[tensorboard])

5. Here is the model summary

Layer (type)                 Output Shape              Param #   
=================================================================
sent_input (InputLayer)      (None, 1014)              0         
_________________________________________________________________
embedding_21 (Embedding)     (None, 1014, 128)         8960      
_________________________________________________________________
conv1d_121 (Conv1D)          (None, 1008, 256)         229632    
_________________________________________________________________
thresholded_re_lu_161 (Thres (None, 1008, 256)         0         
_________________________________________________________________
max_pooling1d_61 (MaxPooling (None, 336, 256)          0         
_________________________________________________________________
conv1d_122 (Conv1D)          (None, 330, 256)          459008    
_________________________________________________________________
thresholded_re_lu_162 (Thres (None, 330, 256)          0         
_________________________________________________________________
max_pooling1d_62 (MaxPooling (None, 110, 256)          0         
_________________________________________________________________
conv1d_123 (Conv1D)          (None, 108, 256)          196864    
_________________________________________________________________
thresholded_re_lu_163 (Thres (None, 108, 256)          0         
_________________________________________________________________
conv1d_124 (Conv1D)          (None, 106, 256)          196864    
_________________________________________________________________
thresholded_re_lu_164 (Thres (None, 106, 256)          0         
_________________________________________________________________
conv1d_125 (Conv1D)          (None, 104, 256)          196864    
_________________________________________________________________
thresholded_re_lu_165 (Thres (None, 104, 256)          0         
_________________________________________________________________
conv1d_126 (Conv1D)          (None, 102, 256)          196864    
_________________________________________________________________
thresholded_re_lu_166 (Thres (None, 102, 256)          0         
_________________________________________________________________
max_pooling1d_63 (MaxPooling (None, 34, 256)           0         
_________________________________________________________________
flatten_21 (Flatten)         (None, 8704)              0         
_________________________________________________________________
dense_61 (Dense)             (None, 1024)              8913920   
_________________________________________________________________
thresholded_re_lu_167 (Thres (None, 1024)              0         
_________________________________________________________________
dropout_41 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_62 (Dense)             (None, 1024)              1049600   
_________________________________________________________________
thresholded_re_lu_168 (Thres (None, 1024)              0         
_________________________________________________________________
dropout_42 (Dropout)         (None, 1024)              0         
_________________________________________________________________
dense_63 (Dense)             (None, 1)                 1025      
=================================================================
Total params: 11,449,601
Trainable params: 11,449,601
Non-trainable params: 0
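The output lengths in the summary can be checked by hand. The sketch below assumes 'valid' padding with stride 1 for the convolutions and non-overlapping pooling windows. The kernel and pool sizes are not stated in this post; they are inferred from the shapes and parameter counts in the summary above.

```python
def conv1d_valid(length, kernel_size):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return length - kernel_size + 1


def max_pool1d(length, pool_size):
    """Output length of non-overlapping max pooling."""
    return length // pool_size


# (filters, kernel_size, pool_size or None), inferred from the summary
conv_layers = [
    (256, 7, 3), (256, 7, 3), (256, 3, None),
    (256, 3, None), (256, 3, None), (256, 3, 3),
]

length = 1014  # input_size
lengths = []
for filters, kernel_size, pool_size in conv_layers:
    length = conv1d_valid(length, kernel_size)
    lengths.append(length)
    if pool_size is not None:
        length = max_pool1d(length, pool_size)
        lengths.append(length)

flattened = length * conv_layers[-1][0]
print(lengths)
print(flattened)
```

The computed lengths match the Output Shape column, which suggests the model itself is built as intended and expects one (1014,) vector per sample.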

6. Here is the error traceback

ValueError                                Traceback (most recent call last)
<ipython-input-118-29fc7a1b6318> in <module>()
----> 1 execfile('main.py')

/content/main.py in <module>()
     65                 epochs=training_config.epochs,
     66                 batch_size=training_config.batch_size,
---> 67                 checkpoint_every=training_config.checkpoint_every)

/content/zhang_char_cnn.py in train(self, training_inputs, training_labels, validation_inputs, validation_labels, epochs, batch_size, checkpoint_every)
    102                        batch_size=batch_size,
    103                        verbose=2,
--> 104                        callbacks=[tensorboard])
    105 
    106     def test(self, testing_inputs, testing_labels, batch_size):

/usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1635             sample_weight=sample_weight,
   1636             class_weight=class_weight,
-> 1637             batch_size=batch_size)
   1638         # Prepare validation data.
   1639         do_validation = False

/usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
   1481                                     self._feed_input_shapes,
   1482                                     check_batch_axis=False,
-> 1483                                     exception_prefix='input')
   1484         y = _standardize_input_data(y, self._feed_output_names,
   1485                                     output_shapes,

/usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    121                             ': expected ' + names[i] + ' to have shape ' +
    122                             str(shape) + ' but got array with shape ' +
--> 123                             str(data_shape))
    124     return data
    125 

ValueError: Error when checking input: expected sent_input to have shape (1014,) but got array with shape (1,)

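In response to the comment below asking why I am feeding a (143614,) NumPy ndarray: I am new to character-level CNNs and I am just following this GitHub repo. As far as I understand, I convert my text to a numeric representation and then feed it to the model. In a character CNN I have to quantize the text, and I took the quantization part from that repo. After all this preprocessing, if I call .shape on the data that I feed to the model as features, it shows (143614,).

I am clearly feeding a NumPy ndarray of shape (143614,), so where does shape (1,) come from? Can someone help me solve this? I searched Google and checked many Q&As on Stack Overflow but did not find a solution to my problem.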
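One plausible source of the (1,), offered as an assumption and not a confirmed diagnosis: Keras releases from around this time appear to promote a 1-D input array to a column before comparing each sample's shape with the Input layer's shape. The NumPy-only sketch below mimics that behavior with a toy sample count of 5. It does not call Keras, and the variable names are hypothetical.

```python
import numpy as np

expected_shape = (1014,)  # per-sample shape declared by sent_input

# Toy stand-in for a 1-D input array of shape (n_samples,)
x = np.empty(5, dtype=object)

# Assumed Keras behavior: a 1-D array is expanded to shape (n_samples, 1)
if x.ndim == 1:
    x = np.expand_dims(x, 1)

per_sample_shape = x.shape[1:]  # drop the batch axis
mismatch = per_sample_shape != expected_shape

print(x.shape, per_sample_shape, mismatch)
```

Under that assumption, every sample looks like a single value of shape (1,), which would explain the message even though the array holds 143614 entries.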

0 answers:

No answers yet