Question

I'm new to Keras and ran into a problem when trying to print the shape of my data so that I can use it as input_shape. Here is my code so far:

df = pd.read_csv(pathname, encoding = "ISO-8859-1")
df = df[['content_cleaned', 'meaningful']] 
df = df.sample(frac=1) #Shuffling the data

X = np.asarray(df[['content_cleaned']])
y = np.asarray(df[['meaningful']])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21) 

tokenizer = Tokenizer() 
X_train = keras.preprocessing.text.Tokenizer(num_words=100)
X_test = keras.preprocessing.text.Tokenizer(num_words=100)

encoder = LabelBinarizer()
encoder.fit(y_train) 
y_train = encoder.transform(y_train)
encoder.fit(y_test)
y_test = encoder.transform(y_test)

print(X_train.shape)

The code fails at the final print statement. Error message:

AttributeError: 'Tokenizer' object has no attribute 'shape'

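Again, I'm very new to this and can't figure out how to get past this error. Any help would be great!

Edit: I modified the code a bit to try to implement another user's suggestion. Here is the changed code: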

# Create tokenizer
tokenizer = Tokenizer(num_words=100) #No row has more than 100 words.

#Tokenize the predictors (text)
X_train = tokenizer.sequences_to_matrix(X_train, mode="binary")
X_test = tokenizer.sequences_to_matrix(X_test, mode="binary")

It fails at the line that assigns X_train. The error message is:

TypeError: '>=' not supported between instances of 'str' and 'int'

Edit 2: With the following change, the code runs, but when I run the print command, nothing is printed:

X_train = tokenizer.sequences_to_matrix(int(input(X_train)), mode="binary")
X_test = tokenizer.sequences_to_matrix(int(input(X_test)), mode="binary")

Answer 1

I believe this is because, although you first set it up as a numpy array...

X = np.asarray(df[['content_cleaned']])

...and feed the data through the split...

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21)

...you then overwrite it with a Tokenizer object, which has no 'shape' attribute.

X_train = keras.preprocessing.text.Tokenizer(num_words=100)
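One possible way to end up with an array that does have a shape is to keep the tokenizer in its own variable and let it convert the text. The following is only a minimal sketch: the sample sentences are made-up stand-ins for the 'content_cleaned' column, and it assumes an installation where `tensorflow.keras.preprocessing.text.Tokenizer` is still available (it is a legacy API in recent releases).

```python
from tensorflow.keras.preprocessing.text import Tokenizer

# Hypothetical stand-ins for the training and test text after the split.
train_texts = ["the cat sat on the mat", "dogs chase cats", "the mat is red"]
test_texts = ["the dog sat", "red cats"]

# Keep the tokenizer separate; do not assign it to X_train.
tokenizer = Tokenizer(num_words=100)
tokenizer.fit_on_texts(train_texts)  # learn the vocabulary from the training text only

# texts_to_matrix accepts raw strings, whereas sequences_to_matrix
# expects lists of integer word indices.
X_train = tokenizer.texts_to_matrix(train_texts, mode="binary")
X_test = tokenizer.texts_to_matrix(test_texts, mode="binary")

print(X_train.shape)  # (3, 100)
print(X_test.shape)   # (2, 100)
```

With this setup the second dimension equals num_words, so input_shape would be (100,). Two further notes on the edits in the question: passing raw strings to sequences_to_matrix is the likely cause of the '>=' TypeError, since it compares each element with the integer num_words; and input() waits for keyboard input, which would explain why nothing is printed in Edit 2. If the text is held in a 2-D array such as np.asarray(df[['content_cleaned']]), it probably needs flattening first, for example with X_train.ravel().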
