How to use tensorflow feature_columns as input to a Keras model

Date: 2019-01-26 02:53:11

Tags: tensorflow keras

TensorFlow's feature_columns API is very useful for handling non-numeric features. However, the current API documentation mostly covers using feature_columns with TensorFlow Estimators. Is it possible to use feature_columns for categorical feature representation and then build a model based on tf.keras?

The only reference I found is the following tutorial. It shows how to feed feature columns to a Keras Sequential model: Link


The question is how to use a custom model with the Keras functional API. I tried the following, but it does not work (TensorFlow version 1.12):

from tensorflow.python.feature_column import feature_column_v2 as fc

feature_columns = [fc.embedding_column(ccv, dimension=3), ...]
feature_layer = fc.FeatureLayer(feature_columns)
model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
...
model.fit(dataset, steps_per_epoch=8) # dataset is created from tensorflow Dataset API

I also tried building the model with the functional API:

feature_layer = fc.FeatureLayer(feature_columns)
dense_features = feature_layer(features) # features is a dict of ndarrays in dataset
layer1 = tf.keras.layers.Dense(128, activation=tf.nn.relu)(dense_features)
layer2 = tf.keras.layers.Dense(64, activation=tf.nn.relu)(layer1)
output = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(layer2)
model = Model(inputs=dense_features, outputs=output)

I do not know how to turn feature columns into inputs for a Keras model.
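For context, a feature column is essentially a declarative recipe for a transformation applied to raw features. For example, a categorical column with a vocabulary list wrapped in an indicator column is conceptually just a lookup plus one-hot encoding. A minimal pure-Python sketch of that transformation (this is an illustration, not the TF API; the function name is made up):

```python
def one_hot_from_vocab(value, vocab):
    """Return a one-hot vector for `value`; all zeros if out of vocabulary."""
    vec = [0.0] * len(vocab)
    if value in vocab:
        vec[vocab.index(value)] = 1.0
    return vec

vocab = ['a', 'b', 'c']
print(one_hot_from_vocab('b', vocab))        # [0.0, 1.0, 0.0]
print(one_hot_from_vocab('unknown', vocab))  # [0.0, 0.0, 0.0]
```

An embedding column goes one step further and multiplies this one-hot vector by a trainable weight matrix, which is why it needs a `dimension` argument.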

4 Answers:

Answer 0: (score: 2)

The behavior you want can be achieved by combining tf.feature_column with the Keras functional API. And, in fact, the TF documentation does not mention it.

This works at least in TF 2.0.0-beta1, but it may change or even be simplified in later releases.

Check out the issue in the TensorFlow GitHub repository, Unable to use FeatureColumn with Keras Functional API #27416. There you will find useful comments about tf.feature_column and the Keras functional API.

Since you asked about the general approach, I will simply copy the example snippet from the link above.

from __future__ import absolute_import, division, print_function

import numpy as np
import pandas as pd

#!pip install tensorflow==2.0.0-alpha0
import tensorflow as tf

from tensorflow import feature_column
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

URL = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(URL)
dataframe.head()

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

# A utility method to create a tf.data dataset from a Pandas Dataframe
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds

batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)

age = feature_column.numeric_column("age")

feature_columns = []
feature_layer_inputs = {}

# numeric cols
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
  feature_columns.append(feature_column.numeric_column(header))
  feature_layer_inputs[header] = tf.keras.Input(shape=(1,), name=header)

# bucketized cols
age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)

# indicator cols
thal = feature_column.categorical_column_with_vocabulary_list(
      'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)
feature_layer_inputs['thal'] = tf.keras.Input(shape=(1,), name='thal', dtype=tf.string)

# embedding cols
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)

# crossed cols
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)

batch_size = 32
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)

feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
feature_layer_outputs = feature_layer(feature_layer_inputs)

x = layers.Dense(128, activation='relu')(feature_layer_outputs)
x = layers.Dense(64, activation='relu')(x)

baggage_pred = layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs=[v for v in feature_layer_inputs.values()], outputs=baggage_pred)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_ds)
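The bucketized age column in the snippet maps a numeric value to a bucket index given the boundaries, and the indicator wrapper then one-hot encodes that index. Conceptually it behaves like the following pure-Python sketch (an illustration of the semantics, not the TF implementation):

```python
def bucketize(value, boundaries):
    """Return the bucket index for value, given sorted boundaries.

    With n boundaries there are n + 1 buckets: values below the first
    boundary fall in bucket 0, values at or above the last in bucket n.
    """
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]
print(bucketize(17, boundaries))  # 0
print(bucketize(42, boundaries))  # 5  (40 <= 42 < 45)
print(bucketize(70, boundaries))  # 10
```

This is also why the crossed column over age_buckets and thal makes sense: both inputs have already been reduced to small discrete ids that can be hashed together.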

Answer 1: (score: 1)

If you use the TensorFlow Dataset API, this code works well.

feature_layer = keras.layers.DenseFeatures(feature_columns=feature_columns)
train_dataset = train_dataset.map(lambda x, y: (feature_layer(x), y))
test_dataset = test_dataset.map(lambda x, y: (feature_layer(x), y))

model.fit(train_dataset,
          epochs=num_epochs,
          steps_per_epoch=steps_per_epoch,  # total examples / batch size
          validation_data=test_dataset,
          validation_steps=validation_steps)
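What the `map` calls above do is apply the feature transformation to the features of each `(x, y)` element while leaving the label untouched, so the model only ever sees dense tensors rather than a dict of raw columns. The same pattern in plain Python, with illustrative stand-ins for the dataset and the feature layer:

```python
# Stand-ins: a "dataset" is a list of (features_dict, label) pairs, and the
# "feature layer" flattens the dict into a plain feature vector.
dataset = [({'age': 54.0, 'chol': 239.0}, 1),
           ({'age': 61.0, 'chol': 210.0}, 0)]

def feature_layer(features):
    # Concatenate the named features in a fixed order, as DenseFeatures does.
    return [features[name] for name in sorted(features)]

# Equivalent of: dataset.map(lambda x, y: (feature_layer(x), y))
mapped = [(feature_layer(x), y) for x, y in dataset]
print(mapped[0])  # ([54.0, 239.0], 1)
```

Because the transformation happens inside the input pipeline, the Keras model itself needs no feature-column-aware first layer.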

Answer 2: (score: 0)

Use the function tf.feature_column.input_layer; there is a sample in its API documentation. You can convert the feature_columns into a Tensor and then use it in Model().

Answer 3: (score: 0)

I have recently been reading the documentation for the TensorFlow 2.0 alpha release: https://www.tensorflow.org/alpha/tutorials/keras/feature_columns#create_a_feature_layer. It has an example that combines Keras with the feature columns API. Not sure whether TF 2.0 is what you want to use.