
Using TF 2.0 and TensorFlow Probability (tfp) layers, I have constructed a keras.Sequential model. I would like to export it for serving with TensorFlow Serving, and I would like to include the preprocessing and postprocessing steps in the servable.

My preprocessing steps are fairly simple: filling NAs with explicit values, encoding a few strings as floats, normalizing inputs, and denormalizing outputs. For training, I have been doing the pre/postprocessing with pandas and numpy.

I know that I can export my Keras model's weights, wrap the keras.Sequential model's architecture in a bigger TensorFlow graph, use low-level ops like tf.math.subtract(inputs, vector_of_feature_means) for the pre/postprocessing operations, define tf.placeholders for my inputs and outputs, and make a servable, but I feel like there has to be a cleaner way of doing this.

Is it possible to use keras.layers.Add() and keras.layers.Multiply() in a keras.Sequential model for explicit preprocessing steps, or is there some more standard way of doing these things?
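For concreteness, something like the following is what I have in mind (a minimal sketch; the feature statistics and layer sizes are made up):

import tensorflow as tf

# Hypothetical per-feature statistics computed from the training data.
feature_means = tf.constant([0.5, 3.2, 10.0])
feature_stds = tf.constant([1.0, 2.5, 4.0])

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(3,)),
    # Normalize inside the graph so the exported servable does it too.
    tf.keras.layers.Lambda(lambda x: (x - feature_means) / feature_stds),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1),
])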

    https://stackoverflow.com/questions/41672114/add-tensorflow-pre-processing-to-existing-keras-model-for-use-in-tensorflow-ser?rq=1 is related, but a bit outdated – James McKeown Mar 21 '19 at 16:48

1 Answer


As per my understanding, the standard and efficient way of doing these things is to use TensorFlow Transform. Using TF Transform doesn't mean we have to adopt the entire TFX pipeline; TF Transform can be used standalone as well.

TensorFlow Transform creates a Beam transformation graph, which injects these transformations as constants into the TensorFlow graph. Because the transformations are represented as constants in the graph, they are consistent across training and serving. The advantages of that consistency are:

  1. It eliminates training-serving skew.
  2. It eliminates the need for preprocessing code in the serving system, which improves latency (see the sketch after this list).
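
For example, at serving time the transform graph that TF Transform writes out can be attached in front of the model, so the servable replays the exact preprocessing learned during training. A minimal sketch using tft.TFTransformOutput (the output directory and feature names are illustrative):

import tensorflow as tf
import tensorflow_transform as tft

# Assumes the Beam analysis step has already written the transform graph
# and metadata to this directory.
tf_transform_output = tft.TFTransformOutput('/tmp/transform_output')

# Raw features, as the serving system would receive them.
raw_features = {
    'age': tf.compat.v1.placeholder(tf.float32, [None]),
    'education': tf.compat.v1.placeholder(tf.string, [None]),
}

# Replays the transformations learned during analysis, so no separate
# preprocessing code needs to live in the serving system.
transformed_features = tf_transform_output.transform_raw_features(raw_features)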

Sample code for TF Transform is shown below.

Code for importing all the dependencies:

try:
  import tensorflow_transform as tft
  import apache_beam as beam
except ImportError:
  print('Installing TensorFlow Transform.  This will take a minute, ignore the warnings')
  !pip install -q tensorflow_transform
  print('Installing Apache Beam.  This will take a minute, ignore the warnings')
  !pip install -q apache_beam
  import tensorflow_transform as tft
  import apache_beam as beam

import tensorflow as tf
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema

Below is the preprocessing function, where we specify all the transformations. As of now, TF Transform doesn't provide a direct API for missing-value imputation, so for that alone we have to write our own code using lower-level APIs.

def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs`.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(outputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    # tf.sparse_to_dense is not available in TF 2.0; rebuild the
    # SparseTensor and densify it with tf.sparse.to_dense instead.
    sparse = tf.SparseTensor(outputs[key].indices, outputs[key].values,
                             [outputs[key].dense_shape[0], 1])
    dense = tf.sparse.to_dense(sparse, default_value=0.)
    # Reshape from a batch of vectors of size 1 to a batch of scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature.  This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    tft.vocabulary(inputs[key], vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  # tf.contrib does not exist in TF 2.0; use tf.lookup instead.
  table_keys = ['>50K', '<=50K']
  initializer = tf.lookup.KeyValueTensorInitializer(
      keys=table_keys,
      values=tf.cast(tf.range(len(table_keys)), tf.int64),
      key_dtype=tf.string,
      value_dtype=tf.int64)
  table = tf.lookup.StaticHashTable(initializer, default_value=-1)
  outputs[LABEL_KEY] = table.lookup(outputs[LABEL_KEY])

  return outputs
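
For completeness, below is a minimal, self-contained sketch of how such a preprocessing function is applied with Beam via tft_beam.AnalyzeAndTransformDataset (the toy data, feature names, and simplified preprocessing function are illustrative):

import tempfile

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import dataset_schema

# Toy in-memory dataset; in practice you would read records with Beam IO.
raw_data = [
    {'age': 39.0, 'education': 'Bachelors'},
    {'age': 50.0, 'education': 'HS-grad'},
]

# Schema describing the raw inputs.
raw_data_metadata = dataset_metadata.DatasetMetadata(
    dataset_schema.from_feature_spec({
        'age': tf.io.FixedLenFeature([], tf.float32),
        'education': tf.io.FixedLenFeature([], tf.string),
    }))

def toy_preprocessing_fn(inputs):
  outputs = inputs.copy()
  outputs['age'] = tft.scale_to_0_1(outputs['age'])
  tft.vocabulary(outputs['education'], vocab_filename='education')
  return outputs

# Analyze computes the constants (min, max, vocabulary); Transform applies
# them to the data. Both happen in one pass here.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
  (transformed_data, transformed_metadata), transform_fn = (
      (raw_data, raw_data_metadata)
      | tft_beam.AnalyzeAndTransformDataset(toy_preprocessing_fn))

print(transformed_data)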

You can refer to the links below for detailed information and a tutorial on TF Transform:

https://www.tensorflow.org/tfx/transform/get_started

https://www.tensorflow.org/tfx/tutorials/transform/census

RakTheGeek