TensorFlow Hub is a great collection of models. Any developer looking for ready-to-use TensorFlow models is probably already familiar with the hub; if not, you are missing out on this gem of a collection. The models are organized by domain and version, making it easy to find the model you are looking for. But before we get into how Tiyaro simplifies getting started with these models, let's walk through the normal workflow of a developer trying to use a model from TensorFlow Hub.
What do you need to run these models?
- Find the model and download the saved model
- Run the model
- Find out the 'Function Signature' of the model so you can use it in your application.
Find the model and download the saved model
Just head over to model search on TensorFlow Hub and you can search for the models.
Run the model
Now that you have a saved model downloaded, you have a couple of options to run it.
- Run model locally for test and dev use
- Run model for Production use
Here is an excellent writeup to run models locally for test and dev.
TensorFlow Serving is an excellent, robust solution for running models in production. It integrates really well with the saved model format, and just like every other TensorFlow feature, you will find tons of tutorials on running TensorFlow Serving.
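As a rough sketch, serving a downloaded saved model with the official TensorFlow Serving Docker image looks something like the following. The model name and host path are placeholders; the bound directory must contain a numeric version subdirectory (e.g. `1/saved_model.pb`).

```shell
# Pull the TensorFlow Serving image.
docker pull tensorflow/serving

# Serve the model on port 8501 (REST). /path/to/efficientnet is a placeholder
# for wherever you extracted the saved model.
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/efficientnet,target=/models/efficientnet \
  -e MODEL_NAME=efficientnet \
  tensorflow/serving
```

Once the container is up, the model is reachable at `http://localhost:8501/v1/models/efficientnet`.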
Running models on GPU
There are many use cases where the additional performance and latency benefits of running inference on a GPU are not just nice to have; they are a must have. This is a nice tutorial on enabling GPU support for TensorFlow Serving; it covers everything from downloading the CUDA libraries and recompiling TensorFlow Serving with NVIDIA GPU support to running and testing the GPU build.
Real Issues serving a model
- Infrastructure and DevOps. The real issue for developers is not the application that serves the model, whether it is TensorFlow Serving or some other solution. The DevOps required to serve the model in production and manage the infrastructure is the big burden. Without a dedicated team to handle this, a developer has to spend precious time and resources running and debugging the serving infrastructure.
- Steep learning curve for the REST API support of TensorFlow Serving. TensorFlow Serving supports both gRPC and REST interfaces to the models being served. But the REST API endpoints and the payloads honored by a model are somewhat opaque. You need to decipher a lot of documentation to understand the serving function signatures, and in many cases resort to trial and error to figure out which payloads actually work.
- Endpoint URLs are messy. Not a huge issue, but even the various 'predict', 'classify', and 'regress' APIs require the user to learn the specifics of TensorFlow Serving semantics and the model signatures.
- Request and response documentation is non-standard. Most REST APIs are documented using an open standard like the OpenAPI (Swagger) spec, but there is no standard documentation of the TensorFlow Serving request and response formats.
- Adding GPU support is an extra step, which incurs additional cost and time.
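To make the endpoint-URL point concrete, here is a small sketch of how the TensorFlow Serving REST URLs are shaped for the different method verbs. The host and port are placeholders for wherever your serving instance runs.

```python
# TensorFlow Serving exposes one REST endpoint per API method; the URL shape
# is /v1/models/<name>[/versions/<n>]:<method>, where <method> is one of
# predict, classify, or regress. The base host/port below is a placeholder.
BASE = "http://localhost:8501/v1/models"

def endpoint(model, method, version=None):
    """Build a TensorFlow Serving REST endpoint URL for a model and method."""
    path = f"{BASE}/{model}"
    if version is not None:
        path += f"/versions/{version}"
    return f"{path}:{method}"

print(endpoint("efficientnet", "predict"))
# http://localhost:8501/v1/models/efficientnet:predict
print(endpoint("efficientnet", "classify", version=1))
# http://localhost:8501/v1/models/efficientnet/versions/1:classify
```

Which verb a model actually honors depends on its serving signature, which is exactly the opacity described above.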
Find out the 'Function Signature' of the model so you can use it in your application.
This one is easy to understand for developers. If you need to invoke a model you need to know what input(s) that model takes, the format of the input parameters and the output that the model gives.
The TensorFlow toolchain has done a great job of providing some of the basic tooling required for this. For instance, you can use 'saved_model_cli' to see the default signature supported by a model. E.g., the ImageNet EfficientNet model has the following signature:
```
$ saved_model_cli show --dir . --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, 3)
      name: serving_default_input_1:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 1000)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
```
You can use this information, in conjunction with the TensorFlow Serving documentation, to figure out the inputs required by the model and the output it generates.
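For example, the signature above says `input_1` is a float tensor of shape `(-1, -1, -1, 3)`: a batch of RGB images of any spatial size. A minimal sketch of turning that into a TensorFlow Serving REST request body might look like this (the 224x224 size is an assumption for illustration; the signature allows any size):

```python
import json
import numpy as np

# Build a dummy 224x224 RGB image matching dtype DT_FLOAT, shape (-1,-1,-1,3).
image = np.random.rand(224, 224, 3).astype(np.float32)

# Wrap it in TensorFlow Serving's row-oriented "instances" request format,
# one instance per batch element.
request_body = json.dumps({
    "signature_name": "serving_default",
    "instances": [image.tolist()],
})

# POSTing request_body to .../v1/models/<name>:predict would return a JSON
# object whose "predictions" field has shape (1, 1000), per the output spec.
decoded = json.loads(request_body)
print(len(decoded["instances"]))  # 1
```

Figuring out this mapping from the signature dump is exactly the kind of work a developer has to do by hand today.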
Real Issues with function signature
- Steep learning curve to understand the spec. As seen above, the documentation of the function signature is very TensorFlow-specific.
- A lot of the models take 'tensors' as input, while most developers are dealing with inputs like images, audio files, and text. There is now a learning curve to map all those inputs to the tensors these models expect.
- Lastly, not all models implement the metadata needed for saved_model_cli to show you the serving signature.
2 Easy Steps - TensorFlow hub models to API with Tiyaro
Tiyaro allows you to rephrase the question that developers ask. Instead of asking "What do you need to run these models?", developers should be asking
What do you need to use these models?
With Tiyaro you just need the following 2 steps
- Find the model
- Use the model
Let's use the same model (ImageNet EfficientNet) from above as an example.
Find the model
Simply search for the model in the Tiyaro console.
Click on the search result to see the model card.
Use the Model
The model card includes
- The API endpoint for this model
- The OpenAPI spec for the model, so you know exactly what the inputs and outputs of this model are
- Sample code to try the model
GPU support is built-in. Running a model on GPU is simply a matter of selecting the FlexGPU service tier in the model card and using the gpuflex URL for that API.
That's it! Within minutes, if not seconds, you will go from a TensorFlow Hub model to using it in your app.
Give it a shot. Get started!