A Collaborative Inference framework with the following components:
- a Fast inference pipeline running on the edge;
- a Slow inference pipeline running in the cloud;
- a “success checker” policy that determines whether the Fast inference was “confident” in its prediction; if not, the Slow inference runs to produce the final prediction.
Both pipelines run the same model (same architecture and same weights) deployed in TFLite format; only the pre-processing differs.
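Below is a minimal sketch of the dispatch logic, assuming an image-classification model and a max-softmax threshold as the “success checker” policy. The names `preprocess_fast`, `preprocess_slow`, `model.tflite`, and the `0.8` threshold are illustrative assumptions, not part of the framework; in the real deployment the Slow pipeline would sit behind a cloud endpoint rather than run in-process.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

CONFIDENCE_THRESHOLD = 0.8  # hypothetical policy cut-off


def _resize_nearest(image, size):
    """Cheap nearest-neighbour resize (stand-in for real pre-processing)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]


def preprocess_fast(image, size=224):
    """Edge-side pre-processing: fast resize, simple [0, 1] scaling."""
    x = _resize_nearest(image, size).astype(np.float32) / 255.0
    return x[np.newaxis, ...]  # add batch dimension


def preprocess_slow(image, size=224):
    """Cloud-side pre-processing: adds per-channel normalization on top of
    the resize (an assumed example of the 'different pre-processing')."""
    x = _resize_nearest(image, size).astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return ((x - mean) / std)[np.newaxis, ...]


def run_tflite(interpreter, inputs):
    """Run a single inference on an already-allocated TFLite interpreter."""
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], inputs)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])[0]


def success_checker(probs):
    """Accept the Fast prediction when its top score clears the threshold."""
    return float(np.max(probs)) >= CONFIDENCE_THRESHOLD


def collaborative_predict(image, fast_interp, slow_interp):
    fast_probs = run_tflite(fast_interp, preprocess_fast(image))
    if success_checker(fast_probs):
        return int(np.argmax(fast_probs))  # confident: answer from the edge
    # Not confident: fall back to the Slow pipeline (same model, different
    # pre-processing) for the final prediction.
    slow_probs = run_tflite(slow_interp, preprocess_slow(image))
    return int(np.argmax(slow_probs))


# Both interpreters load the *same* .tflite file: same architecture, same
# weights; only the pre-processing feeding them differs.
fast_interp = tflite.Interpreter(model_path="model.tflite")
slow_interp = tflite.Interpreter(model_path="model.tflite")
fast_interp.allocate_tensors()
slow_interp.allocate_tensors()
```

The max-softmax check is only one possible policy; anything that maps the Fast output to an accept/reject decision (entropy, margin between the top two scores, a learned gate) can slot into `success_checker` without touching the rest of the loop.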