Local Inference and Model Preparation
The Qualcomm AI Hub Models library provides a streamlined way to access and run pretrained models. A critical step in the workflow is putting inputs into the layout the model expects: many models take NCHW (batch, channels, height, width; channel-first) tensors, while standard image libraries usually produce NHWC (channel-last) arrays. The tutorial demonstrates a to_nchw helper function that performs this conversion, so tensors have the correct shape before they are passed to the model. With inputs in the right layout, developers can load models such as MobileNet-V2, inspect their input specifications, and run inference locally with PyTorch.
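The layout conversion can be illustrated with a short NumPy sketch. This is a hypothetical reimplementation, not the tutorial's to_nchw: it assumes inputs arrive as NumPy arrays in HWC (single image) or NHWC (batch) layout, returns float32, and deliberately leaves value scaling (for example to [0, 1]) out because that is model-specific.

```python
import numpy as np


def to_nchw(image: np.ndarray) -> np.ndarray:
    """Hypothetical helper: convert an HWC or NHWC array to a contiguous NCHW float32 array."""
    if image.ndim == 3:
        # Single image in HWC layout: add a leading batch dimension -> NHWC.
        image = image[np.newaxis, ...]
    if image.ndim != 4:
        raise ValueError(f"expected a 3-D HWC or 4-D NHWC array, got shape {image.shape}")
    # Move channels from the last axis to axis 1: (N, H, W, C) -> (N, C, H, W).
    # np.transpose returns a strided view; ascontiguousarray copies it into a
    # contiguous buffer and casts to float32.
    return np.ascontiguousarray(np.transpose(image, (0, 3, 1, 2)), dtype=np.float32)


# Example: a 224x224 RGB image as typically returned by image libraries (HWC, uint8).
hwc = np.zeros((224, 224, 3), dtype=np.uint8)
nchw = to_nchw(hwc)
print(nchw.shape)  # (1, 3, 224, 224)
```

Returning a contiguous copy rather than the transposed view costs one copy but avoids stride-related surprises when the array is later handed to a framework or runtime.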
Hardware-Aware Deployment Pipeline
Beyond local experimentation, the Qualcomm AI Hub enables a transition to production-ready hardware deployment. The workflow includes:
- Tracing and Compilation: torch.jit.trace converts a PyTorch model into a TorchScript graph suitable for compilation. The hub lets users submit these traced models for compilation to the TFLite runtime, which is optimized for mobile and edge hardware.
- Cloud Profiling: With an API token, developers can submit compiled models to run on real Qualcomm hardware hosted in the cloud. This yields performance metrics measured on the physical device (profiling) and allows remote inference testing.
- Reproducible Demos: The library includes built-in CLI demos for models such as YOLOv7, which can be executed through a standardized run_demo function. This lets developers check model behavior and output formats before integrating the models into custom applications.
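The tracing step can be sketched with plain PyTorch. This is a minimal sketch under stated assumptions, not the tutorial's code: TinyClassifier is a hypothetical stand-in for a hub model such as MobileNet-V2, and the 1x3x224x224 NCHW example input is an assumed shape. The resulting TorchScript module is the kind of artifact that would then be submitted for compilation through the qai_hub client (for example with qai_hub.submit_compile_job); that step needs an API token and is not exercised here.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained model such as MobileNet-V2."""

    def __init__(self, num_classes: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is expected in NCHW layout: (batch, channels, height, width).
        return self.head(torch.flatten(self.features(x), 1))


model = TinyClassifier().eval()
example_input = torch.rand(1, 3, 224, 224)  # assumed NCHW example used to record the graph

# torch.jit.trace runs the model once on example_input and records the executed
# operations into a TorchScript module with a fixed graph.
with torch.no_grad():
    traced = torch.jit.trace(model, example_input)

# Sanity check: traced and eager outputs should agree on a fresh input.
check = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    assert torch.allclose(model(check), traced(check), atol=1e-6)
    print(traced(example_input).shape)  # torch.Size([1, 10])
```

Tracing records only the code path taken for the example input, so models with data-dependent control flow may need torch.jit.script or another export route instead.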