Problem
Field engineers installing AirFiber devices across India had to be re-dispatched whenever manual review rejected their installation photos, which drove up cost, delay, and customer churn. The system needed sub-second multimodal reasoning at very high throughput, with deterministic decisions grounded in explicit rules.
Approach
- 1. Combined a vision model (Llama 3.2 Vision Instruct) with an LLM reasoner (Gemini Flash) for structured installation validation against a domain-specific rubric.
- 2. Engineered prompt templates with explicit grounding rules and output schemas to minimise hallucination and force deterministic decisions.
- 3. Deployed across Azure and GCP VM instances behind a low-latency inference layer; optimised batching and the warm-pool strategy to reach p95 latency under 30 ms.
- 4. Built a failure-case capture and active-learning loop so that misclassifications fed back into fine-tuning and benchmark sets.
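The schema-plus-rubric idea in step 2 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the rubric check names, the `build_prompt` and `decide` helpers, and the JSON shape are all hypothetical. It shows one way to make the final decision deterministic: the model only fills in a fixed schema, and plain code applies the rubric, rejecting anything that does not match the schema exactly.

```python
import json
from dataclasses import dataclass

# Hypothetical rubric: these check names are illustrative assumptions,
# not the rubric used in the real system.
RUBRIC_CHECKS = ("device_mounted", "cable_secured", "clear_line_of_sight")
ALLOWED_VERDICTS = ("pass", "fail")


def build_prompt(description: str) -> str:
    """Build a prompt that states the grounding rule and the exact output schema."""
    schema_hint = json.dumps({"checks": {c: "pass|fail" for c in RUBRIC_CHECKS}})
    return "\n".join([
        "Evaluate the installation described below against each check.",
        "Mark a check as fail unless the description explicitly supports it.",
        "Respond with JSON only, in exactly this shape: " + schema_hint,
        "Description: " + description,
    ])


@dataclass(frozen=True)
class Decision:
    approved: bool
    failed_checks: tuple
    reason: str


def decide(raw_output: str) -> Decision:
    """Turn raw model output into a decision; any schema violation is a rejection."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return Decision(False, RUBRIC_CHECKS, "unparseable_output")

    checks = data.get("checks") if isinstance(data, dict) else None
    if not isinstance(checks, dict) or set(checks) != set(RUBRIC_CHECKS):
        return Decision(False, RUBRIC_CHECKS, "schema_violation")
    if any(v not in ALLOWED_VERDICTS for v in checks.values()):
        return Decision(False, RUBRIC_CHECKS, "schema_violation")

    # The rubric is applied in code, so identical model output always
    # produces an identical decision.
    failed = tuple(c for c in RUBRIC_CHECKS if checks[c] == "fail")
    return Decision(not failed, failed, "rubric_fail" if failed else "ok")
```

In a sketch like this, the model never emits the approve/reject verdict itself; it only reports per-check results, which keeps the decision logic auditable and easy to test without calling a model.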
Learnings
- Output grounding via strict schemas and rubric prompts was more impactful than raw model size for production reliability.
- Failure-case capture as a first-class system component compounds quickly: every rejection becomes training data.
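As a rough illustration of failure-case capture, the sketch below appends a record whenever the model's decision disagrees with a later human review. The `ReviewedCase` fields, the `capture_failure` helper, the `error_type` labels, and the JSON Lines sink are assumptions made for this example; the source does not describe the actual storage format or pipeline.

```python
import io
import json
from dataclasses import dataclass, asdict, field
from typing import List, TextIO


@dataclass
class ReviewedCase:
    # Hypothetical record shape for one reviewed installation photo.
    image_id: str
    model_approved: bool
    human_approved: bool
    failed_checks: List[str] = field(default_factory=list)


def capture_failure(case: ReviewedCase, sink: TextIO) -> bool:
    """Write the case as one JSON line if model and human disagree.

    Returns True when a record was written, False otherwise.
    """
    if case.model_approved == case.human_approved:
        return False
    record = asdict(case)
    # Label the direction of the error so downstream jobs can sample by type.
    record["error_type"] = "false_approve" if case.model_approved else "false_reject"
    sink.write(json.dumps(record, sort_keys=True) + "\n")
    return True


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for a file or queue in this sketch
    capture_failure(ReviewedCase("img-001", True, False, []), buffer)
    capture_failure(ReviewedCase("img-002", True, True, []), buffer)
    print(buffer.getvalue(), end="")
```

Writing one self-describing record per disagreement keeps the capture step cheap at inference time, while leaving sampling for fine-tuning and benchmark sets to a separate offline job.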