Problem
Each new model rollout required hand-offs across the data, annotation, modelling, evaluation, and deployment teams, which made releases slow and error-prone. We needed reproducible runs and a way to ship without that coordination overhead.
Approach
- 1. Modelled each stage of the ML lifecycle as a discrete agent with a typed contract: ingestion, annotation, evaluation, fine-tuning, benchmarking, inference orchestration.
- 2. Orchestrated the agents through a controller that handles state, retries, and observability, so the whole pipeline could be triggered with one command.
- 3. Standardised evaluation: benchmark sets, accuracy tracking, and automated failure-case reports for every run.
- 4. Integrated with Azure and GCP for both training and serving so models could be promoted across environments deterministically.
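As a rough illustration of steps 1 to 3, the sketch below shows one way typed stage contracts, a retrying controller, and a per-run evaluation report could fit together. It is not the project's actual code: `Stage`, `Controller`, `EvalReport`, `flaky_inference`, and `evaluate` are hypothetical names, and the toy data exists only to make the example runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass(frozen=True)
class Stage(Generic[In, Out]):
    """A pipeline stage: a name plus a function from typed input to typed output."""
    name: str
    run: Callable[[In], Out]
    max_retries: int = 2  # assumed default, not from the original write-up


@dataclass(frozen=True)
class EvalReport:
    accuracy: float
    failures: list[tuple[str, str, str]]  # (example_id, expected, predicted)


@dataclass
class Controller:
    """Runs stages in order, keeping per-stage outputs and attempt counts."""
    stages: list[Stage]
    state: dict[str, object] = field(default_factory=dict)
    attempts: dict[str, int] = field(default_factory=dict)

    def execute(self, payload):
        for stage in self.stages:
            # One initial attempt plus up to max_retries retries.
            for attempt in range(1, stage.max_retries + 2):
                self.attempts[stage.name] = attempt
                try:
                    payload = stage.run(payload)
                    break
                except Exception:
                    if attempt > stage.max_retries:
                        raise  # retries exhausted, surface the error
            self.state[stage.name] = payload
        return payload


_calls = {"n": 0}


def flaky_inference(examples: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Toy inference stage: fails once, then predicts 'cat' for everything."""
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise ConnectionError("transient failure")
    return [(ex_id, label, "cat") for ex_id, label in examples]


def evaluate(rows: list[tuple[str, str, str]]) -> EvalReport:
    """Toy evaluation stage: accuracy plus the list of failure cases."""
    failures = [row for row in rows if row[1] != row[2]]
    return EvalReport(accuracy=1 - len(failures) / len(rows), failures=failures)


controller = Controller(
    stages=[Stage("inference", flaky_inference), Stage("evaluation", evaluate)]
)
report = controller.execute([("a", "cat"), ("b", "dog"), ("c", "cat"), ("d", "cat")])
print(report.accuracy, report.failures, controller.attempts)
```

Because each stage declares its input and output types, a type checker can flag a mismatched hand-off before a run starts, and the controller's `state` and `attempts` dictionaries give a minimal record of what each stage produced and how many tries it took.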
Learnings
- Agentic doesn't have to mean 'LLM in a loop': typed contracts between stages give you roughly 80% of the productivity gains with 20% of the risk.
- The biggest unlock was observability: once every stage emitted structured events, ownership and debugging became trivial.
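To make the observability point concrete, here is a minimal sketch of what "every stage emits structured events" might look like. The event fields, the `StageEvent` and `EventLog` names, and the `run-001` identifier are all assumptions for illustration; the original does not describe its event schema.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StageEvent:
    run_id: str
    stage: str
    status: str  # "started", "succeeded", or "failed"
    timestamp: float
    detail: str = ""


class EventLog:
    """In-memory event sink; a real system would ship these to a log store."""

    def __init__(self) -> None:
        self.events: list[StageEvent] = []

    def emit(self, run_id: str, stage: str, status: str, detail: str = "") -> str:
        event = StageEvent(run_id, stage, status, time.time(), detail)
        self.events.append(event)
        return json.dumps(asdict(event))  # one JSON object per event

    def failures(self, run_id: str) -> list[StageEvent]:
        return [e for e in self.events if e.run_id == run_id and e.status == "failed"]


log = EventLog()
log.emit("run-001", "ingestion", "started")
log.emit("run-001", "ingestion", "succeeded")
log.emit("run-001", "evaluation", "started")
line = log.emit("run-001", "evaluation", "failed", detail="benchmark set missing")

failed_stages = [e.stage for e in log.failures("run-001")]
print(failed_stages)
```

With a shared schema like this, finding the stage (and therefore the owning team) behind a failed run becomes a filter over events rather than a search through free-form logs.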