Applied AI, in production
We engineer production-grade applications on Anthropic's Claude, bringing data-science discipline to LLM engineering — testing for consistency, calibrating confidence, and auditing the decisions the model makes. We pair this with a deep predictive-modelling and machine-learning toolbox.
Claude & LLM engineering
- Consistency testing — running prompts repeatedly to expose instability, then refining until the model gives the same answer every time on the same input
- Structured human-in-the-loop — when a decision is genuinely ambiguous, surfacing alternatives and reasoning for a one-click human review instead of guessing
- Multi-step workflows driven by explicit state machines, so the framework keeps the model on the rails through long-running sessions
- Prompt orchestration — system, tool, and per-state prompts designed for reliability under real-world inputs
- Model selection matched to assessed task complexity — strongest models where reasoning matters, faster models where it doesn't
- Tool use with validated outputs — Claude calling our functions and returning structured results we can verify, with retries and fallbacks built in
- Live data pipelines feeding current, structured information into the model
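As an illustration of the consistency testing described above, here is a minimal sketch, not our production harness. It assumes the model call is wrapped in a plain function, here the hypothetical `ask_model`, and that answers are short labels that can be compared after trimming whitespace and lowercasing. The stub models stand in for real API calls.

```python
from collections import Counter
from typing import Callable


def consistency_report(ask_model: Callable[[str], str], prompt: str, runs: int = 10) -> dict:
    """Call the model `runs` times on one prompt and summarise how often it agrees with itself."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Assumption: trimming and lowercasing is enough to treat answers as equal.
    answers = [ask_model(prompt).strip().lower() for _ in range(runs)]
    counts = Counter(answers)
    top_answer, top_count = counts.most_common(1)[0]
    return {
        "top_answer": top_answer,
        "agreement": top_count / runs,   # share of runs giving the most common answer
        "distinct_answers": len(counts),
    }


def make_scripted_model(responses):
    """Hypothetical stand-in for a model: replays a fixed list of responses."""
    it = iter(responses)
    return lambda prompt: next(it)


stable = consistency_report(lambda prompt: "Approve", "Classify this claim", runs=5)
flaky = consistency_report(
    make_scripted_model(["approve", "Approve ", "reject", "approve"]),
    "Classify this claim",
    runs=4,
)
print(stable)
print(flaky)
```

A prompt whose agreement falls short of 1.0 is a candidate for further refinement or for routing to human review.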
Predictive modelling & machine learning
- Machine learning — gradient-boosted models (GBMs) and related supervised methods
- Generalised linear models (GLMs) and statistical inference
- Simulation and optimisation algorithms
- Pricing and price-elasticity modelling
- Data analytics that go beyond dashboards
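To make the price-elasticity item concrete, one common textbook approach is a log-log regression, where the slope of ln(quantity) on ln(price) is read as the elasticity. The sketch below assumes a constant-elasticity demand curve and uses synthetic data generated with an elasticity of -1.5; the function name and figures are illustrative and do not come from a client engagement.

```python
import math


def estimate_elasticity(prices, quantities):
    """Ordinary least squares slope of ln(quantity) on ln(price)."""
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need at least two matching price/quantity pairs")
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("prices must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic demand: quantity = 1000 * price^(-1.5), so the true elasticity is -1.5.
prices = [8.0, 9.0, 10.0, 11.0, 12.0]
quantities = [1000 * p ** -1.5 for p in prices]
elasticity = estimate_elasticity(prices, quantities)
print(round(elasticity, 3))
```

Real sales data is noisy and confounded by promotions, seasonality, and competitor moves, so a production model would add controls and report uncertainty around the estimate.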
High-confidence outputs ship. Low-confidence ones become a structured conversation, not a black box. Classical ML where the answer is numerical, LLMs where it's linguistic or structural, and a statistical analytics layer over both that keeps every decision auditable.
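The ship-or-review split can be sketched as a simple routing rule. This is a minimal illustration under stated assumptions: each candidate answer carries a calibrated confidence between 0 and 1, and the 0.9 threshold, the `Candidate` and `Decision` types, and the `route` function are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    answer: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]
    reasoning: str


@dataclass
class Decision:
    action: str                      # "ship" or "review"
    answer: Optional[str] = None     # set only when the output ships
    alternatives: List[Candidate] = field(default_factory=list)


def route(candidates: List[Candidate], threshold: float = 0.9) -> Decision:
    """Ship the best answer if it clears the threshold, otherwise queue all options for review."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0]
    if best.confidence >= threshold:
        return Decision(action="ship", answer=best.answer)
    # Low confidence: hand the reviewer every alternative with its reasoning, best first.
    return Decision(action="review", alternatives=ranked)


clear_case = route([Candidate("approve", 0.97, "All required documents are present.")])
ambiguous_case = route([
    Candidate("reject", 0.40, "The invoice date falls outside the policy period."),
    Candidate("approve", 0.55, "The date may be a typo; other fields match."),
])
print(clear_case.action, ambiguous_case.action)
```

Keeping the reasoning attached to each alternative is what lets the reviewer decide in one click and leaves a record of why the decision was made.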