Lead SRE / AI Platform Engineer
Feb 2022 – Present · Remote
- Built a multi-agent system on Temporal + Mastra (AWS Bedrock + Claude) for 24/7 support case response and investigation, providing overnight coverage for IST with confidence-gated automation.
- Cut LLM cost 44-57% through three-layer token optimization: a 70% redundant-call skip rate, MCP transformers compressing tool output 10-15x, and per-run cost observability built from scratch.
- Own deployment pipeline reliability for 300+ services across commercial + GovCloud (FedRAMP) with 99%+ deployment success on AWS (EKS, Lambda, Step Functions).
- Defined SLI/SLO standards with AI-calibrated thresholds; pipeline-failure classification fixed a 42% noise rate.
- Designed the evaluation methodology for AI code review at enterprise scale, calibrated against thousands of production PRs before reaching developers.