
I explore the edges of AI and build products that move business metrics.
4 papers accepted at ICML 2026. Co-authored and co-built SIA. My work spans the full arc, from research to engineering to shipping products that create real outcomes.
Senior Research Scientist @ HexoLabs
Working toward building an AI Scientist
A language-model agent that simultaneously modifies task-specific scaffolding and model weights. Achieves a 25.1% gain over prior SOTA on LawBench and 12.4% faster GPU kernels.
Autonomous self-refinement via evolving agent structure and test-time reinforcement learning. +16pp on LawBench, -19% GPU kernel runtime.
Addresses the cost/reliability tradeoff in proxy evaluations. MLEvolve achieved SOTA MAE of 0.1354 on MLE-bench within 12 hours.
A two-agent system pairing a Scientist with an advisor that can only ask questions. Improved Kaggle test scores on 4/5 MLE-bench tasks with a mean increase of ~56%.
A benchmark for evaluating whether an AI agent can modify another agent to improve it, covering meta-improvement and self-improvement scenarios.
Leading AI research and high-impact programs end-to-end, from architecture and novel research to production delivery.
Built a real-time AI calling system before audio-to-audio models existed. Engineered low-latency voice pipelines from scratch and took the system from zero to revenue.
Organizations I've worked with across AI research, engineering, and product development.

DPIIT

IP India

Bito

Soliton

Atomicwork

Dashtoon
Get in Touch
[email protected]
Yogendra Manawat
Senior Research Scientist @ HexoLabs