Yogendra Manawat | AI Researcher

Research

Published papers on AI systems, agents, and self-improvement.

Papers

ICML 2026

Featured · HexoLabs · arXiv · 2026

SIA: Self Improving AI with Harness & Weight Updates

Prannay Hebbar, Yogendra Manawat, Samuel Verboomen, Alesia Ivanova, Selvam Palanimalai, Kunal Bhatia, Vignesh Baskaran

SIA is a language-model agent that simultaneously modifies both task-specific agent scaffolding (tools, prompts, retry mechanisms) and model weights. Experiments on legal charge classification, GPU kernel optimization, and RNA denoising show that combining both improvement methods outperforms prior approaches, with a 25.1% gain over the prior SOTA on LawBench and 12.4% faster GPU kernels.

arXiv GitHub

ICML 2026 · 2026

SIA-W: Self-Improving Agents with Test-Time Weight Updates

Prannay Hebbar, Samuel Verboomen, Selvam Palanimalai, Yogendra Manawat, Kunal Bhatia, Vignesh Baskaran

A framework for autonomous self-refinement that evolves the agent's foundational structure (utilities, instructions, workflows) and, once that structure stabilizes, applies test-time reinforcement learning to fine-tune the model weights. Evaluated on LawBench (+16 pp), GPU kernel optimization (−19% runtime), and biological data denoising.

ICML

ICML 2026 · 2026

Adaptive Proxy Evaluation for Autonomously Improving ML Agents

Vignesh Baskaran, Prannay Hebbar, Samuel Verboomen, Alesia Ivanova, Selvam Palanimalai, Kunal Bhatia, Yogendra Manawat

Addresses the cost/reliability tradeoff in proxy evaluations for autonomous ML systems. Proposes an adaptive proxy that evaluates every candidate under identical conditions and progressively raises fidelity as the search converges. MLEvolve (a Monte Carlo Graph Search framework requiring no task-specific tuning) achieved a SOTA MAE of 0.1354 on the MLE-bench Ventilator Pressure Prediction task within 12 hours.

ICML

ICML 2026 · 2026

Socrates: Structured Questioning Unlocks Latent Knowledge in AI Research Agents

Damir Vrabac, Prannay Hebbar, Yogendra Manawat, Selvam Palanimalai, Samuel Verboomen, Gurusha Juneja, Kunal Bhatia, Vignesh Baskaran

Tackles the gap between AI research agents' benchmark performance and their weak performance on practical research tasks, attributing it to insufficient knowledge activation. Proposes Socrates: a two-agent system pairing a tool-equipped Scientist with an advisor that can only ask questions. Socrates improved Kaggle test scores on 4 of 5 MLE-bench tasks, with a mean increase of about 56%.

ICML

ICML 2026 · 2026

AIE-Bench: Benchmarking Agents That Build Agents

Abhishek Mishra, Selvam Palanimalai, Yogendra Manawat, Samuel Verboomen, Prannay Hebbar, Damir Vrabac, Deepak Nathani, Sumeet Motwani, Kunal Bhatia, Vignesh Baskaran

A benchmark for evaluating whether an AI agent can modify another agent to improve it. It pairs a meta-agent, which suggests enhancements, with a target agent, which is being improved, and covers both meta-improvement and self-improvement scenarios across terminal-interaction and tool-calling domains.

OpenReview ICML