Efficient Agentic Reasoning Through Self-Regulated Simulative Planning
Published on arXiv, 2026
Project website: https://sailing-lab.github.io/sr2am-self-regulated-planning/
SR²AM enables three modes within a single LLM: System I (reactive execution), System II (simulative planning), and System III (learned self-regulation: deciding when to plan, how far ahead to look, and when to act directly). The LLM itself serves as the world model, and a configurator regulates this internal simulation: when to predict future states, how many steps ahead, and when to skip simulation altogether. Thinking longer is not the same as thinking smarter, and SR²AM learns to tell when a step calls for extended simulation and when a direct action suffices. As a result, a 30B-parameter model can compete with 685B- and 1T-parameter models at a fraction of the token cost.
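To make the control flow concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the SR²AM implementation. The names `Decision`, `run_step`, `configurator`, `propose`, `world_model`, and `score` are hypothetical. Plain functions stand in for the LLM, and the configurator's choices (whether to simulate, and for how many steps) are hard-coded rather than learned. When the configurator says not to plan, the first proposed action is taken directly (the System I role). Otherwise each candidate action is rolled forward with the world model for `horizon` predicted states, and the action whose predicted end state scores best is chosen (the System II role).

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


@dataclass
class Decision:
    """Hypothetical configurator output (the System III role)."""
    plan: bool    # False: act directly (System I); True: simulate first (System II)
    horizon: int  # number of future states to predict when planning


def run_step(
    state: S,
    configurator: Callable[[S], Decision],
    propose: Callable[[S], List[A]],
    world_model: Callable[[S, A], S],
    score: Callable[[S], float],
) -> A:
    """Choose one action for `state`, simulating only if the configurator asks to."""
    decision = configurator(state)
    candidates = propose(state)
    if not decision.plan or decision.horizon <= 0:
        # System I: reactive execution, take the first proposal with no simulation.
        return candidates[0]
    best_action, best_value = candidates[0], float("-inf")
    for action in candidates:
        # System II: predict the next state with the world model, then keep rolling
        # forward (following the first proposal) until `horizon` states are predicted.
        simulated = world_model(state, action)
        for _ in range(decision.horizon - 1):
            simulated = world_model(simulated, propose(simulated)[0])
        value = score(simulated)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Toy stand-ins (hypothetical, no LLM involved): walk along a number line to GOAL.
GOAL = 10


def toy_configurator(s: int) -> Decision:
    # Plan two steps ahead only when far from the goal; otherwise act directly.
    return Decision(plan=abs(GOAL - s) > 2, horizon=2)


def toy_propose(s: int) -> List[int]:
    return [1, 3, -1]  # candidate moves; the first one is the reactive default


def toy_world_model(s: int, a: int) -> int:
    return s + a  # deterministic "prediction" of the next state


def toy_score(s: int) -> float:
    return -abs(GOAL - s)  # closer to the goal is better


print(run_step(0, toy_configurator, toy_propose, toy_world_model, toy_score))  # 3
print(run_step(9, toy_configurator, toy_propose, toy_world_model, toy_score))  # 1
```

In this toy setting, returning `plan=False` skips `horizon` world-model calls per candidate action. If `world_model` were an LLM call, that skipped simulation is where the token savings would come from. How SR²AM actually trains its configurator is not modeled here.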
