Towards Reliable LLM Reasoning: Coordinated Agents, Variance-Aware Evaluation, and Lean Inference

Abstract

Large language models (LLMs) are increasingly deployed as reasoning engines, yet their practical use remains constrained by three persistent challenges: achieving high-quality reasoning at low cost, measuring performance reliably, and ensuring efficient, reproducible deployment. In this talk, I will present a research agenda addressing these challenges through new methods, benchmarks, and systems for practical LLM reasoning.

I begin with coordination as a pathway to efficiency. Fleet of Agents (FoA) introduces a framework in which swarms of lightweight LLM agents explore search spaces in parallel and are resampled through a genetic-style selection process. This design shows that orchestration often matters more than sheer model size, enabling smaller models to outperform larger ones while achieving superior cost-quality trade-offs across diverse reasoning tasks.
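To make the idea concrete, here is a minimal sketch of a fleet-style search loop, written under assumptions and not the actual FoA implementation: a fixed-size fleet of agents each extends a partial solution, the resulting states are scored, and the fleet is resampled in proportion to those scores before the next round, in the spirit of genetic selection. The functions propose_step and evaluate are hypothetical placeholders for an agent's LLM step and a value heuristic.

import random

def propose_step(state: str) -> str:
    # Hypothetical placeholder for one agent's LLM call that extends a partial solution.
    return state + random.choice(["a", "b", "c"])

def evaluate(state: str) -> float:
    # Hypothetical placeholder for a heuristic value function over partial solutions.
    return random.random()

def fleet_search(n_agents: int = 8, n_rounds: int = 5) -> str:
    states = ["" for _ in range(n_agents)]          # every agent starts from the root
    for _ in range(n_rounds):
        states = [propose_step(s) for s in states]  # agents act independently, so this step parallelizes
        scores = [evaluate(s) for s in states]
        total = sum(scores) or 1.0
        weights = [s / total for s in scores]
        # Genetic-style resampling: promising states are cloned, weak ones die out.
        states = random.choices(states, weights=weights, k=n_agents)
    return max(states, key=evaluate)

if __name__ == "__main__":
    print(fleet_search())

The key design point this sketch tries to convey is that quality comes from how the fleet is steered between rounds, not from making any single agent larger.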

Next, I turn to evaluation as a foundation for trust. ReasonBench exposes the limits of single-run reporting by systematically quantifying the run-to-run variability of LLM reasoning. Through variance-aware metrics, it reveals the hidden instability and cost unpredictability of many reasoning strategies, establishing reproducibility as a first-class requirement for reliable reasoning.
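As an illustration of variance-aware reporting (a sketch under assumptions, not ReasonBench's actual API), one can execute a reasoning strategy several times and summarize both accuracy and cost by their mean and standard deviation rather than by a single number; run_strategy below is a hypothetical placeholder for one full benchmark run.

import random
import statistics

def run_strategy(seed: int) -> tuple[float, float]:
    # Hypothetical placeholder: returns (accuracy, dollar_cost) of one full run.
    rng = random.Random(seed)
    return rng.uniform(0.55, 0.75), rng.uniform(1.0, 3.0)

def variance_aware_report(n_runs: int = 5) -> dict:
    accs, costs = zip(*(run_strategy(seed) for seed in range(n_runs)))
    return {
        "accuracy_mean": statistics.mean(accs),
        "accuracy_std": statistics.stdev(accs),   # run-to-run variability hidden by single-run reporting
        "cost_mean": statistics.mean(costs),
        "cost_std": statistics.stdev(costs),      # cost unpredictability across repeated runs
    }

if __name__ == "__main__":
    print(variance_aware_report())

Reporting the standard deviations alongside the means is what makes two strategies with the same average score distinguishable when one of them is far less stable.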

Finally, I focus on systems as enablers of sustainable deployment. I present CacheSaver, the first modular client-side framework for high-level inference optimization. By introducing a namespace-aware caching mechanism, CacheSaver reduces cost and carbon emissions while preserving statistical integrity, making large-scale experimentation and deployment more affordable and sustainable without compromising reproducibility.
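The caching idea can be sketched as follows, assuming a simple in-memory store rather than CacheSaver's actual implementation: responses are keyed by an experiment namespace, the prompt, and a per-request sample index, so reruns within a namespace reuse completions while independent samples remain independent draws; call_llm is a hypothetical placeholder for the real API call.

import hashlib

_cache: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for a costly LLM API call.
    return f"completion for: {prompt}"

def cached_completion(namespace: str, prompt: str, sample_index: int) -> str:
    # The namespace and sample index are part of the key, so distinct experiments
    # and distinct samples never share a cache entry by accident.
    key = hashlib.sha256(f"{namespace}|{sample_index}|{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # only pay for genuinely new requests
    return _cache[key]

if __name__ == "__main__":
    # Two samples of the same prompt stay separate entries; repeating the same
    # (namespace, prompt, sample_index) triple is free.
    print(cached_completion("exp-1", "2+2=?", 0))
    print(cached_completion("exp-1", "2+2=?", 1))
    print(cached_completion("exp-1", "2+2=?", 0))

In this toy version, keeping the sample index in the key is what preserves statistical integrity: caching removes redundant calls without silently collapsing samples that an experiment treats as independent.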

Together, these contributions chart a path toward LLM reasoning that is not only more powerful, but also leaner, more reliable, and environmentally responsible.


Bio

Akhil Arora is a Tenure-Track Assistant Professor of Computer Science at Aarhus University, where he heads the lab for AI Research on Language and Networks ("CLAN" for short). He is a fellow of the Copenhagen Center for Social Data Science (SODAS), an affiliate of the Pioneer Centre for AI (P1), and a formal collaborator of the Wikimedia Foundation, the non-profit organization that manages Wikipedia and related projects. Akhil's research lies broadly in human-centered AI, with a focus on improving human knowledge-seeking, bridging knowledge gaps, and promoting knowledge equity on the Web. To this end, he devises methods and tools that blend techniques from NLP, AI, Graph ML, and computational social science. Recently, his group has been devising robust, trustworthy, accessible, and efficient LLM inference strategies.

Akhil received his PhD in Computer Science from EPFL, Switzerland (2024), his MS from IIT Kanpur (2013), and his undergraduate degree from NCU Gurgaon (2010). Before his PhD, he spent close to five years in industry, working as a Research Scientist with the research labs of Xerox and American Express. His work on influence maximization was recognized by Paper Digest as the 8th most influential paper of SIGMOD 2017 and received the 2018 ACM SIGMOD Most Reproducible Paper Award. He is a recipient of the prestigious EDIC Doctoral Fellowship, an alumnus of the coveted Heidelberg Laureate Forum, and a DAAD AINet fellow in human-centered AI. Akhil is a director of the P1 programs on Green AI and AI & Society.

After the talk, we invite you to join us for pizza and networking in the Westlounge at the Pioneer Centre for AI, Øster Voldgade 3, København K.
Please note that registration is required.