arxiv:2510.16276

What Limits Agentic Systems Efficiency?

Published on Oct 18 · Submitted by Song on Oct 21
Abstract

A caching framework with speculative execution reduces web environment latency in web-interactive agentic systems without degrading performance.

AI-generated summary

Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated strong reasoning capabilities. To further enhance these capabilities, recent agentic systems, such as Deep Research, incorporate web interactions into LLM reasoning to mitigate uncertainties and reduce potential errors. However, existing research predominantly focuses on reasoning performance, often neglecting the efficiency of agentic systems. In this work, we present a comprehensive empirical study that identifies efficiency bottlenecks in web-interactive agentic systems. We decompose end-to-end latency into two primary components: LLM API latency and web environment latency. Our study, spanning 15 models and 5 providers, reveals high latency variability in API-based agentic systems, and shows that web environment latency can contribute as much as 53.7% of overall latency in a web-based agentic system. To reduce this overhead, we propose SpecCache, a caching framework augmented with speculative execution. Extensive evaluations on two standard benchmarks show that our approach improves the cache hit rate by up to 58x compared to a random caching strategy and reduces web environment overhead by up to 3.2x, without degrading agentic system performance.
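The latency decomposition above can be made concrete with simple per-step instrumentation. The sketch below is illustrative only: `call_llm` and `execute_web_action` are hypothetical placeholders for a provider API call and a web environment interaction, not functions from the paper's code.

```python
import time

def run_agent_step(state, call_llm, execute_web_action):
    """One reason-act step, timing the two latency components separately.

    `call_llm` and `execute_web_action` are caller-supplied placeholders
    for an LLM provider API call and a web environment interaction.
    """
    t0 = time.perf_counter()
    action = call_llm(state)                  # LLM API latency
    t1 = time.perf_counter()
    observation = execute_web_action(action)  # web environment latency
    t2 = time.perf_counter()
    return observation, {"llm_api_s": t1 - t0, "web_env_s": t2 - t1}
```

Summing each component over all steps of a task gives its share of end-to-end latency, which is how a figure like the 53.7% web environment share above would be computed.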

Community

Paper submitter

This paper makes the following contributions:

  1. We identify two primary efficiency bottlenecks in web-interactive agentic systems: LLM API latency and web environment latency.
  2. We conduct a comprehensive empirical study across 15 models and 5 major providers, including OpenAI, Anthropic, Google, DeepSeek, and Together AI, revealing substantial variability in LLM API latency.
  3. We introduce SpecCache, a model-driven framework that overlaps environment interaction with model reasoning to reduce end-to-end latency, achieving up to a 58x improvement in cache hit rate and a 3.2x reduction in web environment overhead (a minimal sketch of this overlap follows below).
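To make contribution (3) concrete, here is a minimal sketch of the overlap idea, under the assumption that a cheap draft model predicts the likely next web action while the slower target model is still reasoning; a correct guess means the web result is already cached when the target model commits to its action. All names here (`target_llm`, `draft_llm`, `execute_web_action`) are hypothetical placeholders, and this is a sketch of the general technique, not the paper's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_step(state, target_llm, draft_llm, execute_web_action,
                     cache, pool):
    """One agent step that hides web latency behind target-model reasoning.

    `cache` maps actions to futures of their web observations; all callables
    are illustrative placeholders, not APIs from the paper.
    """
    target_future = pool.submit(target_llm, state)   # slow reasoning call
    guessed = draft_llm(state)                       # cheap draft prediction
    if guessed not in cache:
        # Prefetch the guessed action's result while the target model thinks.
        cache[guessed] = pool.submit(execute_web_action, guessed)
    action = target_future.result()
    if action in cache:
        # Cache hit: the web interaction ran in parallel, off the critical path.
        return action, cache[action].result()
    # Cache miss: pay the web environment latency as usual.
    return action, execute_web_action(action)

# Example wiring (all callables supplied by the caller):
# pool = ThreadPoolExecutor(max_workers=4)
# cache = {}
# action, obs = speculative_step(state, target_llm, draft_llm,
#                                execute_web_action, cache, pool)
```

A mis-predicted prefetch simply stays in the cache for possible later reuse; the gain over random caching reported above comes from the draft model's predictions being far better than chance.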

