Abstract
A caching framework with speculative execution reduces web environment latency in web-interactive agentic systems without degrading performance.
Large Language Models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated strong reasoning capabilities. To further enhance these capabilities, recent agentic systems, such as Deep Research, incorporate web interactions into LLM reasoning to mitigate uncertainties and reduce potential errors. However, existing research predominantly focuses on reasoning performance, often neglecting the efficiency of agentic systems. In this work, we present a comprehensive empirical study that identifies efficiency bottlenecks in web-interactive agentic systems. We decompose end-to-end latency into two primary components: LLM API latency and web environment latency. Our study, spanning 15 models and 5 providers, reveals high latency variability in API-based agentic systems, and we observe that web environment latency can contribute as much as 53.7% of the overall latency in a web-based agentic system. To reduce latency, we propose SpecCache, a caching framework augmented with speculative execution that reduces web environment overhead. Extensive evaluations on two standard benchmarks show that our approach improves the cache hit rate by up to 58x compared to a random caching strategy and reduces web environment overhead by up to 3.2x, without degrading agentic system performance.
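To make the mechanism concrete, below is a minimal sketch of the speculative caching idea described in the abstract, assuming a cheap draft model that predicts likely next web actions while the target model reasons. The names (`SpecCache`, `draft_model`, `fetch`, `predict_actions`) are illustrative, not the paper's actual API.

```python
# Hypothetical sketch of a speculative cache for web actions; not the paper's code.
from concurrent.futures import ThreadPoolExecutor

class SpecCache:
    """Prefetches results for web actions a draft model expects the agent to take."""

    def __init__(self, draft_model, fetch, max_workers=4):
        self.cache = {}                 # web action -> Future holding its result
        self.draft_model = draft_model  # cheap model predicting likely next actions
        self.fetch = fetch              # slow web-environment call (e.g., page load)
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def speculate(self, state, k=4):
        """Start prefetching the draft model's top-k predicted actions."""
        for action in self.draft_model.predict_actions(state, k):
            if action not in self.cache:
                self.cache[action] = self.pool.submit(self.fetch, action)

    def execute(self, action):
        """Serve from the cache when speculation was right; fetch live otherwise."""
        future = self.cache.pop(action, None)
        if future is not None:
            return future.result()      # hit: web latency overlapped with reasoning
        return self.fetch(action)       # miss: pay full web environment latency
```

On a hit, the slow web fetch has already completed in the background, so environment latency is hidden behind the target model's reasoning; on a miss, the agent simply pays the normal fetch cost, which is why the technique cannot degrade task performance.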
Community
This paper makes the following contributions:
- We identify two primary efficiency bottlenecks in web-interactive agentic systems: LLM API latency and web environment latency.
- We conduct a comprehensive empirical study across 15 models and 5 major providers, including OpenAI, Anthropic, Google, DeepSeek, and Together AI, revealing substantial variability in LLM API latency.
- We introduce SpecCache, a model-driven caching framework that overlaps environment interaction with model reasoning to reduce end-to-end latency, achieving up to a 58x improvement in cache hit rate and a 3.2x reduction in web environment overhead (a sketch of this overlap follows the list).
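The overlap in the last bullet can be pictured as one concurrent agent step. The sketch below is illustrative only, assuming async wrappers around the model API and an async variant of the cache shown earlier; none of these names come from the paper.

```python
# Illustrative agent step showing the overlap; assumes async wrappers, not the paper's code.
import asyncio

async def agent_step(state, target_model, spec_cache):
    # Launch speculative prefetching and the slow reasoning call concurrently.
    prefetch = asyncio.create_task(spec_cache.speculate(state))
    action = await target_model.next_action(state)   # LLM API latency
    await prefetch                                   # typically finished by now
    observation = await spec_cache.execute(action)   # hit -> near-instant
    return state.update(action, observation)
```

Because a miss falls back to a live fetch, the overlap changes only latency, never the actions the agent takes, consistent with the claim that agentic system performance is preserved.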
The following similar papers were recommended by the Semantic Scholar API:
- Speculative Actions: A Lossless Framework for Faster Agentic Systems (2025)
- BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks (2025)
- InfoAgent: Advancing Autonomous Information-Seeking Agents (2025)
- Asteria: Semantic-Aware Cross-Region Caching for Agentic LLM Tool Access (2025)
- A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning (2025)
- How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench (2025)
- Democratizing Agentic AI with Fast Test-Time Scaling on the Edge (2025)