Improve model card for AgentFlow (Qwen-2.5-7B-Instruct Backbone)
This PR significantly enhances the model card for the `AgentFlow` model by:
- Updating the YAML metadata with:
  - `library_name: transformers` (confirmed by `config.json` for `Qwen2ForCausalLM` architecture).
  - `pipeline_tag: text-generation` for better discoverability.
  - `license: apache-2.0` (as consistently identified by colleagues).
  - `language: en` to indicate the model's operational language.
  - `base_model: Qwen/Qwen2.5-7B-Instruct` to specify the backbone model.
  - Relevant `tags` including `llm`, `agent`, `tool-use`, `planning`, `qwen2`, and `reinforcement-learning`.
- Populating the model card content with a detailed description derived from the paper abstract and the GitHub repository's key features and motivations.
- Adding direct links to the paper, project page, GitHub repository, and Hugging Face demo.
- Including key features, experimental results (with images from the GitHub README), and a practical Python sample usage snippet directly from the GitHub repository.
- Filling in sections like "Uses", "Bias, Risks, and Limitations", "Training Details", and "Evaluation" with available information.
- Retaining the original model card's structure where appropriate and removing "More Information Needed" placeholders where content could be provided.
- Adding the full BibTeX citation and acknowledgements from the GitHub README.
Please review and merge if these updates are accurate and helpful.
|
@@ -1,199 +1,180 @@
|
|
| 1 |
---
|
| 2 |
library_name: transformers
|
| 3 |
-
|
| 4 |
---
|
| 5 |
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
|
| 12 |
## Model Details
|
| 13 |
|
| 14 |
### Model Description
|
| 15 |
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
|
| 19 |
|
| 20 |
-
|
| 21 |
-
- **Funded by [optional]:** [More Information Needed]
|
| 22 |
-
- **Shared by [optional]:** [More Information Needed]
|
| 23 |
-
- **Model type:** [More Information Needed]
|
| 24 |
-
- **Language(s) (NLP):** [More Information Needed]
|
| 25 |
-
- **License:** [More Information Needed]
|
| 26 |
-
- **Finetuned from model [optional]:** [More Information Needed]
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
-
-
|
| 33 |
-
-
|
| 34 |
-
-
|
| 35 |
|
| 36 |
## Uses
|
| 37 |
|
| 38 |
-
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
|
| 39 |
-
|
| 40 |
### Direct Use
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
### Downstream Use [optional]
|
| 47 |
-
|
| 48 |
-
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
| 49 |
-
|
| 50 |
-
[More Information Needed]
|
| 51 |
|
| 52 |
### Out-of-Scope Use
|
| 53 |
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
|
| 58 |
## Bias, Risks, and Limitations
|
| 59 |
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
|
| 64 |
### Recommendations
|
| 65 |
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
|
| 70 |
## How to Get Started with the Model
|
| 71 |
|
| 72 |
-
|
| 73 |
|
| 74 |
-
[
|
| 75 |
|
| 76 |
-
|
| 77 |
|
| 78 |
-
|
| 79 |
|
| 80 |
-
|
| 81 |
|
| 82 |
-
|
| 83 |
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
| 87 |
-
|
| 88 |
-
#### Preprocessing [optional]
|
| 89 |
|
| 90 |
-
|
| 91 |
|
| 92 |
|
| 93 |
-
|
| 94 |
|
| 95 |
-
|
| 96 |
|
| 97 |
-
|
| 98 |
|
| 99 |
-
|
| 100 |
|
| 101 |
-
|
| 102 |
|
| 103 |
## Evaluation
|
| 104 |
|
| 105 |
-
<!-- This section describes the evaluation protocols and provides the results. -->
|
| 106 |
-
|
| 107 |
### Testing Data, Factors & Metrics
|
| 108 |
|
| 109 |
#### Testing Data
|
| 110 |
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
| 118 |
-
|
| 119 |
-
[More Information Needed]
|
| 120 |
|
| 121 |
#### Metrics
|
| 122 |
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
[More Information Needed]
|
| 126 |
|
| 127 |
### Results
|
| 128 |
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
## Model Examination [optional]
|
| 136 |
-
|
| 137 |
-
<!-- Relevant interpretability work for the model goes here -->
|
| 138 |
-
|
| 139 |
-
[More Information Needed]
|
| 140 |
-
|
| 141 |
-
## Environmental Impact
|
| 142 |
-
|
| 143 |
-
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
|
| 144 |
-
|
| 145 |
-
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
|
| 146 |
-
|
| 147 |
-
- **Hardware Type:** [More Information Needed]
|
| 148 |
-
- **Hours used:** [More Information Needed]
|
| 149 |
-
- **Cloud Provider:** [More Information Needed]
|
| 150 |
-
- **Compute Region:** [More Information Needed]
|
| 151 |
-
- **Carbon Emitted:** [More Information Needed]
|
| 152 |
-
|
| 153 |
-
## Technical Specifications [optional]
|
| 154 |
-
|
| 155 |
-
### Model Architecture and Objective
|
| 156 |
-
|
| 157 |
-
[More Information Needed]
|
| 158 |
-
|
| 159 |
-
### Compute Infrastructure
|
| 160 |
-
|
| 161 |
-
[More Information Needed]
|
| 162 |
-
|
| 163 |
-
#### Hardware
|
| 164 |
-
|
| 165 |
-
[More Information Needed]
|
| 166 |
-
|
| 167 |
-
#### Software
|
| 168 |
-
|
| 169 |
-
[More Information Needed]
|
| 170 |
-
|
| 171 |
-
## Citation [optional]
|
| 172 |
-
|
| 173 |
-
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
| 174 |
-
|
| 175 |
-
**BibTeX:**
|
| 176 |
-
|
| 177 |
-
[More Information Needed]
|
| 178 |
-
|
| 179 |
-
**APA:**
|
| 180 |
-
|
| 181 |
-
[More Information Needed]
|
| 182 |
-
|
| 183 |
-
## Glossary [optional]
|
| 184 |
|
| 185 |
-
|
| 186 |
|
| 187 |
-
[
|
| 188 |
|
| 189 |
-
|
| 190 |
|
| 191 |
-
|
| 192 |
|
| 193 |
-
|
| 194 |
|
| 195 |
-
[
|
| 196 |
|
| 197 |
-
##
|
| 198 |
|
| 199 |
-
|
| 1 |
---
|
| 2 |
library_name: transformers
|
| 3 |
+
pipeline_tag: text-generation
|
| 4 |
+
license: apache-2.0
|
| 5 |
+
language: en
|
| 6 |
+
base_model: Qwen/Qwen2.5-7B-Instruct
|
| 7 |
+
tags:
|
| 8 |
+
- llm
|
| 9 |
+
- agent
|
| 10 |
+
- tool-use
|
| 11 |
+
- planning
|
| 12 |
+
- qwen2
|
| 13 |
+
- reinforcement-learning
|
| 14 |
---
|
| 15 |
|
| 16 |
+
<p align="center">
|
| 17 |
+
<picture>
|
| 18 |
+
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/lupantech/AgentFlow/raw/main/assets/img/logo.png">
|
| 19 |
+
<img alt="AgentFlow" src="https://github.com/lupantech/AgentFlow/raw/main/assets/img/logo.png" width=31%>
|
| 20 |
+
</picture>
|
| 21 |
+
</p>
|
| 22 |
+
|
| 23 |
+
<h3 align="center">
|
| 24 |
+
AgentFlow: In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
|
| 25 |
+
</h3>
|
| 26 |
+
|
| 27 |
+
<!--- BADGES: START --->
|
| 28 |
+
<p align="center">
|
| 29 |
+
<a href="https://arxiv.org/abs/2510.05592"><img src="https://img.shields.io/badge/arXiv-2510.05592-B31B1B.svg?logo=arxiv" alt="Arxiv"></a>
|
| 30 |
+
<a href="https://huggingface.co/spaces/AgentFlow/agentflow"><img src="https://img.shields.io/badge/Gradio-Demo-F97316.svg?logo=gradio" alt="Gradio Demo"></a>
|
| 31 |
+
<a href="https://huggingface.co/papers/2510.05592"><img src="https://img.shields.io/badge/Huggingface-Paper-FFD21E.svg?logo=huggingface" alt="Huggingface Paper"></a>
|
| 32 |
+
<a href="https://huggingface.co/AgentFlow"><img src="https://img.shields.io/badge/Huggingface-Model-FFD21E.svg?logo=huggingface" alt="Huggingface Model"></a>
|
| 33 |
+
<a href="https://agentflow.stanford.edu/"><img src="https://img.shields.io/badge/Website-AgentFlow-E5426E?logo=kashflow" alt="Website"></a>
|
| 34 |
+
</p>
|
| 35 |
+
<!--- BADGES: END --->
|
| 36 |
|
| 37 |
## Model Details
|
| 38 |
|
| 39 |
### Model Description
|
| 40 |
|
| 41 |
+
AgentFlow is a **trainable, in-the-flow agentic framework** that coordinates four specialized modules (planner, executor, verifier, generator) through an evolving memory and directly optimizes its planner inside the multi-turn loop. This system addresses the limitations of prevailing tool-augmented approaches that often scale poorly with long horizons and diverse tools, and generalize weakly to new scenarios.
|
| 42 |
|
| 43 |
+
To enable effective planning and tool use, AgentFlow introduces **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, a novel algorithm that tackles long-horizon, sparse-reward credit assignment by converting multi-turn optimization into a sequence of tractable single-turn policy updates. It broadcasts a single, verifiable trajectory-level outcome to every turn to align local planner decisions with global success and stabilizes learning with group-normalized advantages. The model leverages a **Qwen-2.5-7B-Instruct backbone**.
|
| 44 |
|
| 45 |
+
- **Developed by:** Zhuofeng Li, Haoxiang Zhang, Pan Lu, and others.
|
| 46 |
+
- **Model type:** Large Language Model with Agentic Capabilities
|
| 47 |
+
- **Language(s) (NLP):** English
|
| 48 |
+
- **License:** Apache-2.0
|
| 49 |
+
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct
|
| 50 |
|
| 51 |
+
### Model Sources
|
| 52 |
|
| 53 |
+
- **Repository:** https://github.com/lupantech/AgentFlow
|
| 54 |
+
- **Paper:** https://huggingface.co/papers/2510.05592
|
| 55 |
+
- **Project Page:** https://agentflow.stanford.edu/
|
| 56 |
+
- **Demo:** https://huggingface.co/spaces/AgentFlow/agentflow
|
| 57 |
|
| 58 |
## Uses
|
| 59 |
|
| 60 |
### Direct Use
|
| 61 |
|
| 62 |
+
AgentFlow is intended for researchers and developers working on advanced AI agents and large language models that require dynamic planning and effective utilization of external tools. It is particularly suitable for:
|
| 63 |
+
* Complex reasoning tasks that demand multi-turn interaction and robust credit assignment.
|
| 64 |
+
* Developing systems capable of autonomous skill discovery and practice in live environments.
|
| 65 |
+
* Benchmarking and advancing the state-of-the-art in agentic LLM research.
|
| 66 |
|
| 67 |
### Out-of-Scope Use
|
| 68 |
|
| 69 |
+
The model is not intended for:
|
| 70 |
+
* Deployment in high-stakes, safety-critical applications without extensive additional fine-tuning, validation, and human oversight.
|
| 71 |
+
* Generating content that is harmful, unethical, or violates privacy.
|
| 72 |
+
* Tasks outside the scope of text-based reasoning and tool use without further adaptation or integration with other modalities.
|
| 73 |
|
| 74 |
## Bias, Risks, and Limitations
|
| 75 |
|
| 76 |
+
AgentFlow, like other large language models, may exhibit biases present in its training data or the tools it integrates. Potential risks and limitations include:
|
| 77 |
+
* **Hallucination:** The model might generate factually incorrect or nonsensical outputs, especially in complex scenarios or when tool outputs are ambiguous.
|
| 78 |
+
* **Tool Misuse/Over-reliance:** Incorrectly invoking tools, misinterpreting tool outputs, or failing to identify appropriate tools for a given task.
|
| 79 |
+
* **Generalization Gaps:** While designed for generalization, performance might degrade on tasks significantly different from its training distribution.
|
| 80 |
+
* **Long-horizon Challenges:** Although designed to address long horizons, extremely long and complex tasks may still pose challenges for effective planning and execution.
|
| 81 |
+
* **API Key Dependency:** The system's functionality heavily relies on external API keys (e.g., Google, OpenAI, DashScope), which might incur costs or introduce external dependencies.
|
| 82 |
|
| 83 |
### Recommendations
|
| 84 |
|
| 85 |
+
Users of AgentFlow should:
|
| 86 |
+
* Be aware of the potential for biases and hallucinations inherited from the underlying LLM and training data.
|
| 87 |
+
* Carefully validate outputs, especially for critical applications.
|
| 88 |
+
* Thoroughly test the system's behavior in specific deployment contexts.
|
| 89 |
+
* Refer to the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow) for detailed setup, configuration, and best practices to mitigate risks.
|
| 90 |
|
| 91 |
## How to Get Started with the Model
|
| 92 |
|
| 93 |
+
AgentFlow provides a modular agentic system with **four specialized modules** (planner, executor, verifier, generator) that coordinate through **evolving memory** and a **toolkit** over **multiple turns** to solve complex reasoning tasks.
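
The coordination described above can be pictured with a toy loop. The sketch below is illustrative only: the `Memory` class, the functions `plan`, `execute`, `verify`, `generate`, and `solve`, and the single `calculator` tool are hypothetical names invented for this example and are not the AgentFlow API. In the real system the planner is the trained policy; here it is a hard-coded rule.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Evolving memory: an append-only record of what each turn did."""
    records: list = field(default_factory=list)

    def add(self, action, result):
        self.records.append({"action": action, "result": result})


def plan(query, memory):
    # Hypothetical planner: call the calculator once, then stop.
    if not memory.records:
        return {"tool": "calculator", "input": query}
    return {"tool": None, "input": None}


def execute(action, toolkit):
    # Executor: run the tool the planner selected on the planner's input.
    return toolkit[action["tool"]](action["input"])


def verify(memory):
    # Hypothetical verifier: satisfied once any turn produced a result.
    return any(r["result"] is not None for r in memory.records)


def generate(query, memory):
    # Generator: write the final answer from the accumulated memory.
    if not memory.records:
        return "no answer"
    return f"{query} = {memory.records[-1]['result']}"


def solve(query, toolkit, max_turns=3):
    memory = Memory()
    for _ in range(max_turns):
        action = plan(query, memory)
        if action["tool"] is None:
            break
        memory.add(action, execute(action, toolkit))
        if verify(memory):
            break
    return generate(query, memory)


def calculator(expression):
    # Toy tool: evaluates "a + b" for two whitespace-separated integers.
    left, _, right = expression.split()
    return int(left) + int(right)


toolkit = {"calculator": calculator}
answer = solve("2 + 3", toolkit)
print(answer)
```

Each turn reads the memory, picks a tool, records the result, and lets the verifier decide whether to continue, so the generator only ever sees what the memory has accumulated.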
|
| 94 |
|
| 95 |
+
To quickly experience the system in action, follow the installation and environment setup instructions on the [AgentFlow GitHub repository](https://github.com/lupantech/AgentFlow). Once your environment variables and API keys are configured, you can use the following Python code snippet for inference:
|
| 96 |
|
| 97 |
```python
# Import the solver
from agentflow.agentflow.solver import construct_solver

# Set the LLM engine name (e.g., "dashscope" or "together")
llm_engine_name = "dashscope"

# Construct the solver
solver = construct_solver(llm_engine_name=llm_engine_name)

# Solve the user query
output = solver.solve("What is the capital of France?")
print(output["direct_output"])
```
|
| 111 |
|
| 112 |
+
## Training Details
|
| 113 |
|
| 114 |
+
### Training Data
|
| 115 |
|
| 116 |
+
AgentFlow is trained on a mixed dataset for diverse reasoning tasks:
|
| 117 |
+
* **NQ (Natural Questions)**: Used for agentic search tasks. (Link: [https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets](https://huggingface.co/datasets/RUC-NLPIR/FlashRAG_datasets))
|
| 118 |
+
* **DeepMath-103K**: Used for mathematical reasoning tasks. (Link: [https://huggingface.co/datasets/zwhe99/DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K))
|
| 119 |
|
| 120 |
+
Detailed scripts for dataset preparation (`get_train_data.py`, `aime24_data.py`) are available in the [GitHub repository](https://github.com/lupantech/AgentFlow/tree/main/data).
|
| 121 |
|
| 122 |
+
### Training Procedure
|
| 123 |
|
| 124 |
+
AgentFlow employs **Flow-based Group Refined Policy Optimization (Flow-GRPO)**, which directly optimizes the planner agent within the multi-turn interaction loop in an online fashion. This method converts multi-turn optimization into a sequence of tractable single-turn policy updates.
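
To make the credit-assignment idea concrete, here is a minimal sketch of broadcasting one trajectory-level reward to every turn and normalizing it within a group of rollouts. It is an illustration under assumptions, not the Flow-GRPO implementation: the function name `broadcast_group_advantages`, the mean and standard-deviation normalization, and the `eps` constant are choices made for this example, and the paper defines the actual objective.

```python
from statistics import mean, pstdev


def broadcast_group_advantages(rewards, turns_per_rollout, eps=1e-6):
    """Return one advantage per turn for every rollout in a group.

    rewards: final outcome reward of each rollout (e.g. 1.0 if correct).
    turns_per_rollout: number of planner turns taken in each rollout.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    advantages = []
    for reward, n_turns in zip(rewards, turns_per_rollout):
        # Group-normalized advantage of the whole trajectory.
        adv = (reward - mu) / (sigma + eps)
        # Broadcast: every turn of the rollout gets the same signal,
        # so each turn can be treated as a single-turn policy update.
        advantages.append([adv] * n_turns)
    return advantages


# Hypothetical group of four rollouts for the same query.
group_rewards = [1.0, 0.0, 0.0, 1.0]
group_turns = [2, 3, 1, 4]
advs = broadcast_group_advantages(group_rewards, group_turns)
```

Rollouts that succeed receive a positive advantage on all of their turns and failed rollouts a negative one, which is how a sparse final outcome reaches early planner decisions.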
|
| 125 |
|
| 126 |
+
#### Training Hyperparameters
|
| 127 |
|
| 128 |
+
All training hyperparameters (model settings, tools, RL parameters, resources) are configurable via `train/config.yaml` in the GitHub repository.
|
| 129 |
|
| 130 |
## Evaluation
|
| 131 |
|
| 132 |
### Testing Data, Factors & Metrics
|
| 133 |
|
| 134 |
#### Testing Data
|
| 135 |
|
| 136 |
+
AgentFlow was evaluated across ten benchmarks covering various domains:
|
| 137 |
+
* Search tasks
|
| 138 |
+
* Agentic reasoning tasks
|
| 139 |
+
* Mathematical tasks
|
| 140 |
+
* Scientific tasks
|
| 141 |
|
| 142 |
#### Metrics
|
| 143 |
|
| 144 |
+
The primary metric for evaluation is **accuracy**.
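
As a reminder of what the metric measures, the sketch below computes accuracy over a handful of answers. The exact-match rule after trimming and lowercasing is an assumption made here for illustration; individual benchmarks may judge correctness differently, and the sample answers are invented.

```python
def accuracy(predictions, references):
    """Fraction of predictions that match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Hypothetical model answers and gold answers.
preds = ["Paris", "4", "blue "]
golds = ["paris", "5", "Blue"]
score = accuracy(preds, golds)
print(f"accuracy = {score:.3f}")
```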
|
| 145 |
|
| 146 |
### Results
|
| 147 |
|
| 148 |
+
AgentFlow, utilizing a 7B-scale backbone (Qwen-2.5-7B-Instruct), demonstrates significant performance gains over top-performing baselines across multiple benchmarks:
|
| 149 |
+
- **+14.9%** average accuracy gain on search tasks.
|
| 150 |
+
- **+14.0%** average accuracy gain on agentic reasoning tasks.
|
| 151 |
+
- **+14.5%** average accuracy gain on mathematical tasks.
|
| 152 |
+
- **+4.1%** average accuracy gain on scientific tasks.
|
| 153 |
|
| 154 |
+
Notably, AgentFlow even surpassed larger proprietary models like GPT-4o on these benchmarks. Further analysis indicates improved planning, enhanced tool-calling reliability, and positive scaling trends with model size and reasoning turns.
|
| 155 |
|
| 156 |
+

|
| 157 |
+

|
| 158 |
+

|
| 159 |
|
| 160 |
+
For a more in-depth understanding of the evaluation protocols and detailed results, please refer to the [paper](https://huggingface.co/papers/2510.05592) and the [project page](https://agentflow.stanford.edu/).
|
| 161 |
|
| 162 |
+
## Acknowledgements
|
| 163 |
|
| 164 |
+
We thank the following open-source projects:
|
| 165 |
+
- [verl](https://github.com/volcengine/verl) for the excellent RL framework design.
|
| 166 |
+
- [vLLM](https://github.com/vllm-project/vllm) for fast LLM inference support.
|
| 167 |
+
- [Verl-Tool](https://github.com/TIGER-AI-Lab/verl-tool) and [agent-lightning](https://github.com/microsoft/agent-lightning) for their early-stage exploration in agentic RL training.
|
| 168 |
|
| 169 |
+
We thank [Lambda](https://lambda.ai/careers) for GPU support!
|
| 170 |
|
| 171 |
+
## Citation
|
| 172 |
|
| 173 |
```bibtex
@article{li2025intheflow,
  title   = {In-the-Flow Agentic System Optimization for Effective Planning and Tool Use},
  author  = {Li, Zhuofeng and Zhang, Haoxiang and Han, Seungju and Liu, Sheng and Xie, Jianwen and Zhang, Yu and Choi, Yejin and Zou, James and Lu, Pan},
  journal = {arXiv preprint arXiv:2510.05592},
  year    = {2025}
}
```
|