DR-Venus-4B-RL-GGUF
DR-Venus-4B-RL-GGUF is the reinforcement-learned DR-Venus checkpoint built on top of inclusionAI/DR-Venus-4B-SFT. It is a 4B deep research agent designed for long-horizon web research with explicit tool use, evidence collection, and answer generation.
This model is trained entirely on open data. Starting from the SFT checkpoint, DR-Venus-4B-RL applies long-horizon agentic RL with IGPO-style information gain rewards and format-aware turn-level supervision to improve execution reliability under long tool-use trajectories.
What This Model Is For
This checkpoint is intended for:
- long-horizon deep research with tool-augmented reasoning
- improving execution reliability beyond supervised imitation
- evidence-grounded answering with `search` and `visit`
- deployment in the official DR-Venus inference pipeline
It is not primarily optimized for:
- plain chat without tools
- generic short-context instruction following
- use cases that do not need multi-step retrieval and browsing
Model Details
- Base model: Qwen/Qwen3-4B-Thinking-2507
- Initialization checkpoint: inclusionAI/DR-Venus-4B-SFT
- Training stage: agentic reinforcement learning
- Training framework: `verl` + IGPO algorithm
- Tool setting: `search` + `visit`
- Maximum rollout horizon: `200` interaction steps
- Maximum rollout context length: `256K`
- Intended domain: long-horizon open-domain research and evidence-grounded question answering
How DR-Venus Builds RL Supervision
DR-Venus-4B-RL is trained with dense turn-level supervision tailored to deep research:
- The model starts from the DR-Venus supervised checkpoint.
- For each query, the agent interacts with the environment over multi-turn `search` and `visit` trajectories.
- IGPO uses information gain rewards to measure whether an intermediate turn increases the model's probability of producing the ground-truth answer.
- Information gain rewards are combined with outcome rewards and turn-level format-aware penalties.
- The policy is optimized using an IGPO objective with fine-grained credit assignment, specifically tailored for the long-horizon nature of deep research rollouts.
This design improves supervision density, credit assignment, and data efficiency compared with sparse trajectory-level RL alone.
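The reward composition described above can be sketched in a few lines. This is a minimal illustration and not the DR-Venus or IGPO implementation: the function names, the additive way the three signals are combined, and the choice to add the outcome reward on the final turn are all assumptions made for the example. Only the ingredients (information gain, outcome reward, format penalty) and the default values for the discount factor and penalty scale come from this card.

```python
from typing import List


def turn_rewards(
    answer_probs: List[float],
    format_ok: List[bool],
    outcome_reward: float,
    format_penalty_scale: float = 1.0,
) -> List[float]:
    """Per-turn reward: information gain minus a format penalty (assumed additive).

    answer_probs has one more entry than there are turns: answer_probs[0] is the
    probability of the ground-truth answer before any tool call, and
    answer_probs[t + 1] is that probability after turn t.
    """
    if len(answer_probs) != len(format_ok) + 1:
        raise ValueError("answer_probs needs exactly one more entry than format_ok")
    rewards = []
    for t, ok in enumerate(format_ok):
        # Information gain: how much turn t raised the ground-truth probability.
        info_gain = answer_probs[t + 1] - answer_probs[t]
        penalty = 0.0 if ok else format_penalty_scale
        rewards.append(info_gain - penalty)
    if rewards:
        # Assumption: the trajectory-level outcome reward lands on the last turn.
        rewards[-1] += outcome_reward
    return rewards


def discounted_returns(rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Return-to-go for each turn, accumulated backwards with discount gamma."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    probs = [0.1, 0.3, 0.2, 0.6]      # made-up probabilities for three turns
    fmt = [True, False, True]         # the second turn is malformed
    r = turn_rewards(probs, fmt, outcome_reward=1.0)
    print(r)
    print(discounted_returns(r))
```

Because every turn receives its own signal, a turn that lowers the ground-truth probability or breaks the expected format is penalized directly instead of sharing one trajectory-level score with the turns around it.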
Training Data
This model is trained from open-data supervision constructed from:
- the DR-Venus SFT checkpoint as initialization
- REDSearcher 1K RL query-answer pairs
- online rollouts with the DR-Venus `search` + `visit` tool environment
In the current paper setup:
- RL is performed entirely on open query-answer pairs
- rollout groups are sampled with long-horizon agent interaction
- generation is performed with up to `200` interaction steps per query
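A capped multi-turn rollout of this kind might look like the sketch below. It is a generic agent loop written for illustration, not code from the DR-Venus repository: `run_rollout`, the `(name, argument)` action format, and the stub tools are hypothetical. Only the two tool names and the 200-step cap are taken from this card.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action format: (tool name or "answer", argument string).
Action = Tuple[str, str]


def run_rollout(
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    query: str,
    max_steps: int = 200,
) -> Tuple[Optional[str], List[str]]:
    """Run one trajectory; return (final answer or None, interaction history)."""
    history = [f"query: {query}"]
    for _ in range(max_steps):
        name, arg = policy(history)
        if name == "answer":
            return arg, history
        if name not in tools:
            # Feed the mistake back to the policy instead of aborting the rollout.
            history.append(f"error: unknown tool {name!r}")
            continue
        observation = tools[name](arg)
        history.append(f"{name}({arg}) -> {observation}")
    # Step budget exhausted without a final answer.
    return None, history


if __name__ == "__main__":
    # Stub tools and a scripted policy stand in for the real environment and model.
    stub_tools = {
        "search": lambda q: "https://example.com/result",
        "visit": lambda url: "page text",
    }

    def scripted_policy(history: List[str]) -> Action:
        if len(history) == 1:
            return ("search", "example question")
        if len(history) == 2:
            return ("visit", "https://example.com/result")
        return ("answer", "example answer")

    print(run_rollout(scripted_policy, stub_tools, "example question"))
```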
For more implementation details, please refer to the DR-Venus GitHub repository.
Training Recipe
The RL checkpoint is trained with the following setup reported in the current paper draft:
- algorithm: IGPO-style agentic RL
- rollout group size: `8`
- training batch size: `16`
- learning rate: `1e-6`
- rollout temperature: `1.0`
- rollout top-p: `0.95`
- maximum context length: `256K`
- maximum generation length per turn: `8,192`
- discount factor: `0.95`
- format penalty scale: `1.0`
- training framework: `verl` with vLLM rollout engine and FSDP trainer
The current paper configuration also enables browse-aware IG assignment and IG-scale style reward balancing.
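For readers who want the recipe in machine-readable form, the reported values can be collected in one place as sketched below. The class and field names are illustrative and do not correspond to the `verl` configuration schema or to any file in the DR-Venus repository. The values are copied from the list above, plus the 200-step rollout horizon from Model Details.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RLRecipe:
    # Field names are hypothetical; values are the ones reported in this card.
    algorithm: str = "IGPO-style agentic RL"
    rollout_group_size: int = 8
    train_batch_size: int = 16
    learning_rate: float = 1e-6
    rollout_temperature: float = 1.0
    rollout_top_p: float = 0.95
    max_context_length: str = "256K"  # kept as reported; exact token count not stated
    max_generation_length_per_turn: int = 8192
    discount_factor: float = 0.95
    format_penalty_scale: float = 1.0
    max_interaction_steps: int = 200

    def rollouts_per_batch(self) -> int:
        """Trajectories per batch, ASSUMING the batch size counts queries."""
        return self.rollout_group_size * self.train_batch_size


if __name__ == "__main__":
    recipe = RLRecipe()
    for key, value in asdict(recipe).items():
        print(f"{key}: {value}")
    print("rollouts per batch (under the stated assumption):", recipe.rollouts_per_batch())
```

The card does not say whether the training batch size counts queries or individual trajectories, so `rollouts_per_batch` should be read as one possible interpretation.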
Evaluation Summary
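DR-Venus-4B-RL improves over the SFT checkpoint on five of the six tracked deep research benchmarks, with GAIA (Text-Only) dropping by 1.0 point, and achieves the highest score among the listed open models under 9B on every benchmark except GAIA (Text-Only).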
Results Against Open Models Under 9B
| Model | BrowseComp | BrowseComp-ZH | GAIA (Text-Only) | xBench-DS-2505 | xBench-DS-2510 | DeepSearchQA |
|---|---|---|---|---|---|---|
| DeepDive-9B-SFT | 5.6 | 15.7 | -- | 35.0 | -- | -- |
| DeepDive-9B-RL | 6.3 | 15.1 | -- | 38.0 | -- | -- |
| WebSailor-7B | 6.7 | 14.2 | 37.9 | 34.3 | -- | -- |
| OffSeeker-8B-SFT | 10.6 | 24.2 | 47.6 | 48.0 | -- | -- |
| OffSeeker-8B-DPO | 12.8 | 26.6 | 51.5 | 49.0 | -- | -- |
| WebExplorer-8B-RL | 15.7 | 32.0 | 50.0 | 53.7 | 23.0 | 17.8 |
| AgentCPM-Explore-4B | 24.1 | 29.1 | 63.9 | 70.0 | 34.0 | 32.8 |
| DR-Venus-4B-SFT | 26.8 | 35.7 | 65.4 | 69.0 | 35.3 | 37.7 |
| DR-Venus-4B-RL | 29.1 | 37.7 | 64.4 | 74.7 | 40.7 | 39.6 |
Relative to the SFT checkpoint, DR-Venus-4B-RL improves:
- BrowseComp by `+2.3`
- BrowseComp-ZH by `+2.0`
- xBench-DS-2505 by `+5.7`
- xBench-DS-2510 by `+5.4`
- DeepSearchQA by `+1.9`
These gains are associated with better formatting accuracy, more reliable tool use, and stronger long-horizon execution stability.
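The deltas listed above can be recomputed directly from the last two rows of the results table. The short script below does that arithmetic. The variable and function names are ours, and the scores are copied from the table.

```python
from typing import Dict, List

BENCHMARKS: List[str] = [
    "BrowseComp",
    "BrowseComp-ZH",
    "GAIA (Text-Only)",
    "xBench-DS-2505",
    "xBench-DS-2510",
    "DeepSearchQA",
]
# Scores copied from the DR-Venus-4B-SFT and DR-Venus-4B-RL rows of the table.
SFT_SCORES: List[float] = [26.8, 35.7, 65.4, 69.0, 35.3, 37.7]
RL_SCORES: List[float] = [29.1, 37.7, 64.4, 74.7, 40.7, 39.6]


def score_deltas() -> Dict[str, float]:
    """RL score minus SFT score per benchmark, rounded to one decimal place."""
    return {
        name: round(rl - sft, 1)
        for name, sft, rl in zip(BENCHMARKS, SFT_SCORES, RL_SCORES)
    }


if __name__ == "__main__":
    for name, delta in score_deltas().items():
        print(f"{name}: {delta:+.1f}")
```

GAIA (Text-Only) is the one benchmark where the difference is negative, which is why it does not appear in the list of improvements.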
Usage
This checkpoint should be used with the official DR-Venus inference pipeline.
git clone https://github.com/inclusionAI/DR-Venus
cd DR-Venus/Inference
pip install -r requirements.txt
# then configure the model path in run_demo.sh or run_web_demo.sh
bash run_demo.sh
For reproducing RL training or understanding the rollout setup, see the RL directory in the official repository.
License and Release Notes
Please verify license compatibility with:
- the upstream base model
- the released supervision data
- the external tools and judge models used in training or evaluation
This section can be updated later with the final project-specific license statement.
Citation
If you use this checkpoint, please cite the DR-Venus project.
@article{venus2026drvenus,
title={DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data},
author={Venus Team and Dai, Sunhao and Deng, Yong and Lin, Jinzhen and Song, Yusheng and Wang, Guoqing and Wu, Xiaofeng and Zhou, Yuqi and Yang, Shuo and Ying, Zhenzhe and Zhang, Zhanwei and Meng, Changhua and Wang, Weiqiang},
journal={arXiv preprint arXiv:2604.19859},
year={2026}
}
Links
- GitHub: https://github.com/inclusionAI/DR-Venus
- RL code: https://github.com/inclusionAI/DR-Venus/tree/master/RL
- Inference code: https://github.com/inclusionAI/DR-Venus/tree/master/Inference
- SFT model: https://huggingface.co/inclusionAI/DR-Venus-4B-SFT
- RL model: https://huggingface.co/inclusionAI/DR-Venus-4B-RL
- Collection: https://huggingface.co/collections/inclusionAI/dr-venus