AutoDeco
Official Implementation of "The End of Manual Decoding: Towards Truly End-to-End Language Models"
AutoDeco is a framework that equips Large Language Models (LLMs) with token-level adaptive prediction of decoding parameters. By adding lightweight prediction heads on top of a pre-trained model, AutoDeco dynamically predicts the optimal temperature and top-p for each token during decoding.
🎯 Key Features
- Token-Level Decoding Parameter Prediction: Dynamically predict decoding parameters (temperature and top-p) for each generated token
- Lightweight Design: Adds only two small MLP prediction heads (~5MB) without modifying the base model (see the sketch after this list)
- Universal Architecture: Supports multiple mainstream LLM architectures (Llama, Qwen2/2.5, Qwen3, MoE models, etc.)
- End-to-End Training: Trained entirely through the standard cross-entropy loss, with gradients propagating implicitly to the prediction heads
- Flexible Training: Supports independent training of temperature head, top-p head, or joint training
- Efficient Deployment: Only the AutoDeco prediction head weights are saved during training; they are merged with the base model for decoding
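For intuition, here is a minimal sketch of what one such head could look like. This is illustrative only: the real head definitions live in model/templlm_auto.py, and the class name, bottleneck size, and activation below are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodingParamHead(nn.Module):
    """Illustrative two-layer MLP mapping hidden states to one positive
    scalar per token (e.g. a temperature). Not the repository's code."""

    def __init__(self, hidden_size: int, bottleneck: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, bottleneck),
            nn.SiLU(),
            nn.Linear(bottleneck, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # softplus keeps the predicted parameter strictly positive
        return F.softplus(self.mlp(hidden_states)).squeeze(-1)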
🏗️ Architecture
The AutoDeco framework consists of two core components: a temperature head and a top-p head, both lightweight MLPs attached to the base model's hidden states.
Model Workflow
Input Tokens
↓
Base LLM (frozen during head training)
↓
Hidden States
├──→ LM Head → Logits
├──→ TempHead → Temperature
└──→ TopPHead → Top-P
During training, the base LLM parameters are frozen, and only the two prediction heads are trained.
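Conceptually, each decoding step scales the logits by the predicted temperature and then applies nucleus (top-p) filtering with the predicted top-p. Below is a minimal sketch of that step; this is a hypothetical helper for illustration, not the repository's implementation.

import torch

def sample_next_token(logits: torch.Tensor,
                      temperature: torch.Tensor,
                      top_p: torch.Tensor) -> torch.Tensor:
    """One decoding step with per-token predicted parameters.

    logits: (batch, vocab); temperature, top_p: (batch, 1).
    """
    probs = torch.softmax(logits / temperature.clamp(min=1e-4), dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens outside the smallest prefix whose mass reaches top_p;
    # the top-1 token is always kept.
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    choice = torch.multinomial(sorted_probs, num_samples=1)  # index in sorted order
    return sorted_idx.gather(-1, choice)                     # map back to vocab ids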
🤖 Supported Models
AutoDeco supports all mainstream autoregressive LLMs; the architectures below are unified under a single AutoDecoModelForCausalLM interface.
| Base Model | #Base Params | #AutoDeco Params | Download |
|---|---|---|---|
| Llama-3.1-Nemotron-Nano-8B-v1 | 8B | 2.1M | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 1.84M | 🤗 HuggingFace |
| Qwen3-30B-A3B-Instruct-2507 | 30B | 1.05M | 🤗 HuggingFace |
| OpenAI-GPT-OSS-20B | 20B | 1.48M | 🤗 HuggingFace |
| OpenAI-GPT-OSS-120B | 120B | 1.48M | 🤗 HuggingFace |
| Qwen3-235B-A22B-Thinking | 235B | 2.1M | 🤗 HuggingFace |
| DeepSeek-V3.1-Terminus | 671B | - | Coming Soon |
🚀 Installation
Recommended Requirements
- Python >= 3.10
- PyTorch >= 2.0
- CUDA >= 12.0 (recommended for training)
Install Dependencies
# Clone the repository, then enter it
git clone <repo-url>   # replace with the AutoDeco repository URL
cd AutoDeco
# Install core dependencies
pip install -r requirements.txt
# Optional: for training monitoring
pip install wandb
💡 Quick Start
Initialize AutoDeco Model
python script/construct_autodeco.py \
--base_model_name_or_path path_to_your_base_LLM \
--output_dir path_to_your_AutoDeco_model
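After construction, the model can be loaded for generation. A minimal usage sketch, assuming AutoDecoModelForCausalLM mirrors the standard Hugging Face from_pretrained/generate interface (an assumption; check the repository for the exact API):

# Run from the repository root so the model/ package is importable.
from transformers import AutoTokenizer
from model.templlm_auto import AutoDecoModelForCausalLM

model_path = "path_to_your_AutoDeco_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoDecoModelForCausalLM.from_pretrained(model_path)

inputs = tokenizer("Evaluate the limit ...", return_tensors="pt")
# No manual temperature/top-p: the AutoDeco heads predict them per token.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))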
🔥 Training
Prepare Training Data
Training data should be in JSONL format, with one sample per line. AutoDeco uses a prompt/completion schema, with any chat template already applied to the prompt:
{
"prompt": "formatted prompt text",
"completion": "expected completion"
}
# example
{
"prompt": "<|im_start|>user\nEvaluate the limit:$$\\lim_{(x, y) \\to (1, 2)} \\frac{(x-1)(y-2)-x+3}{x^2-2x+y^2-4}$$\nMake sure you output the final answer within \\boxed{}<|im_end|>\n<|im_start|>assistant\n",
"completion": "......### ✅ Final Answer:\n$$\n\\boxed{-1}\n$$"
}
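Samples can be serialized into this format with a few lines of Python; the toy sample below is illustrative, and data/train_data.jsonl follows the DATA_NAME convention described next:

import json

samples = [
    {
        "prompt": "<|im_start|>user\nWhat is 2 + 2?<|im_end|>\n<|im_start|>assistant\n",
        "completion": "2 + 2 = 4.\n$$\n\\boxed{4}\n$$",
    },
]

# One JSON object per line (JSONL).
with open("data/train_data.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")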
Train AutoDeco Heads
Use the provided training script:
# Edit script/trl_train.sh to configure parameters
# Key parameters:
# - MODEL_NAME_OR_PATH: Your initialized AutoDeco Model Path
# - DATA_NAME: Training data filename (in data directory)
# - MAX_LENGTH: Maximum sequence length
# - train_temp: Whether to train temperature head
# - train_top_p: Whether to train top-p head
bash script/trl_train.sh
Training configuration examples:
# Train only temperature head
accelerate launch trl_train.py \
--model_name_or_path AutoDeco-Llama-3.1-8B \
--dataset_name train_data.jsonl \
--train_temp true \
--train_top_p false \
--learning_rate 5e-6 \
--num_train_epochs 1 \
--output_dir ckpt/llama3_temp_head
📊 Inference
Batch Evaluation with vLLM
# Single evaluation
python llm_eval.py \
--model_name_or_path ckpt/autodeco_model \
--dataset aime24 \
--temp 1.0 \
--top_p 1.0 \
--k 16 \
--seed 42
# Batch evaluation with script (automatically generates multiple random seeds)
bash script/test_generation.sh aime24 1.0 1.0 -1 1.0 path/to/model
Evaluation results are saved in the generation_log/ directory, including:
- Pass@K metrics (an estimator sketch follows this list)
- Average accuracy
- Detailed generation results for each sample
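For reference, Pass@K is commonly computed with the unbiased estimator of Chen et al. (2021); a sketch is below (llm_eval.py may compute it differently):

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn from n generations
    of which c are correct, is correct."""
    if n - c < k:
        return 1.0  # fewer than k incorrect generations: a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 generations per problem (--k 16), 5 of them correct:
print(pass_at_k(n=16, c=5, k=8))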
Deploy with vLLM
# example: serve the merged full model (see "Merge AutoDeco Heads to Base Model" below)
vllm serve path_to_your_full_model
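Once served, vLLM exposes an OpenAI-compatible endpoint; a minimal client sketch follows, where the port, api_key placeholder, and model name match vLLM defaults but are assumptions here:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="path_to_your_full_model",  # matches the path passed to `vllm serve`
    messages=[{"role": "user", "content": "Evaluate the limit ..."}],
)
print(response.choices[0].message.content)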
📁 Project Structure
AutoDeco/
├── model/ # Model definitions
│ ├── templlm_auto.py # Unified AutoDeco model definitions (recommended)
│
├── trainer/ # Trainers
│ └── trl_Temp.py # AutoDeco trainer
│
├── script/ # Scripts
│ ├── trl_train.sh # Training launch script
│ ├── test_generation.sh # Batch evaluation script
│ └── merge_autodeco.py # Merge or split heads
│
├── config/ # Configuration files
│ └── deepspeed/ # DeepSpeed configuration
│ └── deepspeed_zero3_gradaccu4.yaml
│
├── trl_train.py # Training main program
├── llm_eval.py # Evaluation main program (vLLM)
├── boxed_extract.py # Answer extraction tool
├── requirements.txt # requirements
└── README.md # This document
🔧 Advanced Usage
1. Extract AutoDeco Heads from AutoDeco Model
python script/merge_autodeco.py split \
--full-checkpoint path_to_your_full_model \
--output path_to_split_head
This generates a lightweight checkpoint (~5MB) containing:
- config.json: AutoDeco configuration (including base_model_name_or_path)
- autodeco_heads.safetensors: head weights
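The extracted head weights can be inspected directly with safetensors; a short sketch (the tensor names depend on what merge_autodeco.py wrote, so treat the printout as the source of truth):

from safetensors.torch import load_file

state_dict = load_file("path_to_split_head/autodeco_heads.safetensors")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))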
2. Merge AutoDeco Heads to Base Model (for vLLM Deployment)
If you need to create a complete model file with heads for inference engines like vLLM:
python script/merge_autodeco.py merge \
--autodeco-path path_to_autodeco_heads \
--base-model-path path_to_base_LLM \
--output path_to_your_full_model
📝 Citation
If you use AutoDeco in your research, please cite:
@misc{wang2025endmanualdecodingtruly,
title={The End of Manual Decoding: Towards Truly End-to-End Language Models},
author={Zhichao Wang and Dongyang Ma and Xinting Huang and Deng Cai and Tian Lan and Jiahao Xu and Haitao Mi and Xiaoying Tang and Yan Wang},
year={2025},
eprint={2510.26697},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.26697},
}