Speculators
Build the fastest OSS vLLM-based speculative decoding system for your own model, using ArcticTraining and ArcticInference!
The table below compares the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama3.1-70B-Instruct on 8xH100:
| Method | ShareGPT | HumanEval |
|---|---|---|
| vLLM V1 baseline | 84.1 | 84.1 |
| vLLM V1 Eagle | 102.2 | 112.0 |
| vLLM V1 Eagle3 | 77.7 | 85.3 |
| vLLM V0 MLP-Speculator (IBM) | 77.9 | 66.7 |
| ArcticSpeculator | 172.4 | 203.7 |
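As a quick sanity check on the numbers above, the throughputs in the table translate into roughly 2x end-to-end speedups for ArcticSpeculator over the plain vLLM V1 baseline (a small illustrative calculation, not part of the benchmark itself):

```python
# Speedup of ArcticSpeculator over the vLLM V1 baseline,
# using the tokens/s numbers from the table above.
baseline = {"ShareGPT": 84.1, "HumanEval": 84.1}
arctic = {"ShareGPT": 172.4, "HumanEval": 203.7}

for task in baseline:
    speedup = arctic[task] / baseline[task]
    print(f"{task}: {speedup:.2f}x")
# ShareGPT: 2.05x
# HumanEval: 2.42x
```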
For more details about ArcticSpeculator and how to use it, see:
We also release ArcticSpeculator checkpoints, trained with ArcticTraining, that you can run with ArcticInference: