sglang-EAGLE3-Llama-4-Scout-17B-16E-Instruct-v1

Model Introduction

The Eagle3 draft model was trained using the SpecForge framework for the Llama4 Scout 17B-16E Instruct model, leveraging a combination of UltraChat and ShareGPT datasets. Under a 3-1-4 speculative decoding configuration—3 speculative steps, top-1 token selection, and 4 draft tokens—it achieves an acceptance length of 2.27.

Usage

You can use this Eagle3 draft model in SGLang with the following command:

python3 -m sglang.launch_server \
    --model meta-llama/Llama-4-Scout-17B-16E-Instruct  \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path lmsys/sglang-EAGLE3-Llama-4-Scout-17B-16E-Instruct-v1 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4 \
    --mem-fraction-static 0.75 \
    --cuda-graph-max-bs 2 \
    --tp 8 \
    --context-length 8192 \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16
Downloads last month
382
Safetensors
Model size
1.88B params
Tensor type
I64
·
F32
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including lmsys/sglang-EAGLE3-Llama-4-Scout-17B-16E-Instruct-v1