# Mixture of Attentions for Speculative Decoding
This checkpoint accompanies the paper "Mixture of Attentions For Speculative Decoding" by Matthieu Zimmer*, Milan Gritta*, Gerasimos Lampouras, Haitham Bou Ammar, and Jun Wang. The paper introduces a novel architecture for speculative decoding that speeds up large language model (LLM) inference.
The checkpoint is supported in vLLM; see our GitHub repository for usage instructions.
## Checkpoints
| Base Model | MOA Spec on Hugging Face | Base Model Parameters | MOA Spec Parameters | 
|---|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | huawei-noah/MOASpec-Llama-3-8B-Instruct | 8B | 0.25B | 
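
The snippet below is a minimal, unofficial sketch of how a base model and a draft checkpoint like the pair above are typically wired into vLLM's speculative decoding. The exact argument names vary across vLLM versions and are assumptions here; the MOA Spec drafter may additionally require the integration from the GitHub repository referenced above.

```python
from vllm import LLM, SamplingParams

# Minimal sketch, not an official example: speculative-decoding arguments
# differ across vLLM versions (older releases take `speculative_model=...`
# and `num_speculative_tokens=...` directly), and the MOA drafter may need
# the custom integration from the project's GitHub repository.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",              # target (base) model
    speculative_config={
        "model": "huawei-noah/MOASpec-Llama-3-8B-Instruct",   # MOA Spec drafter
        "num_speculative_tokens": 4,                          # draft tokens per step
    },
)

outputs = llm.generate(
    ["Explain speculative decoding in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```
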
## Citation
If you use this code or this checkpoint in your research, please cite our paper:
@misc{zimmer2024mixtureattentionsspeculativedecoding,
      title={Mixture of Attentions For Speculative Decoding}, 
      author={Matthieu Zimmer and Milan Gritta and Gerasimos Lampouras and Haitham Bou Ammar and Jun Wang},
      year={2024},
      eprint={2410.03804},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.03804}, 
}
## License
This project is licensed under the MIT License. See the LICENSE file for more details.
Disclaimer: This open-source project is not an official Huawei product, and Huawei is not expected to provide support for it.