A small DeepSeek-V3 architecture model with 4 layers, 8 experts per MoE layer, an MTP module, and BF16 weights, minimally trained on 50k samples generated from Mistral.
Intended for use in CI testing.
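
A CI job could confirm that a loaded checkpoint really has the reduced shape described above before running further tests. The following is a minimal sketch, not part of this repository: the key names (`num_hidden_layers`, `n_routed_experts`, `num_nextn_predict_layers`, `torch_dtype`) follow the published DeepSeek-V3 `config.json` and are assumed to apply here, the helper `config_mismatches` is hypothetical, and reading "MTP module" as a single MTP layer is an assumption.

```python
# Expected values taken from the description above. The key names follow the
# published DeepSeek-V3 config.json and are assumptions for this checkpoint.
EXPECTED = {
    "num_hidden_layers": 4,
    "n_routed_experts": 8,
    "num_nextn_predict_layers": 1,  # assumption: "MTP module" means one MTP layer
    "torch_dtype": "bfloat16",
}


def config_mismatches(config: dict) -> list[str]:
    """Return one readable line per field that differs from EXPECTED."""
    problems = []
    for key, want in EXPECTED.items():
        got = config.get(key)  # missing keys show up as None
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


if __name__ == "__main__":
    # Stand-in for a parsed config.json; a real job would load the file instead.
    sample = dict(EXPECTED)
    print(config_mismatches(sample))
```

In a test suite, the dictionary passed in would come from parsing the checkpoint's `config.json`, and the job would fail if the returned list is non-empty.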