apolloparty/Devstral-Small-2507-NVFP4A16

8-bit precision

compressed-tensors


apolloparty committed on Jul 15 · Commit 4399f49 (verified) · 1 parent: c568b54

Update README.md

Files changed (1)

README.md +1 -0

README.md CHANGED

@@ -38,6 +38,7 @@ pipeline_tag: text2text-generation
 # Quantization NVFP4A16
 Quantized from https://huggingface.co/unsloth/Devstral-Small-2507 (chosen because that repository ships the tokenizer files in the model folder).
 Compressed with [llm-compressor](https://github.com/vllm-project/llm-compressor) using the NVFP4A16 scheme (FP4 weights, 16-bit activations).
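To make the NVFP4 weight format more concrete, here is a minimal pure-Python sketch of block-wise FP4 (E2M1) weight quantization, assuming the commonly described NVFP4 layout: 16 weights share one scale, and each weight is rounded to the nearest signed E2M1 value. This is an illustration, not llm-compressor's implementation: the helper names (`quantize_block`, `quantize`, `dequantize`) are hypothetical, and the sketch simplifies the real format by keeping each block scale as a Python float instead of storing it in FP8 (E4M3) with an extra per-tensor FP32 scale, and by not packing two 4-bit codes per byte.

```python
# Magnitudes representable by a 4-bit E2M1 float (a sign bit is stored separately).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across each group of 16 weights

def quantize_block(block):
    """Quantize one block: return (scale, codes) where codes are signed E2M1 values.

    Simplification (assumption): the scale stays a Python float; real NVFP4
    stores it in FP8 E4M3 alongside a per-tensor FP32 scale.
    """
    amax = max(abs(x) for x in block)
    # Map the block's largest magnitude onto the largest E2M1 value (6.0).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable magnitude (ties go to the smaller one).
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes

def quantize(weights):
    """Split a flat weight list into 16-element blocks and quantize each one."""
    return [quantize_block(weights[i:i + BLOCK_SIZE])
            for i in range(0, len(weights), BLOCK_SIZE)]

def dequantize(blocks):
    """Reconstruct approximate weights by multiplying each code by its block scale."""
    return [scale * code for scale, codes in blocks for code in codes]
```

For example, `quantize([12.0, -6.0, 3.0, 0.0] * 4)` yields a single block with scale `2.0` and codes `[6.0, -3.0, 1.5, 0.0] * 4`, which dequantize back exactly. Because the widest gap between E2M1 values is 2.0 (between 4 and 6), the rounding error per weight is at most one block scale; the small per-block scale is what keeps this error local to each group of 16 weights.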
 We recommend NVIDIA Blackwell GPUs (CUDA compute capability 10.0 or higher, e.g. RTX 5000 series GPUs, DGX Spark, B200), which accelerate FP4 natively.
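As a quick way to apply the Blackwell recommendation above, the sketch below checks whether a CUDA compute capability belongs to the Blackwell generation or newer. It assumes that native FP4 support corresponds to a major compute capability of 10 or higher (B200 reports 10.0, GeForce RTX 50-series GPUs 12.0, DGX Spark 12.1); the helper `has_native_fp4` and the constant `BLACKWELL_MIN_MAJOR` are hypothetical names, not part of any library.

```python
# Assumption: Blackwell (first generation with native FP4 tensor cores)
# starts at compute capability 10.x; consumer Blackwell reports 12.x.
BLACKWELL_MIN_MAJOR = 10

def has_native_fp4(capability):
    """Return True if a (major, minor) CUDA compute capability is Blackwell or newer."""
    major, _minor = capability
    return major >= BLACKWELL_MIN_MAJOR

if __name__ == "__main__":
    # Example capabilities: H100 (Hopper), B200 and RTX 5090 (Blackwell).
    for name, cc in [("H100", (9, 0)), ("B200", (10, 0)), ("RTX 5090", (12, 0))]:
        print(f"{name} sm_{cc[0]}{cc[1]}: native FP4 = {has_native_fp4(cc)}")
```

On a machine with PyTorch and a CUDA GPU, `has_native_fp4(torch.cuda.get_device_capability())` applies the same check to the current device.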
 # Devstral Small 1.1
