Files changed (1)
  1. README.md +10 -9

README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-library_name: mistral-common
 language:
 - en
 - fr
@@ -10,18 +9,20 @@ language:
 - nl
 - hi
 license: apache-2.0
+library_name: vllm
 inference: false
 extra_gated_description: >-
   If you want to learn more about how we process your personal data, please read
   our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+pipeline_tag: audio-text-to-text
 tags:
-- vllm
+- transformers
 ---
 # Voxtral Mini 1.0 (3B) - 2507
 
 Voxtral Mini is an enhancement of [Ministral 3B](https://mistral.ai/news/ministraux), incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.
 
-Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral) and our [research paper](https://arxiv.org/abs/2507.13264).
+Learn more about Voxtral in our blog post [here](https://mistral.ai/news/voxtral).
 
 ## Key Features
 
@@ -63,10 +64,10 @@ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
 
 #### Installation
 
-Make sure to install vllm >= 0.10.0, we recommend using `uv`:
+Make sure to install vllm from "main", we recommend using `uv`:
 
 ```
-uv pip install -U "vllm[audio]" --system
+uv pip install -U "vllm[audio]" --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
 ```
 
 Doing so should automatically install [`mistral_common >= 1.8.1`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.1).
@@ -241,11 +242,11 @@ print(response)
 
 ### Transformers 🤗
 
-Starting with `transformers >= 4.54.0` and above, you can run Voxtral natively!
+Voxtral is supported in Transformers natively!
 
-Install Transformers:
+Install Transformers from source:
 ```bash
-pip install -U transformers
+pip install git+https://github.com/huggingface/transformers
 ```
 
 Make sure to have `mistral-common >= 1.8.1` installed with audio dependencies:
@@ -511,7 +512,7 @@ repo_id = "mistralai/Voxtral-Mini-3B-2507"
 processor = AutoProcessor.from_pretrained(repo_id)
 model = VoxtralForConditionalGeneration.from_pretrained(repo_id, torch_dtype=torch.bfloat16, device_map=device)
 
-inputs = processor.apply_transcription_request(language="en", audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3", model_id=repo_id)
+inputs = processor.apply_transcrition_request(language="en", audio="https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3", model_id=repo_id)
 inputs = inputs.to(device, dtype=torch.bfloat16)
 
  outputs = model.generate(**inputs, max_new_tokens=500)
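
For reference, applying the two front-matter hunks together yields metadata along these lines (a reconstruction from the diff alone; the language entries skipped by the hunk context are left as a comment rather than guessed):

```yaml
---
language:
- en
- fr
# ... entries elided by the hunk context ...
- nl
- hi
license: apache-2.0
library_name: vllm
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
pipeline_tag: audio-text-to-text
tags:
- transformers
---
```

Net effect: the card's primary library moves from `mistral-common` to `vllm`, an explicit `audio-text-to-text` pipeline tag is added, and the `vllm` tag is swapped for `transformers`.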