MiDashengLM-7B-0804 (4bit, bitsandbytes)
4-bit bitsandbytes (bnb) quantized weights of mispeech/midashenglm-7b-0804-fp32.
Note: This is a basic 4-bit quantization using bitsandbytes. For better performance and accuracy, we recommend our GPTQ-quantized version, mispeech/midashenglm-7b-0804-w4a16-gptq. It keeps higher quality while still providing significant memory savings.
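The "memory savings" can be put in perspective with a rough estimate of weight storage at different bit widths. This is a back-of-the-envelope sketch, not a measurement. The parameter count of 7e9 is a nominal assumption taken from the "7B" in the model name, and the exact count differs.

Real bnb 4-bit checkpoints also store per-block scaling constants. They typically keep some layers, such as embeddings and the output head, in higher precision. The actual footprint is therefore larger than the 4-bit figure, and this estimate excludes activations and the KV cache.

```python
def estimate_weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB.

    Ignores quantization constants, layers left unquantized, activations,
    and the KV cache, so real usage will be higher.
    """
    bytes_total = num_params * bits_per_param / 8  # bits -> bytes
    return bytes_total / 2**30  # bytes -> GiB


# Assumption: nominal parameter count from the "7B" in the model name.
NOMINAL_PARAMS = 7e9

for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit", 4)]:
    gib = estimate_weight_memory_gib(NOMINAL_PARAMS, bits)
    print(f"{label:>5}: ~{gib:.1f} GiB")
```

Whatever the exact parameter count, 4-bit weights need about one eighth of the fp32 storage. The fixed overheads listed above come on top of that.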
Usage
Load Model
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer
model_id = "mispeech/midashenglm-7b-0804-4bit-bnb"  # recommended alternative: "mispeech/midashenglm-7b-0804-w4a16-gptq" (GPTQ, higher quality)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
Construct Prompt
user_prompt = "Caption the audio."  # You may try any other prompt
messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful language and speech assistant."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {
                "type": "audio",
                "path": "/path/to/example.wav",
                # or "url": "https://example.com/example.wav"
                # or "audio": np.random.randn(16000)
            },
        ],
    },
]
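Each user turn above carries exactly one audio source: "path", "url", or an in-memory "audio" array. When several clips or prompts are involved, a small builder can keep that structure consistent.

`build_messages` below is a hypothetical helper, not part of the model's or the processor's API. It only reproduces the dictionary layout shown above and rejects calls that pass zero or several audio sources. The sample rate and dtype the processor expects for raw arrays are not specified here. Treat the `audio=` option as an assumption that mirrors the commented `np.random.randn(16000)` example.

```python
from typing import Any, Optional

import numpy as np


def build_messages(
    user_prompt: str,
    *,
    path: Optional[str] = None,
    url: Optional[str] = None,
    audio: Optional[np.ndarray] = None,
    system_prompt: str = "You are a helpful language and speech assistant.",
) -> list[dict[str, Any]]:
    """Hypothetical helper: build the chat structure used by MiDashengLM.

    Exactly one of `path`, `url`, or `audio` must be given, matching the
    three alternatives in the example message.
    """
    # Compare against None explicitly; NumPy arrays have no single truth value.
    sources = {"path": path, "url": url, "audio": audio}
    given = {key: value for key, value in sources.items() if value is not None}
    if len(given) != 1:
        raise ValueError(f"expected exactly one audio source, got {sorted(given)}")

    audio_item: dict[str, Any] = {"type": "audio", **given}
    return [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {
            "role": "user",
            "content": [{"type": "text", "text": user_prompt}, audio_item],
        },
    ]


# Usage: equivalent to the hand-written `messages` above.
messages = build_messages("Caption the audio.", path="/path/to/example.wav")
```

The explicit `is not None` check matters for the array case. `if audio:` would raise on a multi-element NumPy array instead of testing for presence.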
Generate Output
import torch
with torch.no_grad():
    model_inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=True,
        return_dict=True,
    ).to(device=model.device, dtype=model.dtype)
    generation = model.generate(**model_inputs)
    output = tokenizer.batch_decode(generation, skip_special_tokens=True)  # ["An engine is idling."]
Citation
MiDashengLM is released under the Apache License 2.0, and we encourage its use in both research and business applications.
If you find MiDashengLM useful in your research, please consider citing our work:
@techreport{midashenglm7b,
  title      = {MiDashengLM: Efficient Audio Understanding with General Audio Captions},
  author     = {{Horizon Team, MiLM Plus}},
  institution= {Xiaomi Inc.},
  year       = {2025},
  note       = {Contributors: Heinrich Dinkel et al. (listed alphabetically in Appendix B)},
  url        = {https://arxiv.org/abs/2508.03983},
  eprint     = {2508.03983},
}
Model tree for mispeech/midashenglm-7b-0804-4bit-bnb
Base model: Qwen/Qwen2.5-Omni-7B