moonshotai
/
Kimi-Audio-7B-Instruct

Text-to-Speech
KimiAudio
Safetensors
English
Chinese
audio
audio-language-model
speech-recognition
audio-understanding
audio-generation
chat
custom_code

Instructions to use moonshotai/Kimi-Audio-7B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

  • Libraries
  • KimiAudio

    How to use moonshotai/Kimi-Audio-7B-Instruct with KimiAudio:

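    # Example usage for KimiAudio
    # pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
    # pip install soundfile  (needed below to write the generated audio)

    import soundfile as sf

    from kimia_infer.api.kimia import KimiAudio

    # load_detokenizer=True is required for audio output
    model = KimiAudio(model_path="moonshotai/Kimi-Audio-7B-Instruct", load_detokenizer=True)

    # Separate sampling settings for the audio and text output streams
    sampling_params = {
        "audio_temperature": 0.8,
        "audio_top_k": 10,
        "text_temperature": 0.0,
        "text_top_k": 5,
    }

    # For ASR: a text instruction followed by the audio to transcribe
    asr_audio = "asr_example.wav"
    messages_asr = [
        {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
        {"role": "user", "message_type": "audio", "content": asr_audio}
    ]
    # With output_type="text", only the text result is used
    _, text = model.generate(messages_asr, **sampling_params, output_type="text")
    print(text)

    # For Q&A: the spoken question is the only input
    qa_audio = "qa_example.wav"
    messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
    # With output_type="both", generate returns a waveform and its text
    wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
    # Flatten the waveform tensor and save it at 24 kHz
    sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)
    print(text)
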
    
Kimi-Audio-7B-Instruct / whisper-large-v3
3.09 GB
  • 6 contributors
History: 2 commits
bigmoyan
Upload folder using huggingface_hub
977734a verified about 1 year ago
  • README.md (21.8 kB): Upload folder using huggingface_hub, about 1 year ago
  • added_tokens.json (34.6 kB): Upload folder using huggingface_hub, about 1 year ago
  • config.json (1.27 kB): Upload folder using huggingface_hub, about 1 year ago
  • generation_config.json (3.9 kB): Upload folder using huggingface_hub, about 1 year ago
  • merges.txt (494 kB): Upload folder using huggingface_hub, about 1 year ago
  • model.safetensors (3.09 GB, xet): Upload folder using huggingface_hub, about 1 year ago
  • normalizer.json (52.7 kB): Upload folder using huggingface_hub, about 1 year ago
  • preprocessor_config.json (340 Bytes): Upload folder using huggingface_hub, about 1 year ago
  • special_tokens_map.json (2.07 kB): Upload folder using huggingface_hub, about 1 year ago
  • tokenizer.json (2.48 MB): Upload folder using huggingface_hub, about 1 year ago
  • tokenizer_config.json (283 kB): Upload folder using huggingface_hub, about 1 year ago
  • vocab.json (1.04 MB): Upload folder using huggingface_hub, about 1 year ago
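
To fetch only the files listed in this folder rather than the whole repository, one option is `snapshot_download` from `huggingface_hub` with an `allow_patterns` filter. The following is a minimal sketch, not part of the model card: the helper names `subfolder_patterns`, `is_selected`, and `download_subfolder` are hypothetical, and it assumes `huggingface_hub` is installed and the repository is reachable.

```python
from fnmatch import fnmatch

REPO_ID = "moonshotai/Kimi-Audio-7B-Instruct"
SUBFOLDER = "whisper-large-v3"


def subfolder_patterns(subfolder: str) -> list:
    """Build a glob pattern list that selects every file under `subfolder`."""
    return [f"{subfolder.strip('/')}/*"]


def is_selected(path: str, patterns: list) -> bool:
    """Local preview of which repo paths the patterns would select."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def download_subfolder(repo_id: str = REPO_ID, subfolder: str = SUBFOLDER) -> str:
    """Download only `subfolder` of `repo_id`; returns the local snapshot path.

    Imported lazily so the helpers above work without huggingface_hub installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=subfolder_patterns(subfolder),
    )


if __name__ == "__main__":
    patterns = subfolder_patterns(SUBFOLDER)
    # Preview the filter without touching the network
    print(is_selected("whisper-large-v3/config.json", patterns))
    print(is_selected("config.json", patterns))
```

Calling `download_subfolder()` performs the actual download (about 3.09 GB for this folder, dominated by `model.safetensors`) and returns the local snapshot directory; the files then sit under its `whisper-large-v3/` subdirectory.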