Upload folder using huggingface_hub
- .gitattributes +3 -0
- README.md +69 -71
- docs/assets/Introducing Banner.svg +0 -0
- docs/assets/UV Hero Image (1).png +3 -0
- docs/assets/UV logo black.svg +26 -0
- docs/assets/UV logo color dark.svg +57 -0
- docs/assets/UV logo color light.svg +47 -0
- docs/assets/UV logo white.svg +26 -0
- docs/assets/UV stacked Black.svg +26 -0
- docs/assets/UV stacked color dark.svg +57 -0
- docs/assets/UV stacked color light.svg +52 -0
- docs/assets/UV stacked white.svg +26 -0
- docs/assets/Ultravox Model Architecture.svg +0 -0
- docs/assets/config.png +3 -0
- docs/assets/ultravox-cn-web.png +3 -0
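The commit title suggests the files listed above were pushed with the folder upload from `huggingface_hub`. A minimal sketch of such an upload, assuming the package is installed and you are already authenticated. `build_upload_kwargs` and `upload` are hypothetical helpers, and the repo id and folder in the usage note are placeholders, not values confirmed by this commit:

```python
from typing import Optional


def build_upload_kwargs(repo_id: str, folder: str, message: Optional[str] = None) -> dict:
    """Collect keyword arguments for HfApi.upload_folder."""
    kwargs = {
        "repo_id": repo_id,
        "folder_path": folder,
        "repo_type": "model",
    }
    # Only pass a commit message when one is given; otherwise the
    # library falls back to its own default message.
    if message is not None:
        kwargs["commit_message"] = message
    return kwargs


def upload(repo_id: str, folder: str, message: Optional[str] = None):
    # Imported lazily so build_upload_kwargs works without the package.
    from huggingface_hub import HfApi

    return HfApi().upload_folder(**build_upload_kwargs(repo_id, folder, message))
```

A call such as `upload("your-name/your-model", "./ultravox-cn")` would push the whole folder as a single commit.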
.gitattributes
CHANGED
@@ -34,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+docs/assets/config.png filter=lfs diff=lfs merge=lfs -text
+docs/assets/ultravox-cn-web.png filter=lfs diff=lfs merge=lfs -text
+docs/assets/UV[[:space:]]Hero[[:space:]]Image[[:space:]](1).png filter=lfs diff=lfs merge=lfs -text
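The last added pattern encodes the spaces in `UV Hero Image (1).png` as `[[:space:]]`, because whitespace separates the pattern from the attributes in `.gitattributes`. A minimal sketch of building such a line. `lfs_attribute_line` is a hypothetical helper that handles only spaces, not other characters that may need escaping:

```python
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def lfs_attribute_line(path: str) -> str:
    """Return a .gitattributes line that tracks `path` with Git LFS."""
    # A literal space would end the pattern early, so encode each one
    # as the POSIX character class used in the diff above.
    pattern = path.replace(" ", "[[:space:]]")
    return f"{pattern} {LFS_ATTRS}"


print(lfs_attribute_line("docs/assets/UV Hero Image (1).png"))
```

Paths without spaces, such as `docs/assets/config.png`, pass through unchanged.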
README.md
CHANGED
@@ -1,71 +1,69 @@
-(removed lines 1-69: content not preserved in this view)
-python ultravox/tools/gradio_demo.py --model_path seanzhang/ultravox-cn (or a local path)
-```
+Chinese | [English](README_EN.md)
+
+<p align="center">
+<picture>
+<img alt="Ultravox" src="https://zfmrfvimiaqahezndsse.supabase.co/storage/v1/object/public/images/custom/Introducing%20Ultravox%20Wide.jpg">
+</picture>
+</p>
+
+<h3 align="center">
+A fast multimodal LLM designed for real-time voice interaction
+</h3>
+
+
+# Overview
+
+Ultravox is a new kind of multimodal LLM that understands both text and human speech without a separate automatic speech recognition (ASR) stage. Building on research such as [AudioLM](https://arxiv.org/abs/2209.03143), [SeamlessM4T](https://ai.meta.com/blog/seamless-m4t/), [Gazelle](https://tincans.ai/slm), and [SpeechGPT](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt), Ultravox extends any open-weight LLM with a multimodal projector that converts audio directly into the high-dimensional space used by the LLM.
+
+Official Ultravox repository: [https://github.com/fixie-ai/ultravox](https://github.com/fixie-ai/ultravox)
+
+ultravox-cn repository: [https://github.com/seanzhang-zhichen/ultravox-cn](https://github.com/seanzhang-zhichen/ultravox-cn)
+
+Because the official models handle Chinese poorly, we trained a Chinese-friendly speech multimodal model based on Qwen2.5-7B-Instruct and whisper-large-v3-turbo.
+
+### Architecture
+
+[](https://docs.google.com/presentation/d/1ey81xuuMzrJaBwztb_Rq24Cit37GQokD2aAes_KkGVI/edit)
+
+
+### Results
+
+![Image](docs/assets/ultravox-cn-web.png)
+
+### Models
+
+- Hugging Face download: [https://huggingface.co/zhichen/ultravox-cn](https://huggingface.co/zhichen/ultravox-cn)
+- ModelScope download: [https://modelscope.cn/models/seanzhang/ultravox-cn](https://modelscope.cn/models/seanzhang/ultravox-cn)
+
+
+## Environment setup
+
+Install `just` and set up the environment:
+
+```bash
+git clone https://github.com/seanzhang-zhichen/ultravox-cn.git
+cd ultravox-cn
+sudo apt-get install just
+conda create -n ultravox python=3.11
+conda activate ultravox
+just install
+```
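The setup commands above rely on `git`, `just`, and `conda` being available on the `PATH`. A small preflight sketch, assuming those three tool names are all the setup needs. `missing_tools` is a hypothetical helper, not part of ultravox-cn:

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["git", "just", "conda"])
    if missing:
        print("Install these before running the setup:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`.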
+
+## Model preparation
+
+Before running the demo, prepare the following models:
+
+- Qwen2.5-7B-Instruct
+- whisper-large-v3-turbo
+- seanzhang/ultravox-cn
+
+Once these models are ready, edit `seanzhang/ultravox-cn/config.json`: set `audio_model_id` to the local whisper-large-v3-turbo path and `text_model_id` to the local Qwen2.5-7B-Instruct path.
+
+![Image](docs/assets/config.png)
+
+### Web Demo
+
+```bash
+python ultravox/tools/gradio_demo.py --model_path seanzhang/ultravox-cn  # or a local path
+```
+
docs/assets/Introducing Banner.svg
ADDED

docs/assets/UV Hero Image (1).png
ADDED (Git LFS)

docs/assets/UV logo black.svg
ADDED

docs/assets/UV logo color dark.svg
ADDED

docs/assets/UV logo color light.svg
ADDED

docs/assets/UV logo white.svg
ADDED

docs/assets/UV stacked Black.svg
ADDED

docs/assets/UV stacked color dark.svg
ADDED

docs/assets/UV stacked color light.svg
ADDED

docs/assets/UV stacked white.svg
ADDED

docs/assets/Ultravox Model Architecture.svg
ADDED

docs/assets/config.png
ADDED (Git LFS)

docs/assets/ultravox-cn-web.png
ADDED (Git LFS)