Commit 997c873 (verified) · committed by zhichen · 1 Parent(s): 51249e4

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -34,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ docs/assets/config.png filter=lfs diff=lfs merge=lfs -text
+ docs/assets/ultravox-cn-web.png filter=lfs diff=lfs merge=lfs -text
+ docs/assets/UV[[:space:]]Hero[[:space:]]Image[[:space:]](1).png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,71 +1,69 @@
- ---
- license: apache-2.0
- ---
- Chinese | [English](README_EN.md)
-
- <p align="center">
- <picture>
- <img alt="Ultravox" src="https://zfmrfvimiaqahezndsse.supabase.co/storage/v1/object/public/images/custom/Introducing%20Ultravox%20Wide.jpg">
- </picture>
- </p>
-
- <h3 align="center">
- A fast multimodal LLM designed for real-time voice interaction
- </h3>
-
-
- # Overview
-
- Ultravox is a new kind of multimodal LLM that understands both text and human speech without a separate automatic speech recognition (ASR) stage. Building on research such as [AudioLM](https://arxiv.org/abs/2209.03143), [SeamlessM4T](https://ai.meta.com/blog/seamless-m4t/), [Gazelle](https://tincans.ai/slm), and [SpeechGPT](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt), Ultravox can extend any open-weight LLM with a multimodal projector that converts audio directly into the high-dimensional space used by the LLM.
-
- Official ultravox repository: [https://github.com/fixie-ai/ultravox](https://github.com/fixie-ai/ultravox)
-
- ultravox-cn repository: [https://github.com/seanzhang-zhichen/ultravox-cn](https://github.com/seanzhang-zhichen/ultravox-cn)
-
- Because the official model has poor Chinese support, we trained a Chinese-friendly speech multimodal model based on Qwen2.5-7B-Instruct and whisper-large-v3-turbo.
-
- ### Architecture
-
- [![Architecture diagram](https://raw.githubusercontent.com/fixie-ai/ultravox/main/docs/assets/Ultravox%20Model%20Architecture.svg)](https://docs.google.com/presentation/d/1ey81xuuMzrJaBwztb_Rq24Cit37GQokD2aAes_KkGVI/edit)
-
-
- ### Results
-
- ![ultravox-cn-web](docs/assets/ultravox-cn-web.png)
-
- ### Models
-
- - Hugging Face download: [https://huggingface.co/zhichen/ultravox-cn](https://huggingface.co/zhichen/ultravox-cn)
- - ModelScope download: [https://modelscope.cn/models/seanzhang/ultravox-cn](https://modelscope.cn/models/seanzhang/ultravox-cn)
-
-
- ## Environment Setup
-
- Clone the repository, install `just`, and set up the environment:
-
- ```bash
- git clone https://github.com/seanzhang-zhichen/ultravox-cn.git
- cd ultravox-cn
- sudo apt-get install just
- conda create -n ultravox python=3.11
- conda activate ultravox
- just install
- ```
-
- ## Model Preparation
-
- Before running the demo, prepare the following models:
-
- - Qwen2.5-7B-Instruct
- - whisper-large-v3-turbo
- - seanzhang/ultravox-cn
-
- Once these models are ready, edit seanzhang/ultravox-cn/config.json: set `audio_model_id` to the local whisper-large-v3-turbo path and `text_model_id` to the local Qwen2.5-7B-Instruct path.
-
- ![config.json](docs/assets/config.png)
-
- ### Web Demo
-
- ```bash
- python ultravox/tools/gradio_demo.py --model_path seanzhang/ultravox-cn  # or a local path
- ```
 
+ Chinese | [English](README_EN.md)
+
+ <p align="center">
+ <picture>
+ <img alt="Ultravox" src="https://zfmrfvimiaqahezndsse.supabase.co/storage/v1/object/public/images/custom/Introducing%20Ultravox%20Wide.jpg">
+ </picture>
+ </p>
+
+ <h3 align="center">
+ A fast multimodal LLM designed for real-time voice interaction
+ </h3>
+
+
+ # Overview
+
+ Ultravox is a new kind of multimodal LLM that understands both text and human speech without a separate automatic speech recognition (ASR) stage. Building on research such as [AudioLM](https://arxiv.org/abs/2209.03143), [SeamlessM4T](https://ai.meta.com/blog/seamless-m4t/), [Gazelle](https://tincans.ai/slm), and [SpeechGPT](https://github.com/0nutation/SpeechGPT/tree/main/speechgpt), Ultravox can extend any open-weight LLM with a multimodal projector that converts audio directly into the high-dimensional space used by the LLM.
+
+ Official ultravox repository: [https://github.com/fixie-ai/ultravox](https://github.com/fixie-ai/ultravox)
+
+ ultravox-cn repository: [https://github.com/seanzhang-zhichen/ultravox-cn](https://github.com/seanzhang-zhichen/ultravox-cn)
+
+ Because the official model has poor Chinese support, we trained a Chinese-friendly speech multimodal model based on Qwen2.5-7B-Instruct and whisper-large-v3-turbo.
+
+ ### Architecture
+
+ [![Architecture diagram](https://raw.githubusercontent.com/fixie-ai/ultravox/main/docs/assets/Ultravox%20Model%20Architecture.svg)](https://docs.google.com/presentation/d/1ey81xuuMzrJaBwztb_Rq24Cit37GQokD2aAes_KkGVI/edit)
+
+
+ ### Results
+
+ ![ultravox-cn-web](docs/assets/ultravox-cn-web.png)
+
+ ### Models
+
+ - Hugging Face download: [https://huggingface.co/zhichen/ultravox-cn](https://huggingface.co/zhichen/ultravox-cn)
+ - ModelScope download: [https://modelscope.cn/models/seanzhang/ultravox-cn](https://modelscope.cn/models/seanzhang/ultravox-cn)
+
+
+ ## Environment Setup
+
+ Clone the repository, install `just`, and set up the environment:
+
+ ```bash
+ git clone https://github.com/seanzhang-zhichen/ultravox-cn.git
+ cd ultravox-cn
+ sudo apt-get install just
+ conda create -n ultravox python=3.11
+ conda activate ultravox
+ just install
+ ```
+
+ ## Model Preparation
+
+ Before running the demo, prepare the following models:
+
+ - Qwen2.5-7B-Instruct
+ - whisper-large-v3-turbo
+ - seanzhang/ultravox-cn
+
+ Once these models are ready, edit seanzhang/ultravox-cn/config.json: set `audio_model_id` to the local whisper-large-v3-turbo path and `text_model_id` to the local Qwen2.5-7B-Instruct path.
+
+ ![config.json](docs/assets/config.png)
+
+ ### Web Demo
+
+ ```bash
+ python ultravox/tools/gradio_demo.py --model_path seanzhang/ultravox-cn  # or a local path
+ ```
+
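
Note on the model preparation step in the README diff above: it asks the reader to edit `config.json` by hand. A minimal sketch of scripting that edit follows. It assumes `config.json` is a single JSON object with top-level `audio_model_id` and `text_model_id` keys, as the README names them. The function name `set_local_model_paths` and the example directories are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def set_local_model_paths(config_path, audio_model_dir, text_model_dir):
    """Rewrite audio_model_id and text_model_id in an ultravox-cn config.json.

    Assumption: both keys live at the top level of the JSON object.
    All other keys are written back unchanged.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Point the two model ids at local checkpoints instead of hub names.
    config["audio_model_id"] = str(audio_model_dir)
    config["text_model_id"] = str(text_model_dir)

    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

A call might look like `set_local_model_paths("ultravox-cn/config.json", "/models/whisper-large-v3-turbo", "/models/Qwen2.5-7B-Instruct")`, where all three paths are placeholders for wherever the models were downloaded.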
 
 
docs/assets/Introducing Banner.svg ADDED
docs/assets/UV Hero Image (1).png ADDED

Git LFS Details

  • SHA256: fde456a0b66fa1179d8263cd8cdd9bb6d724f41320b8eb986ee402a505106462
  • Pointer size: 131 Bytes
  • Size of remote file: 292 kB
docs/assets/UV logo black.svg ADDED
docs/assets/UV logo color dark.svg ADDED
docs/assets/UV logo color light.svg ADDED
docs/assets/UV logo white.svg ADDED
docs/assets/UV stacked Black.svg ADDED
docs/assets/UV stacked color dark.svg ADDED
docs/assets/UV stacked color light.svg ADDED
docs/assets/UV stacked white.svg ADDED
docs/assets/Ultravox Model Architecture.svg ADDED
docs/assets/config.png ADDED

Git LFS Details

  • SHA256: 66a1cd18fb46aec50ab180226c603b0be3630b831275acd2cd3e7c243ec6b2f9
  • Pointer size: 131 Bytes
  • Size of remote file: 348 kB
docs/assets/ultravox-cn-web.png ADDED

Git LFS Details

  • SHA256: 3f5032a7982c3ea33096c600b5d8178add5f8fd8e53d9059cfbe5b9c336b237a
  • Pointer size: 131 Bytes
  • Size of remote file: 229 kB