Commit 96bb163 (verified) by SreyanG-NVIDIA · 1 parent: 3492ee0

Update README.md

Files changed (1): README.md +11 -0
README.md CHANGED
@@ -1,4 +1,5 @@
  ---
+ arxiv: 2503.03983
  license: other
  language:
  - en
@@ -9,6 +10,11 @@ tags:
  - ASR
  - chat
  - voice
+ datasets:
+ - nvidia/LongAudio
+ - nvidia/AudioSkills
+ - nvidia/AF-Think
+ - nvidia/AF-Chat
  ---
  # Model Overview

@@ -68,10 +74,15 @@ Extensive evaluations confirm AF3’s effectiveness, setting new benchmarks on o

  **This model is for non-commercial research purposes only.**

+
+ ## Results:
  <center><img src="static/af3_radial-1.png" width="400"></center>

  <br>

+ ## Model Architecture:
+ Audio Flamingo 3 uses the AF-Whisper unified audio encoder, an MLP-based audio adaptor, a decoder-only LLM backbone (Qwen2.5-7B), and a streaming TTS module (AF3-Chat). It accepts audio inputs of up to 10 minutes.
+
  <center><img src="static/af3_main_diagram-1.png" width="800"></center>

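
The "Model Architecture" text added in this commit describes a data flow from audio encoder to MLP adaptor to LLM backbone, with a 10-minute cap on audio input. The snippet below is a minimal toy sketch of that flow in plain Python, not the actual Audio Flamingo 3 code. All names (`fits_audio_limit`, `mlp_adaptor`, `make_layer`), the two-layer ReLU adaptor, the toy dimensions, and the audio-before-text ordering are assumptions made for illustration; the README only states that the adaptor is MLP-based. The streaming TTS module is left out.

```python
import random

MAX_AUDIO_SECONDS = 10 * 60  # "up to 10 minutes", per the README text


def fits_audio_limit(num_samples: int, sample_rate: int) -> bool:
    """Return True if a clip is no longer than the 10-minute limit."""
    return num_samples / sample_rate <= MAX_AUDIO_SECONDS


def linear(x, weight, bias):
    """Apply y = W x + b to one feature vector (plain Python lists)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def make_layer(rng, in_dim, out_dim):
    """Random weights for a toy layer; a real model would load trained ones."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def mlp_adaptor(frames, layer1, layer2):
    """Project each audio frame into the LLM embedding space.

    Two linear layers with a ReLU in between are an assumption here.
    """
    projected = []
    for frame in frames:
        hidden = [max(0.0, h) for h in linear(frame, *layer1)]
        projected.append(linear(hidden, *layer2))
    return projected


rng = random.Random(0)
AUDIO_DIM, HIDDEN_DIM, LLM_DIM = 4, 8, 6  # toy sizes, not the real model's

# Stand-ins for encoder output (5 frames) and text token embeddings (3 tokens).
audio_frames = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(5)]
text_embeddings = [[rng.random() for _ in range(LLM_DIM)] for _ in range(3)]

audio_tokens = mlp_adaptor(
    audio_frames,
    make_layer(rng, AUDIO_DIM, HIDDEN_DIM),
    make_layer(rng, HIDDEN_DIM, LLM_DIM),
)

# A decoder-only LLM would consume audio and text embeddings as one sequence
# (audio first is an assumed ordering).
llm_input = audio_tokens + text_embeddings
print(len(llm_input), len(llm_input[0]))  # prints "8 6"
```

The point of the sketch is only the shape contract: whatever the encoder emits, the adaptor must output vectors of the LLM's embedding width so that audio and text can share one input sequence.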