nvidia/audio-flamingo-3-chat

audio understanding


SreyanG-NVIDIA committed on Jul 11

Commit 96bb163 · verified · 1 Parent(s): 3492ee0

Update README.md

Files changed (1)

README.md +11 -0

@@ -1,4 +1,5 @@
 ---
+arxiv: 2503.03983
 license: other
 language:
 - en
@@ -9,6 +10,11 @@ tags:
 - ASR
 - chat
 - voice
+datasets:
+- nvidia/LongAudio
+- nvidia/AudioSkills
+- nvidia/AF-Think
+- nvidia/AF-Chat
 ---
 # Model Overview
@@ -68,10 +74,15 @@ Extensive evaluations confirm AF3’s effectiveness, setting new benchmarks on o
 **This model is for non-commercial research purposes only.**
+## Results:
 <center><img src="static/af3_radial-1.png" width="400"></center>
 <br>
+## Model Architecture:
+Audio Flamingo 3 uses AF-Whisper unified audio encoder, MLP-based audio adaptor, Decoder-only LLM backbone (Qwen2.5-7B), and Streaming TTS module (AF3-Chat). Audio Flamingo 3 can take up to 10 minutes of audio inputs.
 <center><img src="static/af3_main_diagram-1.png" width="800"></center>
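
The 10-minute input limit stated in the Model Architecture text can be checked before audio is passed to the model. The snippet below is a minimal sketch and not part of the AF3 codebase. It assumes uncompressed WAV input read with Python's standard `wave` module, and the names `MAX_AUDIO_SECONDS`, `wav_duration_seconds`, and `fits_audio_limit` are hypothetical helpers introduced only for illustration.

```python
import wave

# Assumption: the limit mirrors the README statement that the model
# "can take up to 10 minutes of audio inputs".
MAX_AUDIO_SECONDS = 10 * 60


def wav_duration_seconds(source):
    """Return the duration of a WAV file in seconds.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def fits_audio_limit(duration_seconds, limit=MAX_AUDIO_SECONDS):
    """Return True when a clip of this duration is within the limit."""
    return 0 <= duration_seconds <= limit
```

A caller could run `fits_audio_limit(wav_duration_seconds("clip.wav"))` and split or truncate the clip when it returns `False`. How the actual inference code handles longer inputs is not stated in this commit.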