soham97
/

mellow

small audio-language model

audio reasoning

audio captioning

audio question answering

Model card Files Files and versions

soham97 commited on Mar 15

Commit

d849e10

·

1 Parent(s): a04bbe7

demo

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -13,7 +13,7 @@ tags:
   - audio-text
 ---
 # Mellow: a small audio language model for reasoning
-[[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.

   - audio-text
 ---
 # Mellow: a small audio language model for reasoning
+[[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)] [[`Demo`](https://tinyurl.com/mellowredirect)]
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.