soham97/mellow

small audio-language model

audio reasoning

audio captioning

audio question answering


soham97 committed on Mar 12

Commit a04bbe7 · 1 Parent(s): 1cf68be

update

Files changed (1)

README.md +11 -3


@@ -12,8 +12,8 @@ tags:
   - zero-shot
   - audio-text
 ---
-# Mellow
-[[`Paper`]()] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
+# Mellow: a small audio language model for reasoning
+[[`Paper`](https://arxiv.org/abs/2503.08540)] [[`GitHub`](https://github.com/soham97/Mellow)] [[`Checkpoint`](https://huggingface.co/soham97/Mellow)] [[`Zenodo`](https://zenodo.org/records/15002886)]
 Mellow is a small Audio-Language Model that takes in two audios and a text prompt as input and produces free-form text as output. It is a 167M parameter model and trained on ~155 hours of audio (AudioCaps and Clotho), and achieves SoTA performance on different tasks with 50x fewer parameters.
@@ -96,5 +96,13 @@ With Mellow, we aim to showcase that small audio-language models can engage in r
 ## Citation
 ```
+@misc{mellow,
+      title={Mellow: a small audio language model for reasoning},
+      author={Soham Deshmukh and Satvik Dixit and Rita Singh and Bhiksha Raj},
+      year={2025},
+      eprint={2503.08540},
+      archivePrefix={arXiv},
+      primaryClass={cs.SD},
+      url={https://arxiv.org/abs/2503.08540},
+}
 ```