Stable Audio Open model from the Synthio paper.
Generate text based on an audio input and questions about it.
Describe audio in response to questions.