Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 35
bosonai/higgs-audio-v2-generation-3B-base Text-to-Speech • 6B • Updated 2 days ago • 111k • 453
nvidia/parakeet-tdt-0.6b-v2 Automatic Speech Recognition • 0.6B • Updated Jun 26 • 701k • 1.25k
Snowflake/snowflake-arctic-embed-l-v2.0 Sentence Similarity • 0.6B • Updated 2 days ago • 264k • • 192