Seq vs Seq: An Open Suite of Paired Encoders and Decoders Paper • 2507.11412 • Published 13 days ago • 23
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 877
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published 26 days ago • 74
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 By tomaarsen and 1 other • 27 days ago • 105
Zeroshot Classifiers Collection These are my current best zeroshot classifiers. Some of my older models are downloaded more often, but the models in this collection are newer/better. • 12 items • Updated Jan 6 • 139
Article Multi-Label Classification Model From Scratch: Step-by-Step Tutorial By Valerii-Knowledgator • Jan 8, 2024 • 45
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28, 2024 • 66
Tajik Datasets Collection Datasets that have a Tajik subset or are entirely in Tajik • 13 items • Updated Feb 20 • 4
Open Australian Legal Models Collection A collection of open-source Australian legal language models • 6 items • Updated Jun 15, 2024 • 1
Open Australian Legal Data Collection A collection of open-source Australian legal datasets • 3 items • Updated Jun 15, 2024 • 5