Training, eval suite, and model from the paper "Large Scale Transfer Learning for Tabular Data via Language Modeling" https://arxiv.org/abs/2406.12031
ML Foundations
non-profit
Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens"
- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens (Paper • 2406.11271 • Published • 21)
- mlfoundations/MINT-1T-HTML (Viewer • Updated • 623M • 40.4k • 89)
- mlfoundations/MINT-1T-ArXiv (Viewer • Updated • 5.6M • 3.07k • 48)
- mlfoundations/MINT-1T-PDF-CC-2024-18 (Updated • 13.6k • 19)
DCLM Models + Datasets
Raw pools for use in DCLM competition