Hugging Face
bluelightai-dev's Collections
Sampled Datasets
Updated 28 days ago
Random samples from large datasets, for convenience.
bluelightai-dev/dclm-full-deduped-sample • Updated 29 days ago • 4.92M rows • 133 downloads
bluelightai-dev/the-stack-dedup-sample • Updated 29 days ago • 474k rows • 52 downloads
bluelightai-dev/common-corpus-sample-open-culture • Updated 29 days ago • 462k rows • 57 downloads
bluelightai-dev/common-corpus-sample-open-government • Updated 29 days ago • 373k rows • 74 downloads • 1 like
bluelightai-dev/common-corpus-sample-open-science • Updated 29 days ago • 284k rows • 58 downloads
bluelightai-dev/common-corpus-sample-open-source • Updated 29 days ago • 2.02M rows • 54 downloads
bluelightai-dev/common-corpus-sample-open-web • Updated 29 days ago • 4.8M rows • 88 downloads
bluelightai-dev/MathPile_Commercial-formatted • Updated 28 days ago • 389k rows • 103 downloads