HuggingFaceFW's Collections
🥂 FineWeb2
🍷 FineWeb
📚 FineWeb-Edu
📀 Dataset comparison models
🧪 FineWeb v1 data experiments

🍷 FineWeb

updated Jun 20
Upvote
25

  • FineWeb: decanting the web for the finest text data at scale

    Running • 1.01k

    Generate high-quality web text data for LLM training


  • HuggingFaceFW/fineweb

    Viewer • Updated 16 days ago • 52.5B • 764k • 2.27k

  • HuggingFaceFW/fineweb-edu

    Viewer • Updated 16 days ago • 3.5B • 150k • 723

  • HuggingFaceFW/fineweb-edu-score-2

    Viewer • Updated 16 days ago • 13.9B • 3.62k • 78

  • The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

    Paper • 2406.17557 • Published Jun 25, 2024 • 98

  • 📀 Dataset comparison models

    Collection
    1.8B-parameter models trained on 350B tokens to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 40

  • 🧪 FineWeb v1 data experiments

    Collection
    Ablation models trained for our data experiments • 22 items • Updated Jun 12, 2024 • 6