fancyzhx's Collections

Text Datasets

updated Jun 20

  • TxT360: Trillion Extracted Text

    Running • 130

    Explore and utilize a large, deduplicated text dataset for LLM training

  • CASIA-LM/ChineseWebText2.0

    Viewer • Updated Dec 2, 2024 • 2k • 1.68k • 27

  • HPLT/HPLT2.0_cleaned

    Viewer • Updated 18 days ago • 9.03B • 111k • 36

  • TrevorDohm/Pile_Tokenized

    Viewer • Updated Feb 20, 2024 • 134M • 801