Hugging Face

Hynek Kydlicek

hynky
  • HKydlicek
  • hynky1999

AI & ML interests

Data-processing

Organizations

  • Hugging Face
  • Evaluation datasets
  • HuggingFaceBR4
  • Hugging Face H4
  • Hugging Face Smol Models Research
  • Open LLM Leaderboard
  • Czech LLM Consortium
  • Project-Numina
  • Nanotron Research
  • FineData
  • mlo-data-cleaning
  • cvmistralparis
  • hackathon team
  • HuggingFaceEval
  • HuggingFaceFW-Dev
  • StarCoder2 Data
  • Hugging Face Discord Community
  • testing-org
  • Lighteval testing org
  • Lighteval testings datasets org
  • Sailor2 Evaluation
  • ml-fw-prerelease
  • Math extraction comparisson
  • math-extraction-multilingual
  • Open R1
  • math-reruns
  • gsm8k-rerun
  • sft-datasets
  • todo

authored a paper about 1 month ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 64
authored a paper 6 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 241
authored a paper 7 months ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 64
authored a paper about 1 year ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 98
authored a paper over 1 year ago

A Dataset and Strong Baselines for Classification of Czech News Texts

Paper β€’ 2307.10666 β€’ Published Jul 20, 2023