Web Data Commons

non-profit
http://webdatacommons.org/
wbsg-uni-mannheim

AI & ML interests

Extraction of structured data from the Common Crawl: schema.org annotations, web tables, and hyperlink graphs

Recent Activity


Team members: Christian Bizer, Alexander Brinkmann, Pedro Ortiz Suarez, Ralph Peeters

pjox 
authored a paper 11 days ago

SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing

Paper • 2512.11192 • Published Dec 12, 2025
pjox 
authored a paper over 1 year ago

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

Paper • 2406.08707 • Published Jun 13, 2024 • 17
pjox 
authored a paper about 2 years ago

CamemBERT: a Tasty French Language Model

Paper • 1911.03894 • Published Nov 10, 2019 • 4
pjox 
authored a paper almost 3 years ago

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 36