SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
Paper • 2512.11192 • Published
Extraction of structured data from the Common Crawl: schema.org annotations, web tables, and hyperlink graphs