Ville Komulainen

Villekom · Vmjkom

AI & ML interests

NLP, text generation, semantic analysis

Recent Activity

updated a collection 8 days ago

open-sci-ref-0.01 HPLT-2.0

updated a collection 8 days ago

open-sci-ref-0.01 CommonCorpus

upvoted 2 papers about 1 month ago

Got Compute, but No Data: Lessons From Post-training a Finnish LLM

Paper • 2503.09407 • Published Mar 12 • 1

An Expanded Massive Multilingual Dataset for High-Performance Language Technologies

Paper • 2503.10267 • Published Mar 13 • 1

upvoted 3 papers 6 months ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published Jan 14 • 64

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3 • 41

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 241

upvoted 2 papers over 1 year ago

Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2, 2024 • 16

Instruction-Following Evaluation for Large Language Models

Paper • 2311.07911 • Published Nov 14, 2023 • 21