arxiv:2505.14045
Yingli Shen
ylshen
AI & ML interests
Postdoctoral Researcher @ THUNLP, Tsinghua University.
Researching Multilingual Large Language Models.
Recent Activity
authored a paper 20 days ago
DCAD-2000: A Multilingual Dataset across 2000+ Languages with Data Cleaning as Anomaly Detection
authored a paper 20 days ago
From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora
updated a dataset 20 days ago
openbmb/DCAD-2000