Yang

jacklanda

AI & ML interests

Reasoning, Mech Interp, Semantics

Recent Activity

updated a dataset 9 days ago

jacklanda/SemanticQA

Updated 9 days ago • 33

published a dataset 9 days ago

jacklanda/SemanticQA

Updated 9 days ago • 33

authored a paper 11 days ago

Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

Paper • 2604.16593 • Published 16 days ago • 6

updated 2 collections 12 days ago

Semantics

My research work on (lexical) semantics. • 4 items • Updated 12 days ago

Evaluations

Evals for Language Agents • 4 items • Updated 12 days ago

upvoted a paper 12 days ago

Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

Paper • 2604.16593 • Published 16 days ago • 6

submitted a paper to Daily Papers 12 days ago

Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

Paper • 2604.16593 • Published 16 days ago • 6

updated a collection about 1 month ago

Evaluations

Evals for Language Agents • 4 items • Updated 12 days ago

updated a dataset about 2 months ago

humanlaya-data-lab/OneMillion-Bench

Viewer • Updated Mar 11 • 400 • 241 • 11

commented a paper about 2 months ago

\$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published Mar 9 • 27

authored a paper about 2 months ago

\$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published Mar 9 • 27

upvoted a paper about 2 months ago

\$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published Mar 9 • 27

submitted a paper to Daily Papers about 2 months ago

\$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published Mar 9 • 27

liked a dataset about 2 months ago

humanlaya-data-lab/OneMillion-Bench

Viewer • Updated Mar 11 • 400 • 241 • 11

published a dataset about 2 months ago

humanlaya-data-lab/OneMillion-Bench

Viewer • Updated Mar 11 • 400 • 241 • 11

upvoted a paper about 2 months ago

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Paper • 2603.03756 • Published Mar 4 • 89

upvoted a paper 2 months ago

Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs

Paper • 2508.19594 • Published Aug 27, 2025 • 3

updated a collection 2 months ago

Semantics

My research work on (lexical) semantics. • 4 items • Updated 12 days ago