6

delu

lxl3129

AI & ML interests

None yet

Recent Activity

upvoted a paper 1 day ago

DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation

upvoted a paper 3 days ago

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

upvoted a paper about 2 months ago

From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models

models 2

lxl3129/qwen2.5-0.5b-multi-regulation

0.5B • Updated Apr 24

lxl3129/finma

Text Generation • 0.5B • Updated Nov 22, 2024 • 1

datasets 3

lxl3129/evaluation-result

Updated May 17 • 4

lxl3129/sft-audit_regulation

Viewer • Updated Apr 24 • 34.4k • 5

lxl3129/FINEVAL_test

Viewer • Updated Jan 14 • 2.47k • 7