Constructing Verifiable Environments for Data-Driven Discovery
AI & ML interests
Natural language processing, language models, language agents
Recent Activity
Papers
Automatic Image-Level Morphological Trait Annotation for Organismal Images
When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents
Spaces (3)
Online-Mind2Web Leaderboard
Running • Agents • 23
Explore Mind2Web agent performance with interactive tables and charts
TravelPlannerLeaderboard
Running • Agents • 21
Display and submit travel planner evaluation results
TravelPlannerEnvironment
Running • Agents • 4
Plan a travel itinerary with cost tracking
Models (63)
osunlp/D3-Gym-4B-rft-self
4B • Updated • 30
osunlp/D3-Gym-32B-rft-self
33B • Updated • 9
osunlp/D3-Gym-14B-rft-distill
15B • Updated • 12
osunlp/D3-Gym-14B-rft-self
15B • Updated • 12
osunlp/D3-Gym-8B-rft-distill
8B • Updated • 14
osunlp/D3-Gym-8B-rft-self
8B • Updated • 14
osunlp/D3-Gym-4B-rft-distill
4B • Updated • 15
osunlp/SAE_DINOv3_TopK_ViT-L-16_IN1K
osunlp/SAE_DINOv3_ViT-L-16_IN1K
osunlp/SAE_DINOv3_ViT-B-16_IN1K
Datasets (29)
osunlp/D3-Gym
Viewer • Updated • 571 • 60
osunlp/D3-Gym-Trajectories
Viewer • Updated • 6.37k • 19
osunlp/ScienceAgentBench
Viewer • Updated • 102 • 1.6k • 19
osunlp/bioscan-traits
Viewer • Updated • 80.8k • 113 • 1
osunlp/GUI-Drag-dataset
Preview • Updated • 62 • 3
osunlp/MisActBench
Updated • 89 • 2
osunlp/AutoElicit-Exec
Viewer • Updated • 132 • 85 • 1
osunlp/AutoElicit-Seed
Viewer • Updated • 361 • 39 • 1
osunlp/AutoElicit-Bench
Viewer • Updated • 117 • 68 • 1
osunlp/TACO-Cobalt-PTB
Viewer • Updated • 184 • 35