walterShen

_walterShen

AI & ML interests

None yet

Organizations

None yet

Collections 8

Code LMs Evaluation

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Paper • 2311.07989 • Published Nov 14, 2023 • 26
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Paper • 2310.06770 • Published Oct 10, 2023 • 9
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Paper • 2401.03065 • Published Jan 5, 2024 • 11
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming

Paper • 2402.14261 • Published Feb 22, 2024 • 11

Code LMs Benchmark

Big Code Models Leaderboard 📈
Running • 1.47k • Submit code models for evaluation and view leaderboard

Can Ai Code Results 🏆
Running • 450 • Can AI Code? An LLM leaderboard including quantized models.

openai/openai_humaneval

Viewer • Updated Jan 4, 2024 • 164 • 106k • 354
google-research-datasets/mbpp

Viewer • Updated Jan 4, 2024 • 1.4k • 2.55M • 188

models 0

None public yet

datasets 0

None public yet
