arxiv:2102.04664

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

Published on Feb 9, 2021


Authors: Junjie Huang, Ge Li

Abstract

AI-generated summary: CodeXGLUE is a benchmark dataset with 10 tasks and 14 datasets for evaluating machine learning models on program understanding and generation. It provides BERT-style, GPT-style, and Encoder-Decoder baselines.

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.




Models citing this paper: 2
Datasets citing this paper: 4
Spaces citing this paper: 2
Collections including this paper: 14