arxiv:2012.10309

Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

Published on Dec 18, 2020

Abstract

A Generation-Augmented Pre-training (GAP) framework improves text-to-SQL semantic parsers by using generation models to produce pre-training data, achieving state-of-the-art results on the SPIDER and CRITERIA-TO-SQL benchmarks.

AI-generated summary

Recently, there has been significant interest in learning contextual representations for various NLP tasks by leveraging large-scale text corpora to train large neural language models with self-supervised learning objectives, such as Masked Language Model (MLM). However, based on a pilot study, we observe three issues with existing general-purpose language models when they are applied to text-to-SQL semantic parsers: they fail to detect column mentions in the utterances, fail to infer column mentions from cell values, and fail to compose complex SQL queries. To mitigate these issues, we present a model pre-training framework, Generation-Augmented Pre-training (GAP), that jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data. The GAP model is trained on 2M utterance-schema pairs and 30K utterance-schema-SQL triples, whose utterances are produced by generative models. Based on experimental results, neural semantic parsers that leverage the GAP model as a representation encoder obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
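
To make the encoder-side setup concrete, below is a minimal sketch of how a GAP-style representation encoder could be used inside a text-to-SQL parser. It is not the authors' released code: the checkpoint ("facebook/bart-base") is an assumed stand-in for the pre-trained GAP model, and the utterance/schema serialization format is illustrative. The idea it shows is from the abstract: the utterance and the table schema are encoded jointly, and the resulting contextual representations are consumed by a downstream semantic parser.

```python
# Minimal sketch (not the authors' code): jointly encode an utterance and a
# table schema with a pretrained transformer encoder, as a GAP-style
# representation encoder would be used by a text-to-SQL parser.
# Assumptions: "facebook/bart-base" as a stand-in checkpoint and a simple
# " | "-separated serialization of utterance and schema columns.
from transformers import AutoTokenizer, BartModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
encoder = BartModel.from_pretrained("facebook/bart-base").get_encoder()

utterance = "Show the names of singers older than 30"
schema_columns = ["singer.name", "singer.age", "singer.country"]

# Serialize the utterance together with the schema columns into one sequence.
text = utterance + " | " + " | ".join(schema_columns)
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Joint contextual representations of utterance and schema tokens; a neural
# semantic parser's decoder would consume these to predict the SQL query.
hidden_states = encoder(**inputs).last_hidden_state
print(hidden_states.shape)  # (1, sequence_length, hidden_size)
```

In the paper's setting, such an encoder would instead be initialized from the GAP checkpoint pre-trained on the 2M generated utterance-schema pairs and 30K utterance-schema-SQL triples, rather than from an off-the-shelf model.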

Models citing this paper 2

Datasets citing this paper 19

Spaces citing this paper 0

Collections including this paper 0
