Translating Natural Language Questions to SQL Queries
Text-to-SQL models have the potential to democratize data analytics by making database queries as simple as asking a question in plain English. Sequence-to-sequence models have performed well on the Text-to-SQL task on datasets such as WikiSQL. However, most prior work does not examine how well these models generalize to unfamiliar table schemas. We build on the ideas introduced by Chang et al. to improve the generalization of a dual-task sequence-to-sequence learning model, evaluating on a zero-shot testbed consisting of schemas the model has never encountered before. We use the pre-trained BERT-based TAPAS transformer model to encode more expressive table representations for the schema, in addition to the existing BiLSTM-based encodings. We also incorporate techniques from semantic parsing research, such as a coverage mechanism and more flexible attention algorithms, to propose a model that achieves an accuracy improvement of more than 5% over the base dual-task sequence-to-sequence model on the zero-shot test set.
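To give an intuition for the coverage mechanism mentioned above, the sketch below shows one decoding step of coverage-augmented attention in plain NumPy. This is an illustrative simplification, not the paper's implementation: the original coverage formulation (See et al., 2017) feeds the coverage vector into the attention score network through a learned parameter vector, whereas here a single scalar weight `w_cov` (an assumed simplification) penalizes positions the decoder has already attended to.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def coverage_attention(scores, coverage, w_cov=1.0):
    """One decoding step of coverage-augmented attention (simplified sketch).

    scores:   raw attention scores over encoder positions, shape [n]
    coverage: running sum of past attention distributions, shape [n]
    w_cov:    scalar penalty weight (a learned vector in the original
              formulation; a scalar here to keep the sketch minimal)
    """
    # Down-weight positions the decoder has already covered.
    attn = softmax(scores - w_cov * coverage)
    # Coverage loss penalizes re-attending to already-covered positions.
    cov_loss = np.minimum(attn, coverage).sum()
    # Accumulate this step's attention into the coverage vector.
    new_coverage = coverage + attn
    return attn, new_coverage, cov_loss
```

In a full model, the coverage loss would be added to the decoder's training objective so that repeatedly attending to the same schema column or question token is discouraged, which is one reason coverage helps on inputs the model has not seen before.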