CogniSQL: Lightweight Reinforced Reasoning for Efficient SQL Generation

Overview

Welcome to CogniSQL! This organization hosts research datasets and resources for advancing Text-to-SQL generation through reinforcement learning. Our work focuses on building efficient, execution-aligned SQL generation systems that scale effectively while maintaining accuracy on complex database queries.

Research Focus

CogniSQL develops novel approaches to translate natural language into SQL (Text-to-SQL) using:

  • Reinforcement Learning (RL) Frameworks: Lightweight reward signals based on execution correctness and format-tag compliance
  • Parameter Efficiency: State-of-the-art performance with a compact 7B-parameter backbone, compared against much larger models such as DeepSeek-Coder 236B
  • Execution-Aligned Generation: Direct optimization for producing correct, executable SQL without intermediate supervision
  • Interpretable Reasoning: Multi-path reasoning traces for better understanding of model behavior
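A reward that combines format-tag compliance with execution correctness can be computed without any learned reward model. The block below is a minimal sketch of one way to do it, not the CogniSQL training code. Several details are hypothetical assumptions: the `<think>`/`<answer>` tag names, the reward weights, the in-memory SQLite database, the order-insensitive set comparison of result rows, and the helper names `extract_sql`, `execute`, and `reward`.

```python
import re
import sqlite3
from typing import Optional

# ASSUMPTION: the policy wraps its reasoning in <think>...</think> and its
# final query in <answer>...</answer>; the real tag names may differ.
FORMAT_RE = re.compile(
    r"^\s*<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def extract_sql(completion: str) -> Optional[str]:
    """Return the SQL inside the <answer> tags, or None if the format is violated."""
    match = FORMAT_RE.match(completion)
    return match.group(1).strip() if match else None


def execute(conn: sqlite3.Connection, sql: str) -> Optional[set]:
    """Run a query and return its rows as a set (row order ignored); None on error."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def reward(
    completion: str,
    gold_sql: str,
    conn: sqlite3.Connection,
    format_weight: float = 0.1,  # ASSUMPTION: illustrative weight
    exec_weight: float = 1.0,    # ASSUMPTION: illustrative weight
) -> float:
    """Hypothetical reward: format-compliance term plus execution-correctness term."""
    sql = extract_sql(completion)
    if sql is None:
        return 0.0  # malformed output earns no reward at all
    score = format_weight  # well-formed tags earn the format bonus
    predicted = execute(conn, sql)
    gold = execute(conn, gold_sql)
    # Credit execution only if the predicted query runs and matches the gold result.
    if predicted is not None and predicted == gold:
        score += exec_weight
    return score


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, 30), (2, 45)])
    completion = (
        "<think>Users older than 40.</think>"
        "<answer>SELECT id FROM users WHERE age > 40</answer>"
    )
    print(reward(completion, "SELECT id FROM users WHERE age >= 41", conn))
```

Comparing query results rather than SQL strings means that a syntactically different but semantically equivalent query still earns the full execution reward. The whole signal costs one regex match and two query executions.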

Key Achievements

  • State-of-the-Art Results: Outperforms SFT CodeS-7B, DeepSeek-Coder 236B, and Mistral 123B on the BIRD benchmark
  • Efficient Training: Trained on just 4 NVIDIA A100 GPUs (40GB VRAM each)
  • Resource-Constrained Deployment: Enables practical Text-to-SQL systems for real-world applications
  • Open Research: Two curated datasets released for community research

Datasets

This organization maintains two high-quality datasets:

  1. Reasoning_Traces: 5,024 reasoning traces with varying context lengths for interpretable SQL generation
  2. Positive_Sample_Corpus: 36,356 weakly supervised queries, each annotated with six semantically diverse reasoning paths

Both datasets are designed to support research in efficient and interpretable Text-to-SQL modeling.

Citation

If you use our datasets or research, please cite the following paper:

@article{gajjar2025cognisql,
  title={CogniSQL-R1-Zero: Lightweight Reinforced Reasoning for Efficient SQL Generation},
  author={Gajjar, Kushal and Sikchi, Harshit and Gautam, Arpit Singh and Hammons, Marc and Jha, Saurabh},
  journal={arXiv preprint arXiv:2507.06013},
  year={2025},
  url={https://arxiv.org/abs/2507.06013}
}

arXiv: 2507.06013

Research Team

  • Kushal Gajjar
  • Harshit Sikchi
  • Arpit Singh Gautam
  • Marc Hammons
  • Saurabh Jha

Applications

Our work enables:

  • Database query systems that understand natural language
  • Efficient SQL generation in resource-constrained environments
  • Interpretable AI systems with transparent reasoning traces
  • Production-grade Text-to-SQL pipelines

License

Please refer to individual dataset cards for specific licensing information.
