# Welcome to the LLMSQL Project

LLMSQL is a Python package for SQL reasoning with LLMs, with built-in support for vLLM inference.
## 💡 Description

The LLMSQL Benchmark is an open-source framework that provides a modernized, cleaned, and extended version of the original WikiSQL dataset, designed specifically for evaluating and fine-tuning Large Language Models (LLMs) on text-to-SQL tasks.
### Key improvements

- **Data cleaning:** fixed errors (type mismatches, case sensitivity) that caused 41% of queries to return empty results.
- **LLM-ready format:** replaced numeric column placeholders with standard SQL, improving training consistency (see the sketch after this list).
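As a rough illustration of the second point, here is a minimal sketch; the table and column names below are invented for this example, not drawn from the actual dataset:

```python
# Illustrative sketch only: the table/column names are invented, not real records.

# WikiSQL-style annotation with numeric column placeholders:
wikisql_query = "SELECT col1 FROM table_1_10015132_1 WHERE col3 = 'Guard'"

# The cleaned, LLM-ready equivalent in standard SQL:
llmsql_query = "SELECT Player FROM table WHERE Position = 'Guard'"
```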
## 📚 Documentation

Note: the documentation pages (installation guide, API reference) are under construction. See the Quick Start below.
## ⚡ Quick Start
### ⚠️ Warning: Reproducibility

vLLM and Hugging Face Transformers may produce different results even with the same settings (e.g., `temperature=0`). This is due to differences in implementation, computation precision, and batching mechanisms.

Recommendation: when comparing model quality, use a single backend throughout (either only vLLM or only Transformers).
Sources: the FAQ and the Supported Models (model support policy) pages of the vLLM documentation.
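To make the comparison concrete, here is a minimal sketch of the same greedy-decoding request in both backends (the model name is only an example, not one prescribed by LLMSQL); even with identical settings, the two may not produce byte-identical outputs:

```python
# Sketch: identical greedy settings in both backends can still diverge.
prompt = "Translate to SQL: which player plays guard?"
model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # example model, an assumption here

# vLLM backend
from vllm import LLM, SamplingParams

llm = LLM(model=model_id)
vllm_out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=64))
print(vllm_out[0].outputs[0].text)

# Hugging Face Transformers backend
from transformers import pipeline

pipe = pipeline("text-generation", model=model_id)
hf_out = pipe(prompt, do_sample=False, max_new_tokens=64)
print(hf_out[0]["generated_text"])
```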
### Installation

```bash
pip3 install llmsql
```
### Recommended Workflow (vLLM)

```bash
pip install "llmsql[vllm]"
llmsql evaluate --model gpt-4 --dataset llmsql_dev
```
### Evaluation API (Python)

```python
from llmsql import LLMSQLEvaluator

# Point the evaluator at a working directory for benchmark artifacts.
evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")

# Score a JSONL file of model outputs and print the resulting report.
report = evaluator.evaluate(outputs_path="path_to_your_outputs.jsonl")
print(report)
```
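The `outputs_path` argument points to a JSONL file of your model's predictions. The exact record schema expected by `LLMSQLEvaluator` is an assumption in the sketch below, so treat it as a shape to adapt rather than the documented format:

```python
# Hypothetical sketch: the "id" and "sql" field names are assumptions,
# not a documented LLMSQL schema; check the evaluator's docs for the real one.
import json

predictions = [
    {"id": 0, "sql": "SELECT Player FROM table WHERE Position = 'Guard'"},
    {"id": 1, "sql": "SELECT COUNT(*) FROM table WHERE Points > 20"},
]

with open("path_to_your_outputs.jsonl", "w", encoding="utf-8") as f:
    for row in predictions:
        f.write(json.dumps(row) + "\n")
```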
## 🔗 Resources
| Resource | Details |
|---|---|
| 📦 PyPI Project | llmsql on PyPI |
| 💾 Dataset on Hugging Face | llmsql-bench dataset |
| 💻 Source Code | GitHub repo |
## 📊 Leaderboard [in progress]

The official leaderboard is currently empty. Submit your model's results to be the first on the ranking!
## 📄 Citation

```bibtex
@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
```