Welcome to LLMSQL Project
LLMSQL is a Python package for evaluating Hugging Face models on the LLMSQL benchmark with Transformers and vLLM.
Description
The LLMSQL Benchmark is an open-source framework that provides a modernized, cleaned, and extended version of the original WikiSQL dataset, designed specifically for evaluating Hugging Face-style Large Language Models (LLMs) on Text-to-SQL tasks.
Documentation
Note: Documentation pages (installation guide, API reference) are under construction.
See Quick Start below or the README files inside the repo.
Quick Start
⚠️ WARNING — Reproducibility
vLLM and Hugging Face Transformers may generate different text outputs even when configured with the same generation parameters (e.g., temperature=0), because of differences in implementation, floating-point computation, and batch processing.
Recommendation: when comparing model quality, use the same backend (either only vLLM or only Transformers).
Sources:
• vLLM FAQ
• Model Support Policy: Supported Models
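To gauge how much two backends diverge on the same prompts, the two prediction files can be compared directly. The following is a minimal sketch, assuming each JSONL line is an object with an identifier field and a generated-text field. The field names `question_id` and `completion` and the file names are assumptions, not the documented LLMSQL output schema, so adjust them to match your files.

```python
import json
from pathlib import Path


def load_predictions(path, id_key="question_id", text_key="completion"):
    """Read a JSONL file into {id: generated text}. Key names are assumed."""
    predictions = {}
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            predictions[record[id_key]] = record[text_key]
    return predictions


def diff_predictions(first, second):
    """Return ids present in both mappings whose generated text differs."""
    shared = first.keys() & second.keys()
    return sorted(key for key in shared if first[key].strip() != second[key].strip())


if __name__ == "__main__":
    # Hypothetical file names: one run per backend.
    vllm_preds = load_predictions("outputs/preds_vllm.jsonl")
    hf_preds = load_predictions("outputs/preds_transformers.jsonl")
    changed = diff_predictions(vllm_preds, hf_preds)
    shared_count = len(vllm_preds.keys() & hf_preds.keys())
    print(f"{len(changed)} of {shared_count} shared outputs differ")
```

A non-zero count here does not by itself indicate a bug in either backend; it only quantifies the divergence described in the warning.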
1️⃣ Installation
Install the base package:
pip install llmsql
To enable the vLLM backend:
pip install llmsql[vllm]
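To confirm which of the two installs is active in the current environment, the installed distributions can be queried from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution names are taken from the `pip` commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "vllm" is only expected to be present after installing the vLLM extra.
for name in ("llmsql", "vllm"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```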
2️⃣ Inference from CLI
vLLM Backend (Recommended)
llmsql inference vllm \
--model-name Qwen/Qwen2.5-1.5B-Instruct \
--output-file outputs/preds.jsonl \
--batch-size 8 \
--num_fewshots 5 \
--temperature 0.0
Transformers Backend
llmsql inference transformers \
--model-or-model-name-or-path Qwen/Qwen2.5-1.5B-Instruct \
--output-file outputs/preds.jsonl \
--batch-size 8 \
--temperature 0.9 \
--generation-kwargs '{"do_sample": false, "top_p": 0.95}'
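For scripted runs, the same command can be launched from Python. This is a minimal sketch that assembles the vLLM invocation from the CLI example and hands it to `subprocess.run`. The flags are copied from that example, while the helper name `build_vllm_command` is hypothetical and not part of the package.

```python
import subprocess


def build_vllm_command(model_name, output_file, batch_size=8, num_fewshots=5, temperature=0.0):
    """Assemble the argv list for `llmsql inference vllm` (flags as in the CLI example)."""
    return [
        "llmsql", "inference", "vllm",
        "--model-name", model_name,
        "--output-file", output_file,
        "--batch-size", str(batch_size),
        "--num_fewshots", str(num_fewshots),
        "--temperature", str(temperature),
    ]


if __name__ == "__main__":
    command = build_vllm_command("Qwen/Qwen2.5-1.5B-Instruct", "outputs/preds.jsonl")
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues, which matters once JSON-valued options such as `--generation-kwargs` are involved.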
3️⃣ Evaluation API (Python)
from llmsql import evaluate
report = evaluate(outputs="path_to_your_outputs.jsonl")
print(report)
Or with the results returned by inference:
from llmsql import evaluate
# results = inference_transformers(...) or inference_vllm(...)
report = evaluate(outputs=results)
print(report)
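Text-to-SQL benchmarks commonly score predictions by execution accuracy (EX): a prediction counts as correct when running it returns the same result as the reference query. The sketch below illustrates that idea on an in-memory SQLite table. It is a simplified illustration, not the package's evaluation code, and the table, the queries, and the helper `execution_match` are invented for the example.

```python
import sqlite3


def execution_match(connection, predicted_sql, gold_sql):
    """Return True when both queries execute and yield the same rows (order-insensitive)."""
    try:
        predicted_rows = connection.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a query that fails to run counts as wrong
    gold_rows = connection.execute(gold_sql).fetchall()
    return sorted(predicted_rows, key=repr) == sorted(gold_rows, key=repr)


# Toy table invented for this example.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
connection.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 12), ("Bob", "Blue", 7), ("Cy", "Red", 3)],
)

gold = "SELECT name FROM players WHERE team = 'Red'"
predictions = [
    "SELECT name FROM players WHERE team = 'Red' ORDER BY points",  # same rows
    "SELECT name FROM players WHERE points > 5",  # different rows
    "SELECT nme FROM players",  # invalid column, fails to execute
]

matches = [execution_match(connection, sql, gold) for sql in predictions]
accuracy = sum(matches) / len(matches)
print(matches, accuracy)
```

Comparing result sets rather than SQL strings means that differently written but equivalent queries still count as correct.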
🔗 Resources
| Resource | Details |
|---|---|
| 📦 PyPI Project | llmsql on PyPI |
| 💾 Dataset on Hugging Face | llmsql-bench dataset |
| 💻 Source Code | GitHub repo |
| 💻 Playground | HF Space |
📄 Citation
@inproceedings{llmsql_bench,
title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
year={2025},
organization={IEEE}
}