Welcome to the LLMSQL Project

LLMSQL is a Python package for SQL reasoning with LLMs and vLLM inference.

💡 Description

LLMSQL Benchmark is an open-source framework providing a modernized, cleaned, and extended version of the original WikiSQL dataset, specifically designed for evaluating and fine-tuning Large Language Models (LLMs) on Text-to-SQL tasks.


📚 Documentation

Note: Documentation pages (installation guide, API reference) are under construction. See Quick Start below.

⚡ Quick Start

⚠️ WARNING — Reproducibility

vLLM and HuggingFace Transformers may produce different results even with identical settings (e.g., temperature=0), due to differences in implementation, numerical precision, and batching behavior.

Recommendation: when comparing model quality, use the same backend (either only vLLM or only Transformers).

Sources:
• vLLM FAQ
• vLLM Supported Models policy
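For illustration, here is a minimal, hypothetical sketch of greedy decoding with both backends; the model id and prompt are placeholders, and even with temperature fixed at 0 the two printed completions are not guaranteed to match:

from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model id, not prescribed by llmsql
prompt = "Question: How many singers do we have? SQL:"

# Greedy decoding with HuggingFace Transformers
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
inputs = tokenizer(prompt, return_tensors="pt")
hf_out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(hf_out[0], skip_special_tokens=True))

# Greedy decoding with vLLM (temperature=0 selects the argmax token)
llm = LLM(model=MODEL)
vllm_out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=64))
print(vllm_out[0].outputs[0].text)

If the two strings differ, that is the backend effect described above, which is why scores obtained with different backends should not be compared directly.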

Installation

pip3 install llmsql

Recommended Workflow (vLLM)

pip install "llmsql[vllm]"
llmsql evaluate --model Qwen/Qwen2.5-7B-Instruct --dataset llmsql_dev

(The model id is illustrative; pass any model your chosen backend can load. Proprietary API models such as gpt-4 cannot be served through vLLM.)

Evaluation API (Python)

from llmsql import LLMSQLEvaluator

# Point the evaluator at a working directory for benchmark files and caches.
evaluator = LLMSQLEvaluator(workdir_path="llmsql_workdir")

# Score a JSONL file of model outputs and print the resulting report.
report = evaluator.evaluate(outputs_path="path_to_your_outputs.jsonl")
print(report)
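The outputs file uses JSON Lines, one prediction per line. As a purely hypothetical sketch (the field names below are assumptions, not the package's documented schema), producing such a file could look like this:

import json

# Hypothetical records: the "id" and "sql" keys are assumptions,
# not the documented llmsql output schema.
predictions = [
    {"id": 0, "sql": "SELECT COUNT(name) FROM singers"},
    {"id": 1, "sql": "SELECT name FROM singers WHERE age > 30"},
]

with open("path_to_your_outputs.jsonl", "w") as f:
    for row in predictions:
        f.write(json.dumps(row) + "\n")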

🔗 Resources

• 📦 PyPI Project: llmsql on PyPI
• 💾 Dataset on Hugging Face: llmsql-bench dataset
• 💻 Source Code: GitHub repo

📊 Leaderboard [in progress]

The official leaderboard is currently empty. Submit your model results to be the first on the ranking!

📄 Citation

@inproceedings{llmsql_bench,
  title={LLMSQL: Upgrading WikiSQL for the LLM Era of Text-to-SQL},
  author={Pihulski, Dzmitry and Charchut, Karol and Novogrodskaia, Viktoria and Koco{\'n}, Jan},
  booktitle={2025 IEEE International Conference on Data Mining Workshops (ICDMW)},
  year={2025},
  organization={IEEE}
}
💬 Made with ❤️ by the LLMSQL Team