
Parser Benchmark: sqlglot vs GSP

This benchmark is under development. When published, it will include raw data, full methodology, and peer-reviewed reproduction memos — so you can verify every claim yourself.

What we're measuring

500+ real-world SQL statements drawn from DataHub GitHub issues, production BigQuery audit logs, and Snowflake/Databricks workloads. Each statement is tagged by dialect and construct, then parsed by sqlglot, sqllineage, openlineage-sql, and GSP.
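For illustration, each corpus entry could be modeled as a small record like the following (a sketch; the field names are placeholders, not the benchmark's published schema):

```python
from dataclasses import dataclass

@dataclass
class CorpusCase:
    """One benchmark statement. Field names are illustrative only."""
    sql: str               # raw statement text
    dialect: str           # e.g. "bigquery", "snowflake", "databricks"
    constructs: list[str]  # e.g. ["merge", "recursive_cte"]
    source: str            # GitHub issue URL, audit log, or "synthetic"
```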

Parse success rate

Does the parser return a structured AST, or fall back to an opaque Command node that preserves the raw text but exposes nothing to extract lineage from?
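For sqlglot specifically, this check can be expressed in a few lines (a minimal sketch, assuming a recent sqlglot release; the other parsers signal failure differently and need their own adapters):

```python
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError

def parse_outcome(sql: str, dialect: str) -> str:
    """Classify one statement: structured AST, opaque fallback, or hard error."""
    try:
        ast = sqlglot.parse_one(sql, dialect=dialect)
    except ParseError:
        return "error"
    # sqlglot wraps statements it can tokenize but not model in exp.Command,
    # which preserves the raw text but exposes no structure for lineage.
    if isinstance(ast, exp.Command):
        return "command_fallback"
    return "parsed"
```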

Lineage completeness

Table-level and column-level relationships detected vs. ground truth.
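One way to score this, assuming lineage edges are normalized to hashable tuples such as (source_table, target_table) or fully qualified column pairs (a sketch, not the published scorer):

```python
def precision_recall(detected: set, truth: set) -> tuple[float, float]:
    """Score detected lineage edges against manually verified ground truth."""
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

truth = {("raw.orders", "analytics.daily_orders")}
detected = {("raw.orders", "analytics.daily_orders"),
            ("raw.tmp", "analytics.daily_orders")}
print(precision_recall(detected, truth))  # (0.5, 1.0)
```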

Construct coverage

Stored procedures, MERGE, recursive CTEs, dynamic SQL, temp tables, window functions.
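For constructs a parser can model, tagging is automatable by walking the AST. A sketch using sqlglot covering three of these tags (stored procedures and dynamic SQL usually need dialect-specific heuristics instead, not shown here):

```python
import sqlglot
from sqlglot import exp

def tag_constructs(sql: str, dialect: str) -> set[str]:
    """Tag a parseable statement with a subset of the benchmark's constructs."""
    ast = sqlglot.parse_one(sql, dialect=dialect)
    tags = set()
    if isinstance(ast, exp.Merge):
        tags.add("merge")
    cte = ast.args.get("with")
    if cte is not None and cte.args.get("recursive"):
        tags.add("recursive_cte")
    if ast.find(exp.Window) is not None:
        tags.add("window_function")
    return tags
```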

Parser pain in the DataHub ecosystem

Issue counts from the DataHub issue tracker, sourced from the gap analysis.

| Dialect / construct | Tier | Total mentions | Open issues |
| --- | --- | --- | --- |
| BigQuery | Tier 1 | 90 | 16 |
| Snowflake | Tier 1 | 96 | 9 |
| Databricks | Tier 1 | 69 | 12 |
| MSSQL T-SQL | Tier 2 | 17 | 6 |
| Oracle | Tier 2 | 47 | |
| MERGE | Tier 2 | 42 | 9 |

Data from materials/oss-gap-analysis/master-sheet.md, collected 2026-04-16.

Methodology (planned)

  1. Corpus assembly — 500+ statements from real GitHub issues (e.g. #11654), sanitized customer SQL, and synthetic edge cases
  2. Ground truth — manually verified table-level and column-level lineage for each statement
  3. Parser execution — each statement parsed by sqlglot, sqllineage, openlineage-sql, and GSP under identical conditions
  4. Scoring — parse success rate, lineage precision/recall, per-dialect and per-construct breakdowns (a minimal runner shape is sketched after this list)
  5. Peer review — 2–3 external practitioners reproduce the benchmark on our fixtures + their own SQL
  6. Publication — raw data, runner scripts, and reproduction memos published in gudusoft/sql-parser-benchmark (MIT)
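To make steps 3–4 concrete, here is a minimal runner shape, reusing CorpusCase and precision_recall from the sketches above (the adapter registry is a placeholder, not the published runner):

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

# Placeholder adapters: each wraps one parser behind a common interface and
# returns the lineage edge set it detected (empty set on parse failure).
PARSERS: dict[str, Callable[[str, str], set]] = {
    # "sqlglot": ..., "sqllineage": ..., "openlineage-sql": ..., "gsp": ...
}

def run_benchmark(corpus: list[CorpusCase], truth: dict[str, set]) -> dict:
    """Run every parser over every statement; average recall per dialect."""
    recalls = defaultdict(lambda: defaultdict(list))
    for case in corpus:
        for name, parse in PARSERS.items():
            detected = parse(case.sql, case.dialect)
            _, recall = precision_recall(detected, truth[case.sql])
            recalls[name][case.dialect].append(recall)
    return {
        name: {dialect: mean(vals) for dialect, vals in by_dialect.items()}
        for name, by_dialect in recalls.items()
    }
```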

See the sidecar in action today

While the full benchmark is being assembled, you can test GSP against your own SQL right now.

Try the Quick Start →