
Parser Benchmark: sqlglot vs GSP

This benchmark is under development. When published, it will include raw data, full methodology, and peer-reviewed reproduction memos — so you can verify every claim yourself.

What we're measuring

500+ real-world SQL statements drawn from DataHub GitHub issues, production BigQuery audit logs, and Snowflake/Databricks workloads. Each statement is tagged by dialect and construct, then parsed by sqlglot, sqllineage, openlineage-sql, and GSP.
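For illustration, each corpus entry could be modeled as a small record like the following (a sketch; the field names are placeholders, not the benchmark's published schema):

```python
from dataclasses import dataclass

@dataclass
class CorpusCase:
    """One benchmark statement. Field names are illustrative only."""
    sql: str               # raw statement text
    dialect: str           # e.g. "bigquery", "snowflake", "databricks"
    constructs: list[str]  # e.g. ["merge", "recursive_cte"]
    source: str            # GitHub issue URL, audit log, or "synthetic"
```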

Parse success rate

Does the parser return a structured AST, or fall back to an opaque Command node that preserves the raw text but exposes nothing to extract lineage from?
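For sqlglot specifically, this check can be expressed in a few lines (a minimal sketch, assuming a recent sqlglot release; the other parsers signal failure differently and need their own adapters):

```python
import sqlglot
from sqlglot import exp
from sqlglot.errors import ParseError

def parse_outcome(sql: str, dialect: str) -> str:
    """Classify one statement: structured AST, opaque fallback, or hard error."""
    try:
        ast = sqlglot.parse_one(sql, dialect=dialect)
    except ParseError:
        return "error"
    # sqlglot wraps statements it can tokenize but not model in exp.Command,
    # which preserves the raw text but exposes no structure for lineage.
    if isinstance(ast, exp.Command):
        return "command_fallback"
    return "parsed"
```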

Lineage completeness

Table-level and column-level relationships detected vs. ground truth.
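One way to score this, assuming lineage edges are normalized to hashable tuples such as (source_table, target_table) or fully qualified column pairs (a sketch, not the published scorer):

```python
def precision_recall(detected: set, truth: set) -> tuple[float, float]:
    """Score detected lineage edges against manually verified ground truth."""
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

truth = {("raw.orders", "analytics.daily_orders")}
detected = {("raw.orders", "analytics.daily_orders"),
            ("raw.tmp", "analytics.daily_orders")}
print(precision_recall(detected, truth))  # (0.5, 1.0)
```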

Construct coverage

Stored procedures, MERGE, recursive CTEs, dynamic SQL, temp tables, window functions.
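For constructs a parser can model, tagging is automatable by walking the AST. A sketch using sqlglot covering three of these tags (stored procedures and dynamic SQL usually need dialect-specific heuristics instead, not shown here):

```python
import sqlglot
from sqlglot import exp

def tag_constructs(sql: str, dialect: str) -> set[str]:
    """Tag a parseable statement with a subset of the benchmark's constructs."""
    ast = sqlglot.parse_one(sql, dialect=dialect)
    tags = set()
    if isinstance(ast, exp.Merge):
        tags.add("merge")
    cte = ast.args.get("with")
    if cte is not None and cte.args.get("recursive"):
        tags.add("recursive_cte")
    if ast.find(exp.Window) is not None:
        tags.add("window_function")
    return tags
```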

Parser pain in the DataHub ecosystem

Issue counts from the DataHub issue tracker, sourced from the gap analysis.

| Dialect / construct | Tier | Total mentions | Open issues |
| --- | --- | --- | --- |
| BigQuery | Tier 1 | 90 | 16 |
| Snowflake | Tier 1 | 96 | 9 |
| Databricks | Tier 1 | 69 | 12 |
| MSSQL T-SQL | Tier 2 | 17 | 6 |
| Oracle | Tier 2 | 47 | |
| MERGE | Tier 2 | 42 | 9 |

Data from materials/oss-gap-analysis/master-sheet.md, collected 2026-04-16.

Methodology (planned)

  1. Corpus assembly — 500+ statements from real GitHub issues (e.g. #11654), sanitized customer SQL, and synthetic edge cases
  2. Ground truth — manually verified table-level and column-level lineage for each statement
  3. Parser execution — each statement parsed by sqlglot, sqllineage, openlineage-sql, and GSP under identical conditions
  4. Scoring — parse success rate, lineage precision/recall, per-dialect and per-construct breakdowns (a minimal runner shape is sketched after this list)
  5. Peer review — 2–3 external practitioners reproduce the benchmark on our fixtures + their own SQL
  6. Publication — raw data, runner scripts, and reproduction memos published in gudusoft/sql-parser-benchmark (MIT)
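To make steps 3–4 concrete, here is a minimal runner shape, reusing CorpusCase and precision_recall from the sketches above (the adapter registry is a placeholder, not the published runner):

```python
from collections import defaultdict
from statistics import mean
from typing import Callable

# Placeholder adapters: each wraps one parser behind a common interface and
# returns the lineage edge set it detected (empty set on parse failure).
PARSERS: dict[str, Callable[[str, str], set]] = {
    # "sqlglot": ..., "sqllineage": ..., "openlineage-sql": ..., "gsp": ...
}

def run_benchmark(corpus: list[CorpusCase], truth: dict[str, set]) -> dict:
    """Run every parser over every statement; average recall per dialect."""
    recalls = defaultdict(lambda: defaultdict(list))
    for case in corpus:
        for name, parse in PARSERS.items():
            detected = parse(case.sql, case.dialect)
            _, recall = precision_recall(detected, truth[case.sql])
            recalls[name][case.dialect].append(recall)
    return {
        name: {dialect: mean(vals) for dialect, vals in by_dialect.items()}
        for name, by_dialect in recalls.items()
    }
```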

See the sidecar in action today

While the full benchmark is being assembled, you can test GSP against your own SQL right now.

Try the Quick Start →