GSP SQLFlow
Open source · pip install · 60 seconds

Stop losing BigQuery lineage in DataHub.

Procedural SQL — DECLARE, IF/END IF, CALL, CREATE TEMP TABLE — silently disappears from DataHub's lineage graph. One pip install brings it all back.

0 relationships found by sqlglot
11 column-level lineages recovered (plus 2 table-level)
20+ SQL dialects supported

The problem

Your lineage graph is lying to you.

DataHub uses sqlglot to parse SQL. When it hits procedural constructs — stored procedures, control flow, dynamic SQL — it silently falls back to an opaque Command node. Zero lineage extracted. Zero warnings.

Your BigQuery warehouse's most business-critical queries are invisible in the lineage graph. You see empty panels and assume "it's fine."

It's not.

-- Real query from DataHub issue #11654
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;
sqlglot result: Command('DECLARE cutoff_date...') — 0 lineage edges extracted
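
To get a rough idea of which scripts in a repository are at risk, a quick pre-check can flag the procedural keywords named above. The sketch below is only a heuristic regex scan, not how sqlglot or the sidecar decides anything; `find_procedural_constructs` and `PROCEDURAL_PATTERNS` are hypothetical names introduced for illustration.

```python
import re

# Heuristic patterns for the procedural constructs named on this page.
# Assumption: keyword matching is good enough for a quick audit. It will
# also match keywords that appear inside comments or string literals.
PROCEDURAL_PATTERNS = {
    "DECLARE": re.compile(r"^\s*DECLARE\b", re.IGNORECASE | re.MULTILINE),
    "IF/END IF": re.compile(r"\bEND\s+IF\b", re.IGNORECASE),
    "CALL": re.compile(r"^\s*CALL\b", re.IGNORECASE | re.MULTILINE),
    "CREATE TEMP TABLE": re.compile(
        r"\bCREATE\s+(?:OR\s+REPLACE\s+)?TEMP(?:ORARY)?\s+TABLE\b", re.IGNORECASE
    ),
}


def find_procedural_constructs(sql: str) -> list[str]:
    """Return the names of procedural constructs that appear in the script."""
    return [name for name, pattern in PROCEDURAL_PATTERNS.items() if pattern.search(sql)]


SAMPLE = """
DECLARE cutoff_date DATE;
IF condition THEN
  CREATE TEMP TABLE stg AS
  SELECT * FROM source_table;
  INSERT INTO final_output
  SELECT * FROM stg;
END IF;
"""

print(find_procedural_constructs(SAMPLE))
# prints ['DECLARE', 'IF/END IF', 'CREATE TEMP TABLE']
```

Any script that returns a non-empty list is a candidate for a missing lineage panel and worth checking in DataHub.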
The proof

Same query. Dramatically different results.

DataHub default (sqlglot fallback): 0 column-level relationships
With gsp-datahub-sidecar (GSP SQLFlow engine): 11 column-level relationships + 2 table-level

Same 4 tables. Same query. 0 vs 11 column-level relationships, plus 2 table-level.

How it works

3 steps. 60 seconds.

1. Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

2. Run

Point the sidecar at your DataHub GMS. It re-parses every failed SQL statement automatically.

gsp-sidecar emit --gms-url http://localhost:8080
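
If the sidecar needs to run on a schedule, the command above can be wrapped in a small script. This is a sketch that assumes `gsp-sidecar` is on your PATH after installation and uses only the `emit --gms-url` form shown above; `build_emit_command` and `run_sidecar` are hypothetical helper names, not part of the tool.

```python
import subprocess
from urllib.parse import urlparse


def build_emit_command(gms_url: str) -> list[str]:
    """Build the argv for `gsp-sidecar emit`, rejecting obviously bad URLs."""
    parsed = urlparse(gms_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid GMS URL: {gms_url!r}")
    return ["gsp-sidecar", "emit", "--gms-url", gms_url]


def run_sidecar(gms_url: str) -> int:
    """Run the sidecar once and return its exit code (assumes it is installed)."""
    completed = subprocess.run(build_emit_command(gms_url), check=False)
    return completed.returncode


# Example call, e.g. from a cron job or scheduler task:
#   raise SystemExit(run_sidecar("http://localhost:8080"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems if the GMS URL ever contains special characters.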

3. See the lineage

Open DataHub. Every column-level relationship is back. Stored procedures, temp tables, control flow — all visible.

Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup
  • Cloud-parsed, not logged
  • Rate-limited (fair use)
  • Great for evaluation
Get started →

Authenticated

Cloud · Separate pricing
  • Cloud-parsed in memory, not logged or stored
  • Contact us for pricing details

Self-Hosted

On-premise · SQLFlow license
  • SQL never leaves your network
  • No rate limits
  • Full audit trail
  • Enterprise support
Talk to us →
From the community

The evidence is in the issue tracker.

On the procedural BigQuery script pasted in DataHub issue #11654: 0 relationships recovered by the default parser vs 11 column-level + 2 table-level recovered by the sidecar — on the exact same query.

Reproducer: DataHub issue #11654 · verified April 2026

Works with your existing DataHub 0.13+ installation. No DataHub fork or plugin required.

FAQ

Common questions.

Does this replace DataHub's lineage parser?

No. The sidecar augments DataHub's existing parser. DataHub still runs sqlglot for standard SQL. The sidecar only re-parses statements that sqlglot couldn't handle — procedural constructs like DECLARE, IF/END IF, CALL, and CREATE TEMP TABLE.
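
The "augment, not replace" behaviour described above is a plain fallback pattern. The sketch below illustrates that routing with two stand-in parsers; it is not the sidecar's actual code, and `primary_parse`, `fallback_parse`, and `extract_lineage` are hypothetical names. The stand-in primary parser simply gives up (returns `None`) when it sees one of the procedural keywords listed above, and the edges it returns are dummies.

```python
from typing import Callable, Optional

Edges = list[tuple[str, str]]  # (upstream, downstream) pairs
Parser = Callable[[str], Optional[Edges]]

PROCEDURAL_KEYWORDS = ("DECLARE", "END IF", "CALL", "CREATE TEMP TABLE")


def primary_parse(sql: str) -> Optional[Edges]:
    """Stand-in for the default parser: returns None on procedural SQL."""
    upper = sql.upper()
    if any(keyword in upper for keyword in PROCEDURAL_KEYWORDS):
        return None  # crude substring check, for illustration only
    return [("placeholder_source", "placeholder_target")]


def fallback_parse(sql: str) -> Optional[Edges]:
    """Stand-in for the second parser; the returned edge is a dummy."""
    return [("placeholder_source", "placeholder_target")]


def extract_lineage(
    statements: list[str], primary: Parser, fallback: Parser
) -> dict[str, tuple[str, Edges]]:
    """Try the primary parser first; re-parse only what it could not handle."""
    results: dict[str, tuple[str, Edges]] = {}
    for sql in statements:
        handled_by = "primary"
        edges = primary(sql)
        if edges is None:
            handled_by = "fallback"
            edges = fallback(sql)
        results[sql] = (handled_by, edges or [])
    return results
```

In this shape, standard SQL never reaches the second parser, which matches the description of the sidecar touching only the statements the default parser could not handle.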

Which SQL dialects are supported?

BigQuery is the primary focus. The GSP SQLFlow engine also supports SQL Server, Oracle, PostgreSQL, Snowflake, Redshift, Teradata, and more (20+ dialects in total). If your warehouse uses procedural SQL, the sidecar can likely parse it.

Is my SQL sent to a third party?

Depends on the backend you choose. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not logged or stored). Self-hosted mode keeps everything on your infrastructure — SQL never leaves your network.

How long does installation take?

Under 60 seconds. Run pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git, then run gsp-sidecar emit with --gms-url set to your DataHub GMS endpoint. No Docker, no Kubernetes, no DataHub plugins to install.

How is this different from sqlglot?

sqlglot handles standard SQL well but drops procedural constructs silently — it falls back to an opaque Command node with zero lineage. The GSP engine parses the full procedural SQL, including control flow, temp tables, and dynamic SQL. On the DataHub #11654 reproducer: 0 relationships (sqlglot) vs 11 column-level + 2 table-level (GSP).

What does it cost?

The sidecar tool is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. Authenticated and Self-hosted backends have separate pricing — contact us for details.

Recover your BigQuery lineage.

One command. Every missing column-level relationship back in DataHub.

$ pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Open source on GitHub · Apache 2.0 license