Open source · pip install · 60 seconds

Stop losing Power BI lineage in DataHub.

Power BI encodes newlines as #(lf) in M-language. Without decoding, -- comments swallow every JOIN and WHERE clause after them. DataHub's lineage graph goes silent. One pip install brings it all back.

1 upstream table found by sqlglot (out of 2)
5 column-level lineages recovered (plus 2 table-level)
20+ SQL dialects supported
The problem

Your lineage graph is lying to you.

Power BI embeds SQL inside M-language Value.NativeQuery calls, encoding newlines as #(lf). When DataHub's parser hits a -- comment, it treats it as running to the end of the entire string — because #(lf) isn't a real newline.

Every JOIN, WHERE clause, and upstream table after that first comment vanishes from lineage. No warning. No error. Just silence.
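
The effect is easy to reproduce without any SQL parser. The sketch below is illustrative only: the query is a shortened, made-up variant of the one in the issue (the join condition is an assumption), and strip_line_comments and referenced_tables are hypothetical helpers built on naive regular expressions, not DataHub or sqlglot code.

```python
import re


def strip_line_comments(sql: str) -> str:
    """Drop each `--` comment up to the next real newline.

    Naive on purpose: it does not recognise `--` inside string literals.
    """
    return re.sub(r"--[^\n]*", "", sql)


def referenced_tables(sql: str) -> list[str]:
    """Very rough table finder: the identifier that follows FROM or JOIN."""
    cleaned = strip_line_comments(sql)
    return re.findall(
        r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", cleaned, flags=re.IGNORECASE
    )


# SQL as it appears inside an M expression: #(lf) instead of real newlines.
encoded = (
    "SELECT cs.customercode, db.branch_rollup_name#(lf)"
    "FROM dim_customer cs --customer dimension#(lf)"
    "JOIN ref_branch db ON cs.branchid = db.branchid#(lf)"
    "WHERE cs.customerstatusid = 1 --active"
)

# The same SQL once #(lf) is turned back into real newlines.
decoded = encoded.replace("#(lf)", "\n")

tables_encoded = referenced_tables(encoded)  # comment runs to end of string
tables_decoded = referenced_tables(decoded)  # comment stops at the newline
```

With the encoded text, the first comment has no newline to stop at, so the JOIN disappears along with it. After decoding, both tables are visible again.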

Open since August 2024. Three users confirmed it blocks DataHub adoption. Issue #11251 — still no fix.

-- Real query from DataHub issue #11251
SELECT upper(cs.customercode),
  cs.ear2id, db.branch_rollup_name
FROM dim_customer cs
--join ... (commented out)
JOIN dim_customer_ear2... as so
JOIN ref_branch db ON ...
WHERE cs.customerstatusid = 1 --active
Any parser result (without #(lf) decoding): 1 upstream table — everything after the -- comment is gone
The proof

Same query. Dramatically different results.

DataHub default (sqlglot), without #(lf) decoding (comment strips JOINs):
1 upstream table, 0 column-level lineages

With gsp-datahub-sidecar (#(lf) decoding + GSP SQLFlow):
5 column-level lineages, plus 2 table-level

Same query with SQL comments. 1 vs 2 upstream tables, 0 vs 5 column-level lineages.

How it works

3 steps. 60 seconds.

1. Install

One pip command. No Docker, no infra changes, no DataHub plugins.

pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

2. Run

The sidecar decodes Power BI's #(lf) encoding, then GSP parses the clean SQL and recovers all JOINs.

gsp-datahub-sidecar --sql-file query.sql --db-vendor dbvmssql --dry-run

3. See the lineage

Open DataHub. Every upstream table and column-level relationship is back. Comments handled cleanly.

Choose your backend

Three backends. Pick your comfort level.

Every backend uses the same GSP SQLFlow engine. The only difference is where SQL gets parsed.

Anonymous

Free · No signup
  • Cloud-parsed, not logged
  • Rate-limited (fair use)
  • Great for evaluation

Self-Hosted

On-premise · SQLFlow license
  • SQL never leaves your network
  • No rate limits
  • Full audit trail
  • Enterprise support
From the community

The evidence is in the issue tracker.

“This is something very important. Especially when SQL comments are very common in PBI M-queries.”

— @AntonisCSt, DataHub #11251

“We also run into this issue and it is a real issue for part of our business to accept Datahub as a common catalog solution.”

— @rospe, DataHub #11251

Open since August 2024. 3 affected users. No maintainer fix. The sidecar recovers every missing relationship.

FAQ

Common questions.

Does this replace DataHub's lineage parser?

No. The sidecar augments DataHub's existing parser. DataHub still runs sqlglot for standard SQL. The sidecar re-parses statements where comments or encoding cause sqlglot to drop lineage edges.

Why do comments break Power BI lineage?

Power BI's M-language encodes newlines as #(lf) in SQL passed via Value.NativeQuery. Since #(lf) isn't a real newline, -- comments consume everything to end-of-string instead of end-of-line. All subsequent JOINs and WHERE clauses vanish from lineage.
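
To see the raw text a parser receives, you can pull the SQL argument out of an M expression yourself. The following is a rough sketch under stated assumptions: the SQL is the second argument of Value.NativeQuery, it is written as a plain M text literal (double-quoted, with "" standing for a literal quote), and only the first call in the expression matters. The function names are hypothetical and this is not how DataHub or the sidecar necessarily does it.

```python
def _read_m_string(text: str, start: int) -> tuple[str, int]:
    """Read an M text literal that opens at text[start]; return (value, index after it)."""
    if text[start] != '"':
        raise ValueError("expected an opening double quote")
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            if i + 1 < len(text) and text[i + 1] == '"':
                out.append('"')  # "" inside a literal is one quote character
                i += 2
                continue
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("unterminated M text literal")


def extract_native_query(m_expr: str):
    """Return the still-encoded SQL text of the first Value.NativeQuery call, or None."""
    marker = "Value.NativeQuery("
    pos = m_expr.find(marker)
    if pos == -1:
        return None
    i = pos + len(marker)
    depth = 0
    # Skip the first argument: stop at the first comma outside brackets and strings.
    while i < len(m_expr):
        ch = m_expr[i]
        if ch == '"':
            _, i = _read_m_string(m_expr, i)
            continue
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            if depth == 0:
                return None  # the call ended before a second argument
            depth -= 1
        elif ch == "," and depth == 0:
            break
        i += 1
    else:
        return None
    i += 1  # step over the comma
    while i < len(m_expr) and m_expr[i].isspace():
        i += 1
    if i >= len(m_expr) or m_expr[i] != '"':
        return None  # second argument is not a plain text literal
    value, _ = _read_m_string(m_expr, i)
    return value


m_expression = (
    'Source = Value.NativeQuery(Sql.Database("srv", "db"), '
    '"SELECT ""a"" FROM t --c#(lf)WHERE x = 1", null)'
)
raw_sql = extract_native_query(m_expression)
```

The returned text still contains #(lf). That is exactly the string a SQL parser sees when no decoding step runs first.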

Which SQL dialects are supported?

The GSP SQLFlow engine supports SQL Server, BigQuery, Snowflake, Oracle, PostgreSQL, Redshift, Teradata, and 20+ other dialects. Power BI queries typically target SQL Server or Snowflake; for those, use --db-vendor dbvmssql or --db-vendor dbvsnowflake respectively.
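
If you script the sidecar, a small helper can pick the vendor flag for you. This sketch uses only the flags shown on this page (--sql-file, --db-vendor, --dry-run) and the two vendor values named above. The platform keys "sqlserver" and "snowflake" and the function name are assumptions made for illustration.

```python
import shlex

# Vendor values taken from this page; the dictionary keys are hypothetical labels.
DB_VENDOR_BY_PLATFORM = {
    "sqlserver": "dbvmssql",
    "snowflake": "dbvsnowflake",
}


def build_sidecar_command(sql_file: str, platform: str, dry_run: bool = True) -> list[str]:
    """Build the argument list for a gsp-datahub-sidecar run."""
    vendor = DB_VENDOR_BY_PLATFORM[platform]  # KeyError for platforms not listed here
    command = ["gsp-datahub-sidecar", "--sql-file", sql_file, "--db-vendor", vendor]
    if dry_run:
        command.append("--dry-run")
    return command


command = build_sidecar_command("query.sql", "snowflake")
printable = shlex.join(command)  # shell-safe string for logging or copy-paste
```

Once the tool is installed, the list can be handed to subprocess.run(command, check=True).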

Is my SQL sent to a third party?

Depends on the backend you choose. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not logged or stored). Self-hosted mode keeps everything on your infrastructure.

How is this different from sqlglot?

sqlglot parses SQL comments correctly when newlines are real — the issue is that Power BI encodes newlines as #(lf), and no SQL parser (sqlglot or GSP) can handle that without preprocessing. The sidecar adds the missing step: it decodes #(lf) back to real newlines before sending the SQL to GSP. On the #11251 reproducer: 1 upstream table (without sidecar) vs 2 tables + 5 column-level lineages (with sidecar).

What does it cost?

The sidecar tool is open source (Apache 2.0). The Anonymous backend is free with fair-use rate limits. Authenticated and Self-hosted backends have separate pricing — contact us for details.

Recover your Power BI lineage.

One command. Every missing upstream table and column-level relationship back in DataHub.

$ pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

Open source on GitHub · Apache 2.0 license