4 Power BI queries.
0 upstream lineage.
Here's exactly why.
DataHub issue #15327 reports 4 M-language queries connecting to Snowflake — all producing zero upstream lineage. We analyzed every query, identified two distinct failure patterns, and ran real lineage extraction against each one.
Power BI queries live in M-language. DataHub only sees SQL.
When Power BI connects to Snowflake, it generates M-language (Power Query Formula Language) — not SQL. DataHub's ingestion pipeline expects SQL and chokes on M's functional syntax.
The result: zero upstream lineage for every Power BI dataset. No tables. No columns. No relationships. The lineage graph simply stops at the Power BI boundary.
Issue #15327 documents 4 real-world queries. All 4 fail.
let
Source = Snowflake.Databases(...),
DB = Source{[Name="PROD201_DB"]}[Data],
Schema = DB{[Name="CONSUMPTION"]}[Data],
Table = Schema{[Name="details"]}[Data]
in
Table
Two patterns. Two different problems.
The 4 queries in issue #15327 fall into two fundamentally different categories. Understanding the distinction is key to fixing them.
Pure M-Language Navigation
Queries 1 & 3
The table reference is encoded in M's record-access syntax:
{[Name="X", Kind="Database"]}[Data].
There is no SQL anywhere in the query.
Source{[Name="PROD201_DB",Kind="Database"]}[Data]
Value.NativeQuery() with Embedded SQL
Queries 2 & 4
M wraps actual SQL inside a Value.NativeQuery() call.
The SQL contains #(lf) escapes and ""
double-quote encoding that must be decoded first.
Value.NativeQuery(source, "SELECT ...#(lf)FROM ...")
Queries 2 & 4: Embedded SQL inside M-language
These queries wrap real Snowflake SQL inside Value.NativeQuery().
The sidecar extracts the SQL, decodes #(lf) escapes, and feeds it to
GSP SQLFlow for full column-level lineage.
Simple SELECT with #(lf) escape
Pattern B · Snowflake · SELECT *
Original M-Language
let
    Source = Value.NativeQuery(
        Snowflake.Databases(SnowFlakeServer, SnowFlakeDWH)
            {[Name="PROD201_redacted"]}[Data],
        "SELECT *#(lf)FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY",
        null,
        [EnableFolding=true])
in
    Source
Step 1: Decode M-language escapes
The SQL string "SELECT *#(lf)FROM ..." contains #(lf),
M-language's escape for a line-feed character. Decoding it produces valid SQL:
SELECT * FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY
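This decoding step is small enough to sketch. The following is an illustrative Python sketch, not the sidecar's actual code, and it ignores rarer escapes such as the combined #(cr,lf) sequence:

```python
import re

# M-language string escapes we decode (illustrative subset).
M_ESCAPES = {"lf": "\n", "cr": "\r", "tab": "\t"}

def decode_m_string(raw: str) -> str:
    """Turn the body of an M string literal into plain SQL text."""
    # #(lf) / #(cr) / #(tab) -> the corresponding control character
    text = re.sub(r"#\((lf|cr|tab)\)", lambda m: M_ESCAPES[m.group(1)], raw)
    # "" (doubled double-quote) -> a literal "
    return text.replace('""', '"')

print(decode_m_string('SELECT *#(lf)FROM PROD201_DB.CONSUMPTION.T'))
```

The same helper also handles the `""` quoted-identifier case that Query 4 depends on.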
Step 2: Analyze the SQL
SELECT * means every column from the source table flows through to the
Power BI dataset. GSP represents this as a wildcard mapping: * → *.
This is semantically correct — the exact column set depends on the source table's
schema, which may change over time.
Sidecar Result (authenticated mode, real API call)
SELECT * produces a wildcard column mapping (* → *),
meaning all columns from the source flow to the target.
$ gsp-datahub-sidecar --mode authenticated --db-vendor dbvpowerquery --dry-run
[INFO] Extracted 1 table-level lineage relationships
[INFO] PROD201_DB_REDACTED.CONSUMPTION.CV_OFFER_REDACTED_CITY
--> POWERBI_DATASET.Q2_OFFER_CITY (1 columns)
[INFO] Table-level lineages found: 1
[INFO] Column-level mappings extracted: 1
[INFO] Built 1 MCPs for 1 downstream tables (1 column-level mappings)
* -> *
Complex Snowflake SQL with GROUP BY ALL, quoted aliases
Pattern B · Snowflake · GROUP BY ALL · HAVING · expressions
Original M-Language (abbreviated)
let
    Source = Value.NativeQuery(
        Snowflake.Databases("redacted.snowflakecomputing.com", WAREHOUSE,
            [Implementation="2.0"])
            {[Name="PROD201_DB_redacted"]}[Data],
        " SELECT SERVICE_START_DATE AS ""Service Date"",
              TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || ... VENDOR_CODE
          FROM PROD201_DB_redacted_DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS
          WHERE SERVICE_START_DATE >= '2023-10-01' ...
          GROUP BY ALL
          HAVING SUM(COST_OF_SALES_CUR) = 0 AND COUNT(*) > 1 ",
        null,
        [EnableFolding=true]),
    #"TimeFilter" = Table.SelectRows(...)
in
    TimeFilter
"" is M's escape for double-quote inside strings → becomes "Service Date" in SQL.
GROUP BY ALL is Snowflake-specific syntax.
Step 1: Decode M-language escapes
M uses "" (doubled double-quote) to encode a literal "
inside strings. The alias ""Service Date"" in M becomes
"Service Date" — a valid Snowflake quoted identifier.
SELECT SERVICE_START_DATE AS "Service Date",
       TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || '_' || VENDOR_CODE
           || '_' || TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD') AS OFFICE_STATEMENT_VENDOR_DATE,
       VENDOR_CODE
FROM PROD201_DB_redacted_DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS
WHERE SERVICE_START_DATE >= '2023-10-01'
  AND PRODUCT_TYPE IN ('TRANSFER', 'EXP & REC')
  AND OPERATIVE_OFFICE >= 00005000
  AND VENDOR_CODE IS NOT NULL
  AND STATEMENT IS NOT NULL
GROUP BY ALL
HAVING SUM(COST_OF_SALES_CUR) = 0 AND COUNT(*) > 1
Step 2: Analyze the SQL
This query has 3 SELECT columns. Column 1 is a simple alias.
Column 2 is a complex concatenation expression that references
4 source columns: OPERATIVE_OFFICE (via TO_NUMBER),
statement, VENDOR_CODE, and SERVICE_START_DATE
(via TO_CHAR). Column 3 is a direct pass-through. GSP traces through
TO_NUMBER() and TO_CHAR() functions to the underlying columns,
producing 6 column-level lineage mappings. Snowflake's GROUP BY ALL and
HAVING do not affect column lineage — they only filter rows.
Sidecar Result (authenticated mode, real API call)
Column-level lineage (6 mappings):
| Source Column | | Target Column | Why |
|---|---|---|---|
| SERVICE_START_DATE | → | SERVICE DATE | Direct alias: AS "Service Date" |
| OPERATIVE_OFFICE | → | OFFICE_STATEMENT_VENDOR_DATE | Via TO_NUMBER(OPERATIVE_OFFICE) in concat |
| STATEMENT | → | OFFICE_STATEMENT_VENDOR_DATE | Used directly in || statement || concat |
| VENDOR_CODE | → | OFFICE_STATEMENT_VENDOR_DATE | Used directly in || VENDOR_CODE || concat |
| SERVICE_START_DATE | → | OFFICE_STATEMENT_VENDOR_DATE | Via TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD') |
| VENDOR_CODE | → | VENDOR_CODE | Direct pass-through (3rd SELECT column) |
$ gsp-datahub-sidecar --mode authenticated --db-vendor dbvpowerquery --dry-run
[INFO] Extracted 1 table-level lineage relationships
[INFO] PROD201_DB_REDACTED_DATAMARTS.SALES_MARKETING
.CV_COMMDX_SALES_TRANSFERS
--> POWERBI_DATASET.Q4_SALES_TRANSFERS (6 columns)
[INFO] Table-level lineages found: 1
[INFO] Column-level mappings extracted: 6
[INFO] Built 3 MCPs for 1 downstream tables (6 column-level mappings)
service_start_date -> service date # col 1: direct alias
operative_office -> office_statement_vendor_date # col 2: TO_NUMBER() in concat
statement -> office_statement_vendor_date # col 2: used in concat
vendor_code -> office_statement_vendor_date # col 2: used in concat
service_start_date -> office_statement_vendor_date # col 2: TO_CHAR() in concat
vendor_code -> vendor_code                    # col 3: pass-through
Queries 1 & 3: Pure M-language navigation
These queries use M-language's record-access syntax to navigate to Snowflake objects.
There is no SQL in the query — the table reference is encoded
entirely in M's functional expressions. GSP's M-language parser walks the
navigation chain to extract the full database.schema.table path.
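A toy version of that chain walk fits in one regular expression. GSP uses a real M parser; a regex like this is only an approximation that breaks on nested or dynamic expressions, but it shows the shape of the extraction:

```python
import re

# One navigation step: {[Name="X", Kind="Y"]}[Data] (Kind is optional).
NAV_STEP = re.compile(r'\{\[\s*Name\s*=\s*"([^"]+)"[^\]]*\]\}\s*\[Data\]')

def resolve_navigation(m_script: str):
    """Collect Name= values from the navigation chain into db.schema.table."""
    names = NAV_STEP.findall(m_script)
    return ".".join(names) if len(names) >= 3 else None

m = '''let
    Source = Snowflake.Databases(Server, Warehouse),
    DB = Source{[Name="PROD201_DB",Kind="Database"]}[Data],
    Schema = DB{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    Table = Schema{[Name="details",Kind="View"]}[Data]
in
    Table'''

print(resolve_navigation(m))  # PROD201_DB.CONSUMPTION.details
```

From that path, emitting the synthetic SELECT * shown below is a straightforward string template.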
Navigation to Snowflake view + column transformations
Pattern A · Snowflake · Table.TransformColumnTypes · Table.AddColumn
Original M-Language
let
    Source = Snowflake.Databases(SnowFlakeConnector, SnowflakeWarehouse),
    my_Database = Source{[Name="PROD201_DB_redacted",Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    Details_View = CONSUMPTION_Schema{[Name="details",Kind="View"]}[Data],
    // 7 more transformation steps (type changes, filters, computed columns)...
    #"Replaced Value" = Table.ReplaceValue(...)
in
    #"Replaced Value"
What needs to be extracted
Snowflake.Databases(...)
  {[Name="PROD201_DB_redacted"]}
  {[Name="CONSUMPTION"]}
  {[Name="details"]}
Resolved upstream: PROD201_DB_redacted.CONSUMPTION.details (table-level lineage)
GSP Result (Power Query parser)
GSP Power Query parser (dbvpowerquery):
Navigation resolved:
Step: Details_View | Vendor: dbvsnowflake
Path: PROD201_DB_redacted.CONSUMPTION.details
Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."details"
Navigation to Snowflake table + column renaming & business logic
Pattern A · Snowflake · Table.RenameColumns · Table.AddColumn · conditional logic
Original M-Language
let
    Source = Snowflake.Databases("redacted.snowflakecomputing.com", #"Warehouse name"),
    my_Database = Source{[Name="PROD201_DB_redacted",Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    MY_View = CONSUMPTION_Schema{[Name="MESSAGES",Kind="Table"]}[Data],
    #"Renamed Columns" = Table.RenameColumns(MY_View, {
        {"DATE", "Date"}, {"CHANNEL", "Channel"}, {"BOOKINGID", "Booking ID"},
        ... 14 more ...}),
    // 2 more steps with conditional logic (if/then/else)...
    #"Added Custom1" = Table.AddColumn(...)
in
    #"Added Custom1"
What needs to be extracted
Snowflake.Databases(...)
  {[Name="PROD201_DB_redacted"]}
  {[Name="CONSUMPTION"]}
  {[Name="MESSAGES"]}
Resolved upstream: PROD201_DB_redacted.CONSUMPTION.MESSAGES (table-level lineage)
GSP Result (Power Query parser)
GSP Power Query parser (dbvpowerquery):
Navigation resolved:
Step: MY_View | Vendor: dbvsnowflake
Path: PROD201_DB_redacted.CONSUMPTION.MESSAGES
Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."MESSAGES"
The Table.RenameColumns and Table.AddColumn
steps are M-level transformations that don't change the upstream source.
All 4 queries at a glance.
| Query | Pattern | Upstream Table | DataHub Today | With Sidecar | Status |
|---|---|---|---|---|---|
| Q1 | A M Navigation | PROD201_DB_redacted.CONSUMPTION.details | 0 lineage | Table-level | Solved |
| Q2 | B NativeQuery | PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY | 0 lineage | 1 table + 1 column (*) | Solved |
| Q3 | A M Navigation | PROD201_DB_redacted.CONSUMPTION.MESSAGES | 0 lineage | Table-level | Solved |
| Q4 | B NativeQuery | ...DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS | 0 lineage | 1 table + 6 columns | Solved |
Both patterns solved. 3 steps.
Extract SQL from M
Identify Value.NativeQuery() calls in the M-language.
Extract the SQL string. Decode #(lf), #(cr),
#(tab), and "" escapes.
Parse with GSP
Feed the M-language script to GSP SQLFlow as dbvpowerquery.
GSP extracts embedded SQL, infers the dialect, and returns full column-level lineage.
gsp-datahub-sidecar --db-vendor dbvpowerquery --dry-run
Emit to DataHub
The sidecar builds UpstreamLineageClass MCPs with
table-level and column-level relationships, and emits them to DataHub GMS.
Common questions.
What is M-language?
M (officially "Power Query Formula Language") is the data transformation language behind Power BI's "Get Data" and "Transform Data" features. Every time a user clicks through the Power Query Editor GUI, Power BI generates M code behind the scenes. It's a functional language — closer to F# than SQL.
Why does DataHub fail on all 4 queries?
DataHub's ingestion pipeline expects SQL. M-language is not SQL — it's a completely different language with different syntax, semantics, and data model. DataHub's sqlglot parser doesn't attempt to parse M, so all 4 queries produce zero lineage regardless of whether they contain embedded SQL or not.
How does GSP solve Pattern A (no SQL)?
GSP's Power Query module (dbvpowerquery) includes a lightweight M-language parser that walks navigation chains ({[Name="X", Kind="Y"]}[Data]). It identifies the connector function (e.g. Snowflake.Databases), then resolves each navigation step (Database → Schema → Table/View) to build the full database.schema.table path. It then generates a synthetic SELECT * FROM "db"."schema"."table" for lineage emission. No SQL parser can do this — it requires M-language-specific parsing.
How does GSP solve Pattern B (embedded SQL)?
GSP detects Value.NativeQuery(source, "sql") calls, extracts the SQL string, decodes M-specific escapes (#(lf) → newline, "" → "), infers the SQL dialect from the inline connector call (e.g. Snowflake.Databases → Snowflake), and delegates to the appropriate SQL parser for full column-level lineage.
What about #(lf) in the SQL string?
#(lf) is M-language's escape for a line-feed character. When SQL is embedded inside M strings, real newlines are replaced with #(lf). This breaks SQL comments (-- runs to end-of-string instead of end-of-line) and confuses most parsers. The sidecar decodes these escapes before parsing. See issue #11251 for the full comment-stripping problem.
What about "" (double-double-quote) in SQL?
M-language uses "" to represent a literal double-quote inside a string. In Query 4, ""Service Date"" becomes "Service Date" in the extracted SQL — a quoted Snowflake identifier. The sidecar handles this decoding automatically.
Which connectors will Pattern A support?
The M-language extractor will support: Snowflake.Databases, Sql.Database / Sql.Databases (SQL Server), Oracle.Database, PostgreSQL.Database, MySQL.Database, and GoogleBigQuery.Database / BigQuery.Database. New connectors can be added via a simple catalog entry.
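A catalog entry can be as simple as a connector-function-to-dialect mapping. The sketch below is illustrative only; the dialect tags and lookup logic are assumptions, not the extractor's actual data structure:

```python
# Hypothetical connector catalog: M connector function -> SQL dialect tag.
CONNECTOR_CATALOG = {
    "Snowflake.Databases": "snowflake",
    "Sql.Database": "tsql",
    "Sql.Databases": "tsql",
    "Oracle.Database": "oracle",
    "PostgreSQL.Database": "postgres",
    "MySQL.Database": "mysql",
    "GoogleBigQuery.Database": "bigquery",
    "BigQuery.Database": "bigquery",
}

def infer_dialect(m_script: str):
    """Return the SQL dialect implied by the first known connector call."""
    for fn, dialect in CONNECTOR_CATALOG.items():
        if fn + "(" in m_script:   # match the call site, e.g. Snowflake.Databases(
            return dialect
    return None
```

Adding a new connector is one line in the catalog; the navigation and escape-decoding logic stays unchanged.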
Is my SQL sent to a third party?
Depends on the backend. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not stored). Self-hosted mode keeps everything on your infrastructure. See deployment options.
Recover your Power BI lineage.
Both patterns solved. Install the sidecar to recover lineage from
Value.NativeQuery() embedded SQL and M-language navigation chains.
Open source on GitHub · Apache 2.0 license