4 Power BI queries.
0 upstream lineage.
Here's exactly why.
DataHub issue #15327 reports 4 M-language queries connecting to Snowflake — all producing zero upstream lineage. We analyzed every query, identified two distinct failure patterns, and ran real lineage extraction against each one.
Power BI queries live in M-language. DataHub only sees SQL.
When Power BI connects to Snowflake, it generates M-language (Power Query Formula Language) — not SQL. DataHub's ingestion pipeline expects SQL and chokes on M's functional syntax.
The result: zero upstream lineage for every Power BI dataset. No tables. No columns. No relationships. The lineage graph simply stops at the Power BI boundary.
Issue #15327 documents 4 real-world queries. All 4 fail.
let
Source = Snowflake.Databases(...),
DB = Source{[Name="PROD201_DB"]}[Data],
Schema = DB{[Name="CONSUMPTION"]}[Data],
Table = Schema{[Name="details"]}[Data]
in
Table
Two patterns. Two different problems.
The 4 queries in issue #15327 fall into two fundamentally different categories. Understanding the distinction is key to fixing them.
Pure M-Language Navigation
Queries 1 & 3
The table reference is encoded in M's record-access syntax:
{[Name="X", Kind="Database"]}[Data].
There is no SQL anywhere in the query.
Source{[Name="PROD201_DB",Kind="Database"]}[Data]
Value.NativeQuery() with Embedded SQL
Queries 2 & 4
M wraps actual SQL inside a Value.NativeQuery() call.
The SQL contains #(lf) escapes and ""
double-quote encoding that must be decoded first.
Value.NativeQuery(source, "SELECT ...#(lf)FROM ...")
Queries 2 & 4: Embedded SQL inside M-language
These queries wrap real Snowflake SQL inside Value.NativeQuery().
The sidecar extracts the SQL, decodes #(lf) escapes, and feeds it to
GSP SQLFlow for full column-level lineage.
Simple SELECT with #(lf) escape
Pattern B · Snowflake · SELECT *
Original M-Language
let
    Source = Value.NativeQuery(
        Snowflake.Databases(SnowFlakeServer, SnowFlakeDWH)
            {[Name="PROD201_redacted"]}[Data],
        "SELECT *#(lf)FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY",
        null,
        [EnableFolding=true])
in
    Source
Step 1: Decode M-language escapes
The SQL string "SELECT *#(lf)FROM ..." contains #(lf),
M-language's escape for a line-feed character. Decoding it produces valid SQL:
SELECT * FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY
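This decoding step is small enough to sketch. The following is an illustrative Python sketch, not the sidecar's actual code, and it ignores rarer escapes such as the combined #(cr,lf) sequence:

```python
import re

# M-language string escapes we decode (illustrative subset).
M_ESCAPES = {"lf": "\n", "cr": "\r", "tab": "\t"}

def decode_m_string(raw: str) -> str:
    """Turn the body of an M string literal into plain SQL text."""
    # #(lf) / #(cr) / #(tab) -> the corresponding control character
    text = re.sub(r"#\((lf|cr|tab)\)", lambda m: M_ESCAPES[m.group(1)], raw)
    # "" (doubled double-quote) -> a literal "
    return text.replace('""', '"')

print(decode_m_string('SELECT *#(lf)FROM PROD201_DB.CONSUMPTION.T'))
```

The same helper also handles the `""` quoted-identifier case that Query 4 depends on.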
Step 2: Analyze the SQL
SELECT * means every column from the source table flows through to the
Power BI dataset. GSP represents this as a wildcard mapping: * → *.
This is semantically correct — the exact column set depends on the source table's
schema, which may change over time.
Sidecar Result (authenticated mode, real API call)
SELECT * produces a wildcard column mapping (* → *),
meaning all columns from the source flow to the target.
$ gsp-datahub-sidecar --mode authenticated --db-vendor dbvpowerquery --dry-run
[INFO] Extracted 1 table-level lineage relationships
[INFO] PROD201_DB_REDACTED.CONSUMPTION.CV_OFFER_REDACTED_CITY
--> POWERBI_DATASET.Q2_OFFER_CITY (1 columns)
[INFO] Table-level lineages found: 1
[INFO] Column-level mappings extracted: 1
[INFO] Built 1 MCPs for 1 downstream tables (1 column-level mappings)
* -> *
Complex Snowflake SQL with GROUP BY ALL, quoted aliases
Pattern B · Snowflake · GROUP BY ALL · HAVING · expressions
Original M-Language (abbreviated)
let
    Source = Value.NativeQuery(
        Snowflake.Databases("redacted.snowflakecomputing.com", WAREHOUSE,
            [Implementation="2.0"])
            {[Name="PROD201_DB_redacted"]}[Data],
        " SELECT SERVICE_START_DATE AS ""Service Date"",
              TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || ... VENDOR_CODE
          FROM PROD201_DB_redacted_DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS
          WHERE SERVICE_START_DATE >= '2023-10-01' ...
          GROUP BY ALL
          HAVING SUM(COST_OF_SALES_CUR) = 0 AND COUNT(*) > 1 ",
        null,
        [EnableFolding=true]),
    #"TimeFilter" = Table.SelectRows(...)
in
    TimeFilter
"" is M's escape for double-quote inside strings → becomes "Service Date" in SQL.
GROUP BY ALL is Snowflake-specific syntax.
Step 1: Decode M-language escapes
M uses "" (doubled double-quote) to encode a literal "
inside strings. The alias ""Service Date"" in M becomes
"Service Date" — a valid Snowflake quoted identifier.
SELECT SERVICE_START_DATE AS "Service Date",
       TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || '_' || VENDOR_CODE
           || '_' || TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD') AS OFFICE_STATEMENT_VENDOR_DATE,
       VENDOR_CODE
FROM PROD201_DB_redacted_DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS
WHERE SERVICE_START_DATE >= '2023-10-01'
  AND PRODUCT_TYPE IN ('TRANSFER', 'EXP & REC')
  AND OPERATIVE_OFFICE >= 00005000
  AND VENDOR_CODE IS NOT NULL
  AND STATEMENT IS NOT NULL
GROUP BY ALL
HAVING SUM(COST_OF_SALES_CUR) = 0 AND COUNT(*) > 1
Step 2: Analyze the SQL
This query has 3 SELECT columns. Column 1 is a simple alias.
Column 2 is a complex concatenation expression that references
4 source columns: OPERATIVE_OFFICE (via TO_NUMBER),
statement, VENDOR_CODE, and SERVICE_START_DATE
(via TO_CHAR). Column 3 is a direct pass-through. GSP traces through
TO_NUMBER() and TO_CHAR() functions to the underlying columns,
producing 6 column-level lineage mappings. Snowflake's GROUP BY ALL and
HAVING do not affect column lineage — they only filter rows.
Sidecar Result (authenticated mode, real API call)
Column-level lineage (6 mappings):
| Source Column | | Target Column | Why |
|---|---|---|---|
| SERVICE_START_DATE | → | SERVICE DATE | Direct alias: AS "Service Date" |
| OPERATIVE_OFFICE | → | OFFICE_STATEMENT_VENDOR_DATE | Via TO_NUMBER(OPERATIVE_OFFICE) in concat |
| STATEMENT | → | OFFICE_STATEMENT_VENDOR_DATE | Used directly in || statement || concat |
| VENDOR_CODE | → | OFFICE_STATEMENT_VENDOR_DATE | Used directly in || VENDOR_CODE || concat |
| SERVICE_START_DATE | → | OFFICE_STATEMENT_VENDOR_DATE | Via TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD') |
| VENDOR_CODE | → | VENDOR_CODE | Direct pass-through (3rd SELECT column) |
$ gsp-datahub-sidecar --mode authenticated --db-vendor dbvpowerquery --dry-run
[INFO] Extracted 1 table-level lineage relationships
[INFO] PROD201_DB_REDACTED_DATAMARTS.SALES_MARKETING
.CV_COMMDX_SALES_TRANSFERS
--> POWERBI_DATASET.Q4_SALES_TRANSFERS (6 columns)
[INFO] Table-level lineages found: 1
[INFO] Column-level mappings extracted: 6
[INFO] Built 3 MCPs for 1 downstream tables (6 column-level mappings)
service_start_date -> service date # col 1: direct alias
operative_office -> office_statement_vendor_date # col 2: TO_NUMBER() in concat
statement -> office_statement_vendor_date # col 2: used in concat
vendor_code -> office_statement_vendor_date # col 2: used in concat
service_start_date -> office_statement_vendor_date # col 2: TO_CHAR() in concat
vendor_code -> vendor_code                    # col 3: pass-through
Queries 1 & 3: Pure M-language navigation
These queries use M-language's record-access syntax to navigate to Snowflake objects.
There is no SQL in the query — the table reference is encoded
entirely in M's functional expressions. GSP's M-language parser walks the
navigation chain to extract the full database.schema.table path.
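A toy version of that chain walk fits in one regular expression. GSP uses a real M parser; a regex like this is only an approximation that breaks on nested or dynamic expressions, but it shows the shape of the extraction:

```python
import re

# One navigation step: {[Name="X", Kind="Y"]}[Data] (Kind is optional).
NAV_STEP = re.compile(r'\{\[\s*Name\s*=\s*"([^"]+)"[^\]]*\]\}\s*\[Data\]')

def resolve_navigation(m_script: str):
    """Collect Name= values from the navigation chain into db.schema.table."""
    names = NAV_STEP.findall(m_script)
    return ".".join(names) if len(names) >= 3 else None

m = '''let
    Source = Snowflake.Databases(Server, Warehouse),
    DB = Source{[Name="PROD201_DB",Kind="Database"]}[Data],
    Schema = DB{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    Table = Schema{[Name="details",Kind="View"]}[Data]
in
    Table'''

print(resolve_navigation(m))  # PROD201_DB.CONSUMPTION.details
```

From that path, emitting the synthetic SELECT * shown below is a straightforward string template.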
Navigation to Snowflake view + column transformations
Pattern A · Snowflake · Table.TransformColumnTypes · Table.AddColumn
Original M-Language
let
    Source = Snowflake.Databases(SnowFlakeConnector, SnowflakeWarehouse),
    my_Database = Source{[Name="PROD201_DB_redacted",Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    Details_View = CONSUMPTION_Schema{[Name="details",Kind="View"]}[Data],
    // 7 more transformation steps (type changes, filters, computed columns)...
    #"Replaced Value" = Table.ReplaceValue(...)
in
    #"Replaced Value"
What needs to be extracted
Snowflake.Databases(...)
  {[Name="PROD201_DB_redacted"]}
  {[Name="CONSUMPTION"]}
  {[Name="details"]}
Resolved upstream: PROD201_DB_redacted.CONSUMPTION.details (table-level lineage)
GSP Result (Power Query parser)
GSP Power Query parser (dbvpowerquery):
Navigation resolved:
Step: Details_View | Vendor: dbvsnowflake
Path: PROD201_DB_redacted.CONSUMPTION.details
Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."details"
Navigation to Snowflake table + column renaming & business logic
Pattern A · Snowflake · Table.RenameColumns · Table.AddColumn · conditional logic
Original M-Language
let
    Source = Snowflake.Databases("redacted.snowflakecomputing.com", #"Warehouse name"),
    my_Database = Source{[Name="PROD201_DB_redacted",Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION",Kind="Schema"]}[Data],
    MY_View = CONSUMPTION_Schema{[Name="MESSAGES",Kind="Table"]}[Data],
    #"Renamed Columns" = Table.RenameColumns(MY_View, {
        {"DATE", "Date"}, {"CHANNEL", "Channel"}, {"BOOKINGID", "Booking ID"},
        ... 14 more ...}),
    // 2 more steps with conditional logic (if/then/else)...
    #"Added Custom1" = Table.AddColumn(...)
in
    #"Added Custom1"
What needs to be extracted
Snowflake.Databases(...)
  {[Name="PROD201_DB_redacted"]}
  {[Name="CONSUMPTION"]}
  {[Name="MESSAGES"]}
Resolved upstream: PROD201_DB_redacted.CONSUMPTION.MESSAGES (table-level lineage)
GSP Result (Power Query parser)
GSP Power Query parser (dbvpowerquery):
Navigation resolved:
Step: MY_View | Vendor: dbvsnowflake
Path: PROD201_DB_redacted.CONSUMPTION.MESSAGES
Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."MESSAGES"
The Table.RenameColumns and Table.AddColumn
steps are M-level transformations that don't change the upstream source.
All 4 queries at a glance.
| Query | Pattern | Upstream Table | DataHub Today | With Sidecar | Status |
|---|---|---|---|---|---|
| Q1 | A M Navigation | PROD201_DB_redacted.CONSUMPTION.details | 0 lineage | Table-level | Solved |
| Q2 | B NativeQuery | PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY | 0 lineage | 1 table + 1 column (*) | Solved |
| Q3 | A M Navigation | PROD201_DB_redacted.CONSUMPTION.MESSAGES | 0 lineage | Table-level | Solved |
| Q4 | B NativeQuery | ...DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS | 0 lineage | 1 table + 6 columns | Solved |
Both patterns solved. 3 steps.
Extract SQL from M
Identify Value.NativeQuery() calls in the M-language.
Extract the SQL string. Decode #(lf), #(cr),
#(tab), and "" escapes.
Parse with GSP
Feed the M-language script to GSP SQLFlow as dbvpowerquery.
GSP extracts embedded SQL, infers the dialect, and returns full column-level lineage.
gsp-datahub-sidecar --db-vendor dbvpowerquery --dry-run
Emit to DataHub
The sidecar builds UpstreamLineageClass MCPs with
table-level and column-level relationships, and emits them to DataHub GMS.
Common questions.
What is M-language?
M (officially "Power Query Formula Language") is the data transformation language behind Power BI's "Get Data" and "Transform Data" features. Every time a user clicks through the Power Query Editor GUI, Power BI generates M code behind the scenes. It's a functional language — closer to F# than SQL.
Why does DataHub fail on all 4 queries?
DataHub's ingestion pipeline expects SQL. M-language is not SQL — it's a completely different language with different syntax, semantics, and data model. DataHub's sqlglot parser doesn't attempt to parse M, so all 4 queries produce zero lineage regardless of whether they contain embedded SQL or not.
How does GSP solve Pattern A (no SQL)?
GSP's Power Query module (dbvpowerquery) includes a lightweight M-language parser that walks navigation chains ({[Name="X", Kind="Y"]}[Data]). It identifies the connector function (e.g. Snowflake.Databases), then resolves each navigation step (Database → Schema → Table/View) to build the full database.schema.table path. It then generates a synthetic SELECT * FROM "db"."schema"."table" for lineage emission. No SQL parser can do this — it requires M-language-specific parsing.
How does GSP solve Pattern B (embedded SQL)?
GSP detects Value.NativeQuery(source, "sql") calls, extracts the SQL string, decodes M-specific escapes (#(lf) → newline, "" → "), infers the SQL dialect from the inline connector call (e.g. Snowflake.Databases → Snowflake), and delegates to the appropriate SQL parser for full column-level lineage.
What about #(lf) in the SQL string?
#(lf) is M-language's escape for a line-feed character. When SQL is embedded inside M strings, real newlines are replaced with #(lf). This breaks SQL comments (-- runs to end-of-string instead of end-of-line) and confuses most parsers. The sidecar decodes these escapes before parsing. See issue #11251 for the full comment-stripping problem.
What about "" (double-double-quote) in SQL?
M-language uses "" to represent a literal double-quote inside a string. In Query 4, ""Service Date"" becomes "Service Date" in the extracted SQL — a quoted Snowflake identifier. The sidecar handles this decoding automatically.
Which connectors will Pattern A support?
The M-language extractor will support: Snowflake.Databases, Sql.Database / Sql.Databases (SQL Server), Oracle.Database, PostgreSQL.Database, MySQL.Database, and GoogleBigQuery.Database / BigQuery.Database. New connectors can be added via a simple catalog entry.
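A catalog entry can be as simple as a connector-function-to-dialect mapping. The sketch below is illustrative only; the dialect tags and lookup logic are assumptions, not the extractor's actual data structure:

```python
# Hypothetical connector catalog: M connector function -> SQL dialect tag.
CONNECTOR_CATALOG = {
    "Snowflake.Databases": "snowflake",
    "Sql.Database": "tsql",
    "Sql.Databases": "tsql",
    "Oracle.Database": "oracle",
    "PostgreSQL.Database": "postgres",
    "MySQL.Database": "mysql",
    "GoogleBigQuery.Database": "bigquery",
    "BigQuery.Database": "bigquery",
}

def infer_dialect(m_script: str):
    """Return the SQL dialect implied by the first known connector call."""
    for fn, dialect in CONNECTOR_CATALOG.items():
        if fn + "(" in m_script:   # match the call site, e.g. Snowflake.Databases(
            return dialect
    return None
```

Adding a new connector is one line in the catalog; the navigation and escape-decoding logic stays unchanged.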
Is my SQL sent to a third party?
Depends on the backend. Anonymous and Authenticated modes parse SQL in Gudu Software's cloud (processed in memory, not stored). Self-hosted mode keeps everything on your infrastructure. See deployment options.
Recover your Power BI lineage.
Both patterns solved. Install the sidecar to recover lineage from
Value.NativeQuery() embedded SQL and M-language navigation chains.
Open source on GitHub · Apache 2.0 license