Determining Impact and Data Lineage

Determining the impact and data lineage for database objects such as stored procedure, table, view, column can help you troubleshoot stored procedure; capture and represent column level data lineage between applications; find table/view and column references and more.

The following are some examples:
* A stored procedure is running slowly. You examine the data lineage for the stored procedure. Looking at the data lineage helps you determine that the stored procedure queries a table that lacks useful indexes.

* A stored procedure does not run. You examine which objects have an impact on that stored procedure. Looking at the object impact helps you determine that the stored procedure exports data to a table column that has been deleted.

Understanding the impact and data lineage for objects can also help you understand the possible ramifications of making changes to objects. For example, you plan to modify a table. You examine the impact of that table on your stored procedure and discover that the stored procedure uses the table to extract sales data. If you modify the table columns, the stored procedure might fail.

Typically, there are two types of impact and data lineage.


1. impact lineage

Object B has an impact on object A. Changes to object B can cause the behavior of object A to change or cause object A to fail. we call this impact lineage, such kind of lineage can be found but not limited in where clause/join condition/group by/order by clause in a SQL query.
SELECT a.deptno                  "Department",
       a.num_emp / b.total_count "Employees",
       a.sal_sum / b.total_sal   "Salary"
FROM   (SELECT deptno,
               COUNT(*) num_emp,
               SUM(sal) sal_sum
        FROM   scott.emp
        GROUP  BY deptno) a,
       (SELECT COUNT(*) total_count,
               SUM(sal) total_sal
        FROM   scott.emp) b 

in above SQL, "Employees" depends on: scott.emp, scott.emp.deptno. If we change data in scott.emp.deptno, or change definition of scott.emp.deptno, selected result of "Employees" will be changed.


2: data/persistence lineage

Object A provides data to object C. The lineage of data is from object A to object C.
we call this data/persistence lineage. such kind of lineage can be found but not limited in create view, set clause in update statement and insert statement.
UPDATE sds_tst_prdct
SET    base_cow_id = b.bsecowid
FROM   sds_tst_prdct a,
       bog_pr_jil b
WHERE  a.std_tst_id = b.tstid
       AND b.bsecowid > 1 

in above SQL, bog_pr_jil.bsecowid provides data to sds_tst_prdct.base_cow_id, so there is a data lineage from bog_pr_jil to sds_tst_prdct.

There in no builtin tool in major databases such as Oracle, SQL Server, Sybase, Teradata and DB2 can trace impact and data lineage for you. So we created some tools based on our powerful SQL Parser which can provides an in-depth and detailed analysis of SQL scripts for various databases.

our tool read some SQL scripts and generate an impact and data lineage map automatically.
check this demo for detailed information.

Supported Database

 

Google BigQuery

 

Couchbase database

 

IBM DB2 database

 

EMC Greenplum

 

SAP Hana

 

hadoop hive

 

Impala

 

IBM Informix database

 

MySQL database

 

Netezza database

 

Openedge database

 

Oracle database

 

PostgreSQL database

 

Amazon redshift

 

Snowflake SQL

 

Spark SQL

 

Microsoft SQL Server database

 

Sybase database

 

Teradata database

 

Vertica database

 

Microsfot Office Access database