Enterprise Lineage

Driving Data Lineage across the Enterprise

Posted by steve on June 21, 2020

In Data Lineage we highlighted that data lineage can be captured using UML <<trace>> relationships, and that field-level lineage can be projected from trace relationships plus common attribute names. This post is concerned with using those relationships to derive systems lineage automatically.
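As a rough illustration of projecting field-level lineage from a trace plus common attribute names, the sketch below pairs up same-named attributes across a trace relationship. The attribute names and the `field_lineage` helper are hypothetical, invented for this example; only the idea of matching on common names comes from the text above.

```python
# Hypothetical attribute sets, for illustration only.
attributes = {
    "Fact": {"trade_id", "price", "quantity"},
    "FI Trade": {"trade_id", "quantity", "counterparty"},
}

# Each pair is (target, source): target has a <<trace>> relationship to source.
traces = [("Fact", "FI Trade")]


def field_lineage(traces, attributes):
    """For each trace, pair up the attributes that share a name."""
    result = []
    for target, source in traces:
        common = attributes[target] & attributes[source]
        for name in sorted(common):
            result.append((f"{target}.{name}", f"{source}.{name}"))
    return result


fields = field_lineage(traces, attributes)
for tgt, src in fields:
    print(f"{tgt} <- {src}")
```

Attributes without a same-named counterpart (price and counterparty here) produce no field-level lineage under this assumption.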

Application Landscape

A core deliverable for any mature Enterprise Architecture practice is the application landscape, which shows applications within the context of the platforms and layers that they are built upon, clustered so that collaborators appear next to each other. Applications are connected by <<Flow>> relationships that highlight the data that is passed between them. Often only the high-level flows are shown, but the detailed flows inform the layout of the diagram.

Service/Message/Event Buses

A spider-web of flows between systems (where the number of possible flows grows with the square of the number of applications) is often used to demonstrate the complexity of the current state compared to the simplicity of a service/message bus in the target state. Service buses compound the difficulty of mapping lineage because it becomes common to label every input and output simply as a message and to presume that interpreting the data is a usage problem. This is erroneous:

  • Generalising at the landscape level pushes the actual architecture work down to the applications and converts the landscape from a diagram to a picture.
  • Over-generalisation militates against analysis of availability or exception escalation.
  • Automatic lineage projects this as a complex dependency graph where every system is potentially dependent on every other system. This is not a deficiency of the lineage graph, but highlights the need to be more specific.
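The last point can be sketched in a few lines. The publish/subscribe dictionaries and the `upstream_via_bus` helper below are assumptions made for illustration, not part of any modelling tool; they simply contrast a bus whose flows are all labelled "message" with one whose flows name the data they carry.

```python
def upstream_via_bus(consumer, publishes, subscribes):
    """Return applications that publish any data type the consumer subscribes to."""
    wanted = subscribes.get(consumer, set())
    return sorted(
        app for app, data in publishes.items()
        if app != consumer and data & wanted
    )


# Over-generalised model: every flow on the bus is just a "message".
generic_pub = {"MarketData": {"message"}, "FX Trading": {"message"}}
generic_sub = {"Data Warehouse": {"message"}}

# Specific model: each flow names the data it carries (illustrative assignment).
specific_pub = {"MarketData": {"Price"}, "FX Trading": {"Fact"}}
specific_sub = {"Data Warehouse": {"Price"}}

generic = upstream_via_bus("Data Warehouse", generic_pub, generic_sub)
specific = upstream_via_bus("Data Warehouse", specific_pub, specific_sub)
print(generic)
print(specific)
```

With generic messages the consumer appears to depend on every publisher; once the flows are specific, the dependency narrows to the publishers of the data actually consumed.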

Event Buses

Quality event buses derive from the market-data pub/sub pattern rather than the message-queue pattern because they are time-ordered and newer events always supersede prior events. In lineage terms, event buses appear to exacerbate the service bus complexity problem, but they can also highlight design faults: the presence of event lineage in an externally facing interface indicates data leakage rather than complexity, and fixing the leakage fault removes the complexity.

Deriving Enterprise Lineage

In object-oriented analysis and design it is axiomatic that if [A] inherits from [B] and [B] inherits from [C] then [A] also inherits from [C] (even if [B] completely replaces the behaviour of [C]). Applying inheritance semantics to data trace, [A] <- [B] <- [C] implies [A] <- [C]. In a trading scenario it is common to take the closing-price of one geographical market and never use it, because the opening-price of the next geographical market is available before it is needed; but in 1% of cases (holidays/disasters) an opening-price is not available and the prior closing-price is used. The closing-price therefore remains part of the lineage even though it is rarely used.
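The transitive rule can be sketched as a simple graph walk. The dictionary representation and the `transitive_lineage` function are assumptions for this example; the element names [A], [B] and [C] are those used above.

```python
def transitive_lineage(traces, start):
    """Return every element reachable from `start` by following trace relationships."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for source in traces.get(node, ()):
            if source not in seen:  # skipping visited nodes also guards against cycles
                seen.add(source)
                stack.append(source)
    return seen


# [A] <- [B] <- [C]: A traces to B, and B traces to C.
traces = {"A": ["B"], "B": ["C"]}

lineage_of_a = transitive_lineage(traces, "A")
print(sorted(lineage_of_a))
```

[C] appears in the lineage of [A] even though there is no direct trace between them, which mirrors the closing-price case.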

The diagram shows how enterprise lineage can be inferred from trace relationships. Although Reporting consumes Fact from the data warehouse, FI Trade is inferred to be in the Fact lineage because Fact has a trace relationship to FI Trade. It will be demonstrated that the enterprise lineage can be derived automatically and recursively from the flow relationships and the data trace relationships.

With enterprise lineage, lineage does not need to be confined to components but can be extended to Actors (Bloomberg in the example), code classes/interfaces, and even use-cases and business processes for manual sourcing. Examining the lineage at the Reporting component boundary highlights (without consideration of internal systems complexity) that Bloomberg contributes to Price and that Fact includes an element of manual data-entry.

From a governance/regulatory perspective, it is not necessary to see the detailed workings to know that price-sourcing and operational-risk need to be considered. Demonstrating that lineage is derived automatically from detailed flows means that a detailed audit is not required to confirm the veracity of lineage summaries.
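A minimal sketch of such a recursive derivation follows, under stated assumptions: flows are held as a dictionary from consumer to (supplier, data carried) pairs, `traces` already holds the full set of traced data for each item, and the flow from FI Trading into the data warehouse is assumed for illustration. The `derive_lineage` function is hypothetical and is not the method used to produce the listings in this post.

```python
def derive_lineage(element, data, flows, traces, _path=()):
    """Build [element, [upstream...]] for `data`, or just `element` if nothing feeds it."""
    path = _path + (element,)
    relevant = {data} | traces.get(data, set())
    upstream = []
    for supplier, carried in flows.get(element, []):
        # Follow a flow only if it carries the data or something the data traces to.
        if carried in relevant and supplier not in path:
            upstream.append(derive_lineage(supplier, carried, flows, traces, path))
    return [element, upstream] if upstream else element


# consumer -> list of (supplier, data carried); illustrative model only.
flows = {
    "Reporting": [("Data Warehouse", "Fact")],
    "Data Warehouse": [("FI Trading", "FI Trade"), ("Message Bus", "Price")],
}
# Fact has a trace relationship to FI Trade.
traces = {"Fact": {"FI Trade"}}

result = derive_lineage("Reporting", "Fact", flows, traces)
print(result)
```

The Price flow is ignored when deriving Fact lineage, while FI Trading is pulled in through the trace, even though Reporting only ever sees Fact.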

Price lineage generated automatically from the above diagram:
["SimpleBank.Component.Reporting", 
    [
        ["SimpleBank.Component.Data Warehouse",
            [
                ["SimpleBank.Component.Message Bus", 
                    [
                        ["SimpleBank.Component.MarketData",
                            ["SimpleBank.Class Diagram.MarketData.Bloomberg"]
                        ]
                    ]
                ],
                "SimpleBank.Component.FI Trading", 
                "SimpleBank.Component.EQ Trading"
            ]
        ]
    ]
]

Fact lineage generated automatically from the above diagram:
["SimpleBank.Component.Reporting", 
    [   
        ["SimpleBank.Component.Data Warehouse", 
            [   
                ["SimpleBank.Component.Message Bus", 
                    ["SimpleBank.Component.FX Trading"]],
                ["SimpleBank.Component.FI Trading", 
                    [
                        ["SimpleBank.Use Case.Execute Swap OTC", 
                            ["SimpleBank.Process.Execute Trade"]]
                    ]
                ], 
                "SimpleBank.Component.EQ Trading"
            ]
        ]
    ]
]
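
Nested listings like these can be summarised at a boundary with a short traversal. The sketch below assumes the structure shown above, where each entry is either a name or a [name, [upstream...]] pair; the `contributors` helper is hypothetical. It walks the Fact lineage and picks out the elements that are not components.

```python
fact_lineage = [
    "SimpleBank.Component.Reporting",
    [
        [
            "SimpleBank.Component.Data Warehouse",
            [
                ["SimpleBank.Component.Message Bus", ["SimpleBank.Component.FX Trading"]],
                [
                    "SimpleBank.Component.FI Trading",
                    [["SimpleBank.Use Case.Execute Swap OTC", ["SimpleBank.Process.Execute Trade"]]],
                ],
                "SimpleBank.Component.EQ Trading",
            ],
        ]
    ],
]


def contributors(node):
    """Yield every element name in a nested [name, [upstream...]] lineage structure."""
    if isinstance(node, str):
        yield node
    else:
        name, upstream = node
        yield name
        for child in upstream:
            yield from contributors(child)


names = list(contributors(fact_lineage))
non_components = [n for n in names if ".Component." not in n]
print(non_components)
```

For the Fact lineage this surfaces the use-case and the business process, which is the manual-sourcing element a governance review would want to see without opening up the internal systems.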

Conclusion

An alternative approach to data-lineage highlighted that <<trace>> relationships are superior when your objective is to provide lineage analysis rather than design. Using trace for data-lineage is an enabler for enterprise lineage. Enterprise data-lineage is well suited to enterprise architecture because it separates the solution domain from the enterprise domain: lineage analysis becomes an additional part of architecture review, and feedback to solutions architects and designers is addressed through improvement processes. The next article will show how lineage is generated automatically through an Enterprise Hub.