Publication number: US20230418792A1
Publication date: 2023-12-28
Application number: US17851546
Application date: 2022-06-28
Applicant: HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Inventor: Annmary Justine KOOMTHANAM, Suparna Bhattacharya, Aalap Tripathy, Sergey Serebryakov, Martin Foltin, Paolo Faraboschi
IPC: G06F16/215, G06F16/25, G06F16/27, G06N20/00, G06K9/62
CPC classification number: G06F16/215, G06F16/254, G06F16/27, G06N20/00, G06K9/6256
Abstract: Systems and methods are provided for automatically constructing data lineage representations for distributed data processing pipelines. These data lineage representations (which are constructed and stored in a central repository shared by the multiple data processing sites) can be used to, among other things, clone the distributed data processing pipeline for quality assurance or debugging purposes. Examples of the presently disclosed technology are able to construct data lineage representations for distributed data processing pipelines by (1) generating a hash content value for universally identifying each data artifact of the distributed data processing pipeline across the multiple processing stages/processing sites of the distributed data processing pipeline; and (2) creating a data processing pipeline abstraction hierarchy for associating each data artifact with input and output events for given executions of given data processing stages (performed by the multiple data processing sites).
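The two ideas in the abstract, content-derived artifact IDs and a pipeline/stage/execution hierarchy that links artifacts to input and output events, can be illustrated with a minimal sketch. It is not the claimed implementation. Several choices are assumptions: SHA-256 is the hash function, an in-memory list stands in for the shared central repository, and the names `content_hash`, `Execution`, `LineageStore`, `record`, and `upstream` are hypothetical. Because an artifact's ID depends only on its bytes, an output written at one site and read at another resolves to the same ID. That shared ID is what lets lineage be stitched together across sites.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    """Hypothetical helper: ID an artifact by the SHA-256 of its bytes (assumed
    hash), so identical data gets the same ID at every stage and site."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Execution:
    """One execution of a stage (pipeline -> stage -> execution hierarchy),
    holding its input and output events as artifact hashes."""
    pipeline: str
    stage: str
    site: str
    inputs: list[str] = field(default_factory=list)   # artifacts read
    outputs: list[str] = field(default_factory=list)  # artifacts written


@dataclass
class LineageStore:
    """Hypothetical stand-in for the shared central repository (in memory)."""
    executions: list[Execution] = field(default_factory=list)

    def record(self, pipeline: str, stage: str, site: str,
               inputs: list[bytes], outputs: list[bytes]) -> Execution:
        """Register one execution, converting artifacts to content hashes."""
        ex = Execution(pipeline, stage, site,
                       [content_hash(d) for d in inputs],
                       [content_hash(d) for d in outputs])
        self.executions.append(ex)
        return ex

    def upstream(self, artifact_id: str) -> set[str]:
        """Walk output->input events backwards; return all ancestor IDs."""
        seen: set[str] = set()
        frontier = [artifact_id]
        while frontier:
            current = frontier.pop()
            for ex in self.executions:
                if current in ex.outputs:
                    for parent in ex.inputs:
                        if parent not in seen:
                            seen.add(parent)
                            frontier.append(parent)
        return seen


if __name__ == "__main__":
    store = LineageStore()
    raw, clean, model = b"raw sensor data", b"cleaned data", b"model weights"
    # Stage 1 runs at site A; stage 2 runs at site B on the same bytes.
    store.record("demo", "clean", "site-A", [raw], [clean])
    store.record("demo", "train", "site-B", [clean], [model])
    print(len(store.upstream(content_hash(model))))  # → 2
```

The linear scan over executions keeps the sketch short. A real repository would more likely index executions by output hash so that each backward step is a lookup rather than a scan.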