Abstract:
Techniques for storing data entities by a data processing system are described herein. The data processing system may store a plurality of data entity instances generated using a plurality of data entities. The plurality of data entity instances may include a first data entity instance generated using a first data entity and a second data entity instance generated using a second data entity. The first data entity instance may include a first attribute that is configured to inherit its value from a second attribute of the second data entity instance. The data processing system may provide the inherited value of the second attribute of the second data entity instance as the value of the first attribute of the first data entity instance.
Abstract:
Managing lineage information includes processing a request for a representation of data lineage for a first node of a number of nodes (102, 104, 106). The processing includes determining an association between the first node and at least a first tag identifier of a number of tag identifiers, and determining a first subset of at least one and fewer than all of a number of possible tag values for the first tag identifier, and traversing nodes along a first lineage path of directed links from the first node to determine a data lineage for the first node. Determining the data lineage includes, for each traversed node (350) determining whether to add (356) the traversed node to the data lineage or to exclude (360) the traversed node from the data lineage based at least in part on any tag identifiers or tag values associated with the traversed node.
Abstract:
Techniques for storing data entities by a data processing system are described herein. The data processing system may store a plurality of data entity instances generated using a plurality of data entities. The plurality of data entity instances may include a first data entity instance generated using a first data entity and a second data entity instance generated using a second data entity. The first data entity instance may include a first attribute that is configured to inherit its value from a second attribute of the second data entity instance. The data processing system may provide the inherited value of the second attribute of the second data entity instance as the value of the first attribute of the first data entity instance.
Abstract:
A data processing system configured to perform: obtaining a first data lineage representing relationships among physical data elements, the first data lineage being generated at least in part by performing at least one of: (a) analyzing source code of at least one computer program configured to access the physical data elements; and (b) analyzing information obtained during runtime of the at least one computer program; obtaining, based on user input, a second data lineage representing relationships among business data elements; obtaining an association between at least some of the physical data elements of the first data lineage and at least some of the business data elements of the second data lineage; and generating, based on the association between the physical data elements and the business data elements, an indication of agreement or discrepancy between the first data lineage and the second data lineage.
Abstract:
Techniques for using finite state machines (FSMs) to implement workflows in a data processing system comprising at least one data store storing data objects and a workflow management system (WMS). The WMS is configured to perform: determining a current value of an attribute of a first data object by accessing the current value in the at least one data store; identifying, using the current value and metadata specifying relationships among at least some of the data objects, an actor authorized to perform a workflow task for the first data object; generating a GUI through which the actor can provide the input that the workflow task is to be performed; and in response to receiving, from the actor and through the GUI, input specifying that the workflow task is to be performed: performing the workflow task; and updating the current workflow state of the first FSM to a second workflow state.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for metadata management. One of the methods includes receiving user input selecting a first node. The method includes receiving a first data lineage of a first object, the first object having a type, the first data lineage describing relationships between the first object and one or more datasets or transforms. The method includes receiving user input selecting a second node. The method includes receiving a second data lineage of a second object, the second object having the same type as the first object. The method includes performing a comparison of the first node and the first data lineage to the second node and the second data lineage. The method includes generating a report based on the comparison.
Abstract:
An electronic system for increasing the speed of preparing data with a specified data quality for storage by automatically identifying for a user, with minimal user input, common contexts among (i) fields in disparate datasets, and (ii) names the user has specified as potentially describing the fields, and by using those common contexts to govern the disparate datasets prior to storage to ensure the specified data quality.
Abstract:
Among other things, we describe a method of receiving a portion of metadata from a data source, the portion of metadata describing nodes and edges; generating instances of a data structure representing the portion of metadata, at least one instance of the data structure including an identification value that identifies a corresponding node, one or more property values representing respective properties of the corresponding node, and one or more pointers to respective identification values, each pointer representing an edge associated with a node identified by the corresponding respective identification value; storing the instances of the data structure in random access memory; receiving a query that includes an identification of at least one particular element of data; and using at least one instance of the data structure to cause a display of a computer system to display a representation of lineage of the particular element of data.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for building reports. One of the methods includes creating a model based on relational structured data, the structured data including data structures, each data structure having data elements, each data element having fields, each field having a name. The method includes generating a hierarchy of objects in model, the hierarchy organizing objects the with respect to a starting object according to relationship fields on the objects. The method includes generating a user interface including elements for one or more of the objects in the hierarchy, wherein the user interface enables a user to create a report and filter the report using the new name. The method includes receiving a user selection of an element from the elements. The method also includes generating a report.
Abstract:
Managing lineage information includes processing a specification of a directed graph to associate nodes (102, 104, 106) with information for processing requests for a representation of data lineage. The processing includes: identifying a first set of one or more nodes (1362, 1364, 1366) of the directed graph corresponding to normalizing data elements being stored in a data store and de-normalizing data elements being retrieved from the data store; and associating a first plurality of nodes (1370, 1372, 1374) connected to the first set of one or more nodes and a second plurality of nodes (1376, 1378, 1380) connected to the first set of one or more nodes with at least one tag identifier having a plurality of possible tag values, where the number of possible tag values is at least as large as the number of data elements being normalized, and where nodes representing different data elements in a de-normalized record are associated with different values of the tag identifier.