Abstract:
In various embodiments, a data integration system is disclosed that enables users to create a logical design that is platform- and technology-independent. The logical design defines, at a high level, how the user wants data to flow between sources and targets. The tool can analyze the logical design in view of the user's infrastructure and create a physical design. The logical design can include a plurality of components corresponding to each source and target in the design, as well as to operations such as joins or filters, and to access points. Each component, when transferred to the physical design, generates code to perform operations on the data. Depending on the underlying technology (e.g., SQL Server, Oracle, Hadoop) and the language used (e.g., SQL, Pig), the code generated by each component may differ.
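The idea that one logical component yields different code per technology can be sketched in Python. This is a minimal illustration, not the disclosed tool: the component classes Source, Filter, and Target and the generate function are hypothetical, and only a linear source-filter-target flow is handled.

```python
from dataclasses import dataclass

# Hypothetical logical components; names are illustrative, not from the source.
@dataclass
class Source:
    table: str

@dataclass
class Filter:
    condition: str

@dataclass
class Target:
    table: str

def generate(design: list, technology: str) -> str:
    """Emit code for a linear logical design: one source, any filters, one target."""
    source = next(c for c in design if isinstance(c, Source))
    target = next(c for c in design if isinstance(c, Target))
    conditions = [c.condition for c in design if isinstance(c, Filter)]
    if technology == "sql":
        # The filters collapse into a single WHERE clause.
        where = f" WHERE {' AND '.join(conditions)}" if conditions else ""
        return f"INSERT INTO {target.table} SELECT * FROM {source.table}{where};"
    if technology == "pig":
        # Pig Latin expresses the same flow as a sequence of relation assignments.
        lines = [f"data = LOAD '{source.table}';"]
        lines += [f"data = FILTER data BY {cond};" for cond in conditions]
        lines.append(f"STORE data INTO '{target.table}';")
        return "\n".join(lines)
    raise ValueError(f"unsupported technology: {technology}")
```

For example, the design [Source("orders"), Filter("amount > 100"), Target("big_orders")] produces a single INSERT ... SELECT statement for SQL and a LOAD/FILTER/STORE sequence for Pig. A real Pig script would also need a schema (an AS clause on LOAD) to refer to fields by name; the sketch omits it.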
Abstract:
The present disclosure relates to a system and techniques for resolving dangling references resulting from a dependency relationship between computing resource objects uncovered during a harvesting process. The techniques include adding a computing resource object from a catalog of computing resource objects to a computing resource collection for a client and identifying one or more dependencies for the computing resource object. The techniques further include determining at least one unresolved dependency from the one or more dependencies, the at least one unresolved dependency including a second dependency on a second computing resource object outside of the computing resource collection. The techniques further include resolving the at least one unresolved dependency after the second computing resource object associated with the unresolved dependency has been added to the computing resource collection.
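The deferred resolution described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: ResourceObject, ResourceCollection, and the (dependent, dependency) pair representation are hypothetical, since the abstract does not say how dependencies are stored.

```python
from dataclasses import dataclass, field

@dataclass
class ResourceObject:
    obj_id: str
    depends_on: list[str] = field(default_factory=list)

class ResourceCollection:
    """Hypothetical per-client collection that defers unresolved dependencies."""

    def __init__(self) -> None:
        self.objects: dict[str, ResourceObject] = {}
        self.resolved: set[tuple[str, str]] = set()    # (dependent, dependency)
        self.unresolved: set[tuple[str, str]] = set()  # dangling references

    def add(self, obj: ResourceObject) -> None:
        self.objects[obj.obj_id] = obj
        # Identify this object's dependencies; resolve those already present.
        for dep in obj.depends_on:
            edge = (obj.obj_id, dep)
            if dep in self.objects:
                self.resolved.add(edge)
            else:
                self.unresolved.add(edge)
        # Resolve any earlier dependency that pointed at the object just added.
        newly = {e for e in self.unresolved if e[1] == obj.obj_id}
        self.unresolved -= newly
        self.resolved |= newly
```

Adding a VM that depends on a not-yet-harvested subnet leaves one dangling pair; adding the subnet afterwards moves that pair into the resolved set.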
Abstract:
In accordance with various embodiments, described herein is a system (Data Artificial Intelligence system, Data AI system), for use with a data integration or other computing environment, that leverages machine learning (ML, DataFlow Machine Learning, DFML), for use in managing a flow of data (dataflow, DF), and building complex dataflow software applications (dataflow applications, pipelines). In accordance with an embodiment, the system can provide data governance functionality such as, for example, provenance (where particular data came from), lineage (how the data was acquired or processed), security (who was responsible for the data), classification (what the data is about), impact (how significant the data is to a business), retention (how long the data should be kept), and validity (whether the data should be included in or excluded from analysis or processing), for each slice of data pertinent to a particular snapshot in time, which can then be used in making lifecycle decisions and dataflow recommendations.
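One way to picture per-slice governance metadata is as a record holding the seven attributes listed above, plus a rule that turns them into a lifecycle decision. The sketch below is hypothetical: the field types, the 1 to 5 impact scale, and the lifecycle_action rule are assumptions, not the disclosed system's model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class SliceGovernance:
    """Hypothetical governance record for one slice of data at one snapshot."""
    snapshot: datetime
    provenance: str            # where the data came from
    lineage: tuple[str, ...]   # how it was acquired/processed, in order
    owner: str                 # security: who was responsible for it
    classification: str        # what the data is about
    impact: int                # business impact; a 1 (low) to 5 (high) scale is assumed
    retention: timedelta       # how long the data should be kept
    valid: bool                # include in analysis/processing?

def lifecycle_action(g: SliceGovernance, now: datetime) -> str:
    """Toy lifecycle rule: purge expired slices, exclude invalid ones, else keep."""
    if now - g.snapshot > g.retention:
        return "purge"
    if not g.valid:
        return "exclude"
    return "keep"
```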
Abstract:
A data catalog system is disclosed that provides capabilities for uniquely identifying and retrieving data entities stored in diverse data sources managed by an organization. The data catalog system includes capabilities for generating a unique external identifier for a data entity (e.g., a data asset or a data object) by identifying a set of immutable configuration parameters associated with the data asset and identifying a set of data object attributes that uniquely identify data objects within the data asset. The generated unique external identifiers are stored as part of the metadata harvested by the data catalog system. The external identifiers are used to enforce a single representation of the data assets and the data objects in the data catalog system. The external object identifiers are used to perform data lookups and reconcile states of data entities during the metadata harvesting process.
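A deterministic external identifier can be illustrated by hashing only the immutable configuration parameters of an asset, then hashing the asset identifier together with the attributes that uniquely identify an object inside it. The choice of SHA-256 over canonical JSON, and the function names asset_external_id and object_external_id, are assumptions for illustration; the abstract does not specify how the identifier is derived.

```python
import hashlib
import json

def _digest(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal inputs hash identically.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def asset_external_id(config: dict, immutable_keys: list) -> str:
    """Identify a data asset by its immutable configuration parameters only."""
    return _digest({k: config[k] for k in immutable_keys})

def object_external_id(asset_id: str, object_attrs: dict) -> str:
    """Identify a data object by its asset plus the attributes unique within it."""
    return _digest({"asset": asset_id, "object": object_attrs})
```

Because mutable parameters such as credentials are excluded, re-harvesting an asset after a password change yields the same identifier, which is what lets lookups and state reconciliation find the existing catalog entry instead of creating a duplicate.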
Abstract:
The present disclosure relates to a system and techniques for enabling provisioning of cloud services for a client in an isolated yet scalable manner. In some embodiments, various computing resources are implemented within a cell (a self-sufficient unit). A number of cells are generated for a service or a group of services and distributed across a number of computing devices. Various cells may be generated that each pertain to a different aspect, or particular functionality, of the service. In some embodiments, cells providing various functionality for the service are implemented and distributed across different computing devices.
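Generating one cell per function of a service and spreading the cells over several computing devices can be sketched as below. The Cell fields and the round-robin placement policy in place_cells are assumptions chosen for simplicity; the abstract does not describe how cells are assigned to devices.

```python
from dataclasses import dataclass, field
from itertools import cycle

@dataclass
class Cell:
    """Self-sufficient unit bundling the resources for one function of a service."""
    service: str
    function: str
    resources: list = field(default_factory=list)

def place_cells(cells: list, hosts: list) -> dict:
    """Spread cells round-robin across hosts (the placement policy is an assumption)."""
    if not hosts:
        raise ValueError("at least one host is required")
    placement = {h: [] for h in hosts}
    for cell, host in zip(cells, cycle(hosts)):
        placement[host].append(cell)
    return placement
```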
Abstract:
The present disclosure relates to a system and techniques for resolving dangling references resulting from a dependency relationship between computing resource objects uncovered during a harvesting process. In embodiments, a harvester application adds computing resource objects associated with a client to a resource collection as those computing resource objects are identified. As each computing resource object is added to the resource collection, its dependencies are identified; a dependency is resolved only if the computing resource object associated with it has already been added to the resource collection. Otherwise, the dependency is added to an observer pool. Observer modules are configured to check each computing resource object as it is processed during the harvesting process in order to match those computing resource objects to unresolved dependencies.
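The observer pool can be sketched as a mapping from a missing object's identifier to the observers waiting for it; when that object is processed, its observers fire and complete the links. The Harvester class, its process method, and the use of plain callables as observer modules are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Callable

class Harvester:
    """Hypothetical harvester that parks unresolved dependencies in an observer pool."""

    def __init__(self) -> None:
        self.collection: dict[str, dict] = {}
        # Observer pool: missing object id -> observers waiting for that object.
        self.observer_pool: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.links: list[tuple[str, str]] = []  # resolved (dependent, dependency) pairs

    def process(self, obj_id: str, obj: dict, depends_on: list) -> None:
        self.collection[obj_id] = obj
        for dep in depends_on:
            if dep in self.collection:
                self.links.append((obj_id, dep))
            else:
                # Register an observer that records the link once dep is processed.
                self.observer_pool[dep].append(
                    lambda _target, src=obj_id, d=dep: self.links.append((src, d)))
        # Check the object just processed against the unresolved dependencies.
        for observer in self.observer_pool.pop(obj_id, []):
            observer(obj)
```

Compared with rescanning every pending dependency on each addition, keying the pool by the missing identifier makes each match a single dictionary lookup.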
Abstract:
In accordance with various embodiments, described herein is a system (Data Artificial Intelligence system, Data AI system), for use with a data integration or other computing environment, that leverages machine learning (ML, DataFlow Machine Learning, DFML), for use in managing a flow of data (dataflow, DF), and building complex dataflow software applications (dataflow applications, pipelines). In accordance with an embodiment, the system can include a software development component and graphical user interface, referred to herein in some embodiments as a pipeline editor, or Lambda Studio IDE, that provides a visual environment for use with the system, including providing real-time recommendations for performing semantic actions on data accessed from an input HUB, based on an understanding of the meaning or semantics associated with the data.
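To make "recommendations based on the semantics of the data" concrete, the sketch below infers a semantic type for a column and maps it to suggested actions. The described system uses machine learning; this sketch plainly swaps in regular-expression matching as a stand-in, and the patterns, the 0.8 threshold, and the action names are all hypothetical.

```python
import re

# Hypothetical semantic detectors (regex stand-ins for learned classifiers).
SEMANTIC_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

# Hypothetical mapping from a detected semantic type to suggested actions.
ACTIONS = {
    "email": ["mask", "extract domain"],
    "us_zip": ["enrich with city/state"],
    "iso_date": ["cast to date", "extract year"],
}

def infer_semantic_type(values: list, threshold: float = 0.8):
    """Return the first semantic type matched by at least `threshold` of non-empty values."""
    sample = [v for v in values if v]
    if not sample:
        return None
    for name, pattern in SEMANTIC_PATTERNS.items():
        hits = sum(1 for v in sample if pattern.match(v))
        if hits / len(sample) >= threshold:
            return name
    return None

def recommend(column: list) -> list:
    """Suggest actions for a column; unknown semantics yield no suggestions."""
    return ACTIONS.get(infer_semantic_type(column), [])
```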
Abstract:
In accordance with various embodiments, described herein is a system (Data Artificial Intelligence system, Data AI system), for use with a data integration or other computing environment, that leverages machine learning (ML, DataFlow Machine Learning, DFML), for use in managing a flow of data (dataflow, DF), and building complex dataflow software applications (dataflow applications, pipelines). In accordance with an embodiment, the system can provide a service to recommend actions and transformations on input data, based on patterns identified from the functional decomposition of a data flow for a software application, including determining possible transformations of the data flow in subsequent applications. Data flows can be decomposed into a model describing transformations of data, predicates, and business rules applied to the data, and attributes used in the data flows.
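A decomposed data flow and a pattern-based recommendation can be illustrated with a small model and a frequency count. DataflowModel and recommend_next are hypothetical, and counting which transformation most often followed the current last step in earlier flows is a simple stand-in for the machine-learning pattern identification the abstract describes.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class DataflowModel:
    """Hypothetical decomposition of one data flow into its functional parts."""
    transformations: list = field(default_factory=list)  # ordered steps
    predicates: list = field(default_factory=list)
    business_rules: list = field(default_factory=list)
    attributes: set = field(default_factory=set)

def recommend_next(history: list, current: DataflowModel, top_n: int = 2) -> list:
    """Suggest the transformations that most often followed the current last step
    in previously decomposed data flows (a bigram frequency count)."""
    if not current.transformations:
        return []
    last = current.transformations[-1]
    follow = Counter()
    for flow in history:
        steps = flow.transformations
        for a, b in zip(steps, steps[1:]):
            if a == last:
                follow[b] += 1
    return [step for step, _ in follow.most_common(top_n)]
```

With earlier flows load → dedupe → join (twice) and load → dedupe → aggregate, a new flow ending in dedupe would be offered join first, then aggregate.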