METADATA-DRIVEN DATA INGESTION
    Invention application

    Publication No.: US20230100418A1

    Publication date: 2023-03-30

    Application No.: US17665109

    Filing date: 2022-02-04

    Abstract: An electronic system for increasing the speed of preparing data with a specified data quality for storage by automatically identifying for a user, with minimal user input, common contexts among (i) fields in disparate datasets, and (ii) names the user has specified as potentially describing the fields, and by using those common contexts to govern the disparate datasets prior to storage to ensure the specified data quality.

    DATA GOVERNANCE SYSTEMS AND METHODS

    Publication No.: US20220398337A1

    Publication date: 2022-12-15

    Application No.: US17834492

    Filing date: 2022-06-07

    Abstract: Some embodiments relate to a method for use in connection with governance of a plurality of data assets managed by a data processing system, the method comprising: using at least one computer hardware processor to perform: accessing a data governance policy comprising a first data standard (e.g., by obtaining information about the first standard stored in a database system); generating a first data asset collection at least in part by automatically selecting, from among the plurality of data assets managed by the data processing system and using at least one data asset criterion, one or more data assets that meet the at least one data asset criterion; associating the first data asset collection with the first data standard; and verifying whether at least one of the one or more data assets in the first data asset collection complies with the first data standard.
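
The steps in this abstract (select assets by a criterion, associate the resulting collection with a standard, verify compliance) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `DataAsset` and `DataStandard` types, the `pii` tag, and the "ssn must be masked" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class DataAsset:
    name: str
    tags: Set[str]
    fields: Dict[str, str]  # field name -> declared type (hypothetical model)


@dataclass
class DataStandard:
    name: str
    check: Callable[[DataAsset], bool]  # True when the asset complies


def build_collection(assets: List[DataAsset],
                     criterion: Callable[[DataAsset], bool]) -> List[DataAsset]:
    """Automatically select the assets that meet the criterion."""
    return [a for a in assets if criterion(a)]


def verify(collection: List[DataAsset], standard: DataStandard) -> Dict[str, bool]:
    """Check each asset in the collection against the associated standard."""
    return {a.name: standard.check(a) for a in collection}


assets = [
    DataAsset("customers", {"pii"}, {"email": "string", "ssn": "masked_string"}),
    DataAsset("orders", set(), {"total": "decimal"}),
    DataAsset("patients", {"pii"}, {"ssn": "string"}),
]

# Hypothetical standard: if an asset has an "ssn" field, it must be masked.
ssn_masked = DataStandard(
    "ssn-must-be-masked",
    lambda a: a.fields.get("ssn", "masked_string") == "masked_string",
)

# Criterion: assets tagged as containing personal data.
pii_collection = build_collection(assets, lambda a: "pii" in a.tags)
report = verify(pii_collection, ssn_masked)
print(report)
```

Because the collection is defined by a criterion and not by a fixed list, re-running `build_collection` after new assets are registered would bring them under the same standard without further manual steps.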

    SYSTEMS AND METHODS FOR DETERMINING RELATIONSHIPS AMONG DATA ELEMENTS

    Publication No.: US20220374413A1

    Publication date: 2022-11-24

    Application No.: US17576572

    Filing date: 2022-01-14

    Abstract: A data processing system configured to perform: obtaining a first data lineage representing relationships among physical data elements, the first data lineage being generated at least in part by performing at least one of: (a) analyzing source code of at least one computer program configured to access the physical data elements; and (b) analyzing information obtained during runtime of the at least one computer program; obtaining, based on user input, a second data lineage representing relationships among business data elements; obtaining an association between at least some of the physical data elements of the first data lineage and at least some of the business data elements of the second data lineage; and generating, based on the association between the physical data elements and the business data elements, an indication of agreement or discrepancy between the first data lineage and the second data lineage.
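
One way to picture the comparison described in this abstract is to project the physical lineage onto business terms through the association and then compare edge sets. The sketch below is only an illustration under that assumption; the function name `compare_lineages`, the edge-set representation, and all element names are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (upstream element, downstream element)


def compare_lineages(physical_edges: Iterable[Edge],
                     business_edges: Iterable[Edge],
                     association: Dict[str, str]) -> Dict[str, List[Edge]]:
    """Report where the derived and the user-supplied lineages agree or differ."""
    # Translate each physical edge into business terms; skip edges whose
    # endpoints have no associated business element or map to the same one.
    projected = {
        (association[src], association[dst])
        for src, dst in physical_edges
        if src in association and dst in association
        and association[src] != association[dst]
    }
    business = set(business_edges)
    return {
        "agree": sorted(projected & business),
        "only_in_physical": sorted(projected - business),
        "only_in_business": sorted(business - projected),
    }


physical = [
    ("crm.cust_tbl.email", "dw.customer.email_addr"),
    ("dw.customer.email_addr", "rpt.contacts.email"),
    ("crm.cust_tbl.phone", "rpt.contacts.email"),
]
business = [
    ("Source Email", "Customer Email"),
    ("Customer Email", "Contact Email"),
    ("Customer Email", "Marketing Email"),
]
association = {
    "crm.cust_tbl.email": "Source Email",
    "crm.cust_tbl.phone": "Source Phone",
    "dw.customer.email_addr": "Customer Email",
    "rpt.contacts.email": "Contact Email",
}

result = compare_lineages(physical, business, association)
for kind, edges in result.items():
    print(kind, edges)
```

Edges listed under `only_in_physical` are flows the code performs that the documented business lineage does not mention; edges under `only_in_business` are documented flows with no observed physical counterpart.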

    Static and runtime analysis of computer program ecosystems

    Publication No.: US11487534B2

    Publication date: 2022-11-01

    Application No.: US17306075

    Filing date: 2021-05-03

    Abstract: A method for analyzing a computer program ecosystem includes performing a static analysis, including identifying static dependencies among elements of the ecosystem based on values of parameters in one or more parameter sets associated with the ecosystem, the elements of the ecosystem including the computer programs of the ecosystem and data resources associated with the computer programs. The method includes performing a runtime analysis, including identifying elements of the ecosystem that were utilized during execution of the ecosystem to process data records. The method includes performing a schedule analysis, including identifying a computer program of the ecosystem that has a schedule dependency from another computer program of the ecosystem. The method includes identifying a subset of the elements of the ecosystem as an ecosystem unit based on the results of the static, runtime, and schedule analyses. The method includes migrating the ecosystem unit, testing the ecosystem unit, or both.

    Differencing of executable dataflow graphs

    Publication No.: US11455229B2

    Publication date: 2022-09-27

    Application No.: US17067020

    Filing date: 2020-10-09

    Abstract: A method for displaying differences between a first executable dataflow graph and a second executable dataflow graph includes comparing a specification of the first executable dataflow graph and a specification of the second executable dataflow graph, including at least one of identifying a particular node or link of the first dataflow graph that does not correspond to any node or link of the second dataflow graph; and identifying a first node or link of the first dataflow graph that corresponds to a second node or link of the second dataflow graph, and identifying a difference between the first node or link and the second node or link. The method includes formulating and displaying a graphical representation of at least some of the nodes or links of the first dataflow graph or the second dataflow graph, the graphical representation including a graphical indicator of at least one of the identified particular node or link and the identified difference between the first node or link and the second node or link.
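
The two kinds of difference named in this abstract (an element with no counterpart, and a corresponding element whose details differ) can be sketched as follows. This is a simplified illustration, not the patented method: it assumes each graph specification is a plain dictionary of node properties and that nodes correspond when they share an identifier. The function `diff_graphs` and the sample node names are hypothetical, and text markers stand in for the graphical indicators.

```python
from typing import Any, Dict, List, Tuple

Spec = Dict[str, Dict[str, Any]]  # node id -> node properties (assumed format)


def diff_graphs(first: Spec, second: Spec) -> Tuple[List[str], Dict[str, Dict[str, tuple]]]:
    """Return (nodes only in `first`, property changes on corresponding nodes)."""
    # Nodes of the first graph with no corresponding node in the second.
    unmatched = sorted(first.keys() - second.keys())

    # Corresponding nodes whose properties differ, as {prop: (old, new)}.
    changed: Dict[str, Dict[str, tuple]] = {}
    for node_id in first.keys() & second.keys():
        a, b = first[node_id], second[node_id]
        delta = {p: (a.get(p), b.get(p))
                 for p in a.keys() | b.keys() if a.get(p) != b.get(p)}
        if delta:
            changed[node_id] = delta
    return unmatched, changed


first = {
    "read": {"kind": "input", "path": "in.dat"},
    "filter": {"kind": "filter", "expr": "amount > 0"},
    "dedup": {"kind": "dedup"},
}
second = {
    "read": {"kind": "input", "path": "in.dat"},
    "filter": {"kind": "filter", "expr": "amount > 100"},
}

unmatched, changed = diff_graphs(first, second)

# Text stand-in for a graphical indicator: mark each node of the first graph.
for node_id in first:
    marker = "-" if node_id in unmatched else "~" if node_id in changed else " "
    print(marker, node_id, changed.get(node_id, ""))
```

Links could be handled the same way by keying them on their endpoint pair, under the same identifier-matching assumption.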

    Transforming a specification into a persistent computer program

    Publication No.: US11423083B2

    Publication date: 2022-08-23

    Application No.: US15795917

    Filing date: 2017-10-27

    Abstract: A method performed by a computer system including: accessing a specification that specifies a plurality of modules to be implemented by the computer program for processing the one or more values of the one or more fields in the structured data item; transforming the specification into the computer program that implements the plurality of modules, wherein the transforming includes: for each of one or more first modules of the plurality of modules: identifying one or more second modules of the plurality of modules that each receive input that is at least partly based on an output of the first module; and formatting an output data format of the first module such that the first module outputs only one or more values of one or more fields of the structured data item.

    TECHNIQUES FOR MANAGING DATA IN A DATA PROCESSING SYSTEM USING DATA ENTITIES AND INHERITANCE

    Publication No.: US20220245154A1

    Publication date: 2022-08-04

    Application No.: US17587130

    Filing date: 2022-01-28

    Abstract: Techniques for storing data entities by a data processing system are described herein. The data processing system may store a plurality of data entity instances generated using a plurality of data entities. The plurality of data entity instances may include a first data entity instance generated using a first data entity and a second data entity instance generated using a second data entity. The first data entity instance may include a first attribute that is configured to inherit its value from a second attribute of the second data entity instance. The data processing system may provide the inherited value of the second attribute of the second data entity instance as the value of the first attribute of the first data entity instance.
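
The inheritance behavior described in this abstract, where one instance's attribute takes its value from an attribute of another instance, can be illustrated with a small sketch. The classes `EntityInstance` and `Inherit`, and the "Dataset"/"Column" entities, are hypothetical names chosen for the example; the patent does not prescribe this design.

```python
from typing import Any


class Inherit:
    """Marker meaning: resolve this attribute from another instance."""

    def __init__(self, source: "EntityInstance", attribute: str) -> None:
        self.source = source
        self.attribute = attribute


class EntityInstance:
    def __init__(self, entity: str, **attributes: Any) -> None:
        self.entity = entity            # name of the entity this was generated from
        self._attributes = dict(attributes)

    def set(self, name: str, value: Any) -> None:
        self._attributes[name] = value

    def get(self, name: str) -> Any:
        value = self._attributes[name]
        # Inherited attributes are resolved at read time, so they always
        # reflect the current value held by the source instance.
        # Note: this sketch does not guard against inheritance cycles.
        if isinstance(value, Inherit):
            return value.source.get(value.attribute)
        return value


dataset = EntityInstance("Dataset", owner="finance-team")
column = EntityInstance("Column", name="amount", owner=Inherit(dataset, "owner"))

before = column.get("owner")
dataset.set("owner", "risk-team")
after = column.get("owner")
print(before, after)
```

Resolving at read time, as opposed to copying the value when the instance is created, is a design choice made here so that a change to the source attribute is visible immediately through every inheriting instance.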

    Managing a computing cluster using replicated task results

    Publication No.: US11281693B2

    Publication date: 2022-03-22

    Application No.: US16175487

    Filing date: 2018-10-30

    Abstract: A method for processing tasks in a distributed data processing system includes processing sets of tasks. The method includes maintaining, at a first processing node, a number of counters including a working counter indicating a current time interval of the number of time intervals in the distributed data processing system, and a replication counter indicating a time interval of the number of time intervals for which at least one of (1) all tasks associated with that time interval, or (2) all corresponding results associated with that time interval, are replicated at multiple processing nodes of the number of processing nodes. The method includes providing messages from the first processing node to the other processing nodes of the number of processing nodes, the messages including the working counter and the replication counter.

    STATIC AND RUNTIME ANALYSIS OF COMPUTER PROGRAM ECOSYSTEMS

    Publication No.: US20210263734A1

    Publication date: 2021-08-26

    Application No.: US17306075

    Filing date: 2021-05-03

    Abstract: A method for analyzing a computer program ecosystem including multiple computer programs includes performing a static analysis of the ecosystem, including identifying static dependencies among elements of the ecosystem based on values of parameters in one or more parameter sets associated with the ecosystem, the elements of the ecosystem including the computer programs of the ecosystem and data resources associated with the computer programs. The method includes performing a runtime analysis of the ecosystem, including identifying elements of the ecosystem that were utilized during execution of the ecosystem to process data records. The method includes performing a schedule analysis of the ecosystem, including identifying a computer program of the ecosystem that has a schedule dependency from another computer program of the ecosystem. The method includes identifying a subset of the elements of the ecosystem as an ecosystem unit based on the results of the static, runtime, and schedule analyses. The method includes migrating the ecosystem unit from a first computer system to a second computer system, testing the ecosystem unit, or both.
