Prioritizing data clusters with customizable scoring strategies

    公开(公告)号:US08712906B1

    公开(公告)日:2014-04-29

    申请号:US13968213

    申请日:2013-08-15

    Abstract: Techniques are disclosed for prioritizing a plurality of clusters. Prioritizing clusters may generally include identifying a scoring strategy for prioritizing the plurality of clusters. Each cluster is generated from a seed and stores a collection of data retrieved using the seed. For each cluster, elements of the collection of data stored by the cluster are evaluated according to the scoring strategy and a score is assigned to the cluster based on the evaluation. The clusters may be ranked according to the respective scores assigned to the plurality of clusters. The collection of data stored by each cluster may include financial data evaluated by the scoring strategy for a risk of fraud. The score assigned to each cluster may correspond to an amount at risk.

    Generation and graphical display of data transform provenance metadata

    公开(公告)号:US11314769B2

    公开(公告)日:2022-04-26

    申请号:US16014005

    申请日:2018-06-21

    Abstract: Techniques for propagation of deletion operations among a plurality of related datasets are described herein. In an embodiment, a data processing method comprises, using a distributed database system that is programmed to manage a plurality of different raw datasets and a plurality of derived datasets that have been derived from the raw datasets based on a plurality of derivation relationships that link the raw datasets to the derived datasets: from a first dataset that is stored in the distributed database system, determining a subset of records that are candidates for propagated deletion of specified data values; determining one or more particular raw datasets that contain the subset of records; deleting the specified data values from the particular raw datasets; based on the plurality of derivation relationships and the particular raw datasets, identifying one or more particular derived datasets that have been derived from the particular raw datasets; generating and executing a build of the one or more particular derived datasets to result in creating and storing the one or more particular derived datasets without the specified data values that were deleted from the particular raw datasets; repeating the generating and executing for all derived datasets that have derivation relationships to the particular raw datasets; wherein the method is performed using one or more processors.

Patent Agency Ranking