ANOMALOUS DATA IDENTIFICATION FOR TABULAR DATA

    Publication No.: US20240320538A1

    Publication Date: 2024-09-26

    Application No.: US18123673

    Application Date: 2023-03-20

    Applicant: ADOBE INC.

    CPC classification number: G06N20/00

    Abstract: Systems and methods identify anomalous data in tabular data. A set of tabular data records is received. Each tabular data record includes data elements for a number of attributes, with each data element providing a value for the corresponding attribute. An anomaly score is generated for each data element of each tabular data record. Additionally, an evidence set is defined for each attribute and each tabular data record based on the anomaly scores for the data elements. An anomaly score is then generated for each attribute and each tabular data record using the evidence sets. An output is provided that identifies one or more anomalous data subsets, determined from the anomaly scores for the attributes and tabular data records. Each anomalous data subset identifies a subset of attributes and a subset of tabular data records.
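The abstract describes a multi-stage scoring pipeline but gives no formulas. The Python sketch below is one hypothetical reading of it, not Adobe's method. It assumes that element scores are robust z-scores (|x - median| / MAD, computed per attribute), that an "evidence set" is the collection of element scores above a threshold, grouped separately per attribute and per record, and that each attribute or record score is the evidence mass divided by the column or row size. The function names `element_scores`, `evidence_sets`, and `anomalous_subset`, and both thresholds, are illustrative assumptions.

```python
from statistics import median

def element_scores(records):
    """Hypothetical element-level score: |x - median| / MAD, computed per attribute (column)."""
    n_attrs = len(records[0])
    scores = [[0.0] * n_attrs for _ in records]
    for j in range(n_attrs):
        column = [row[j] for row in records]
        med = median(column)
        mad = median(abs(x - med) for x in column) or 1.0  # fall back to 1.0 when MAD is zero
        for i, x in enumerate(column):
            scores[i][j] = abs(x - med) / mad
    return scores

def evidence_sets(scores, threshold):
    """Hypothetical evidence sets: element scores above `threshold`, grouped per attribute and per record."""
    by_attr = {j: [row[j] for row in scores if row[j] > threshold] for j in range(len(scores[0]))}
    by_record = {i: [s for s in row if s > threshold] for i, row in enumerate(scores)}
    return by_attr, by_record

def anomalous_subset(records, element_threshold=3.0, cut=1.0):
    """Return (attribute indices, record indices) whose aggregated scores exceed `cut`."""
    scores = element_scores(records)
    by_attr, by_record = evidence_sets(scores, element_threshold)
    n_records, n_attrs = len(records), len(records[0])
    # Assumed aggregation: evidence mass divided by the size of the column or row.
    attr_score = {j: sum(ev) / n_records for j, ev in by_attr.items()}
    record_score = {i: sum(ev) / n_attrs for i, ev in by_record.items()}
    attrs = sorted(j for j, s in attr_score.items() if s > cut)
    recs = sorted(i for i, s in record_score.items() if s > cut)
    return attrs, recs

records = [[10, 100], [11, 102], [12, 98], [10, 101], [11, 99], [50, 100]]
print(anomalous_subset(records))  # ([0], [5])
```

Here the single outlier value 50 in attribute 0 yields the subset "attribute 0 x record 5". The abstract allows several such subsets to be reported; this sketch returns only one for brevity.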

    RELATING DATA IN DATA LAKES
    Patent application

    Publication No.: US20240386002A1

    Publication Date: 2024-11-21

    Application No.: US18319748

    Application Date: 2023-05-18

    Applicant: Adobe Inc.

    Abstract: A dataset comprising tables is received. Embeddings are generated for the column titles of each table. Based on the embeddings, similar tables are clustered. The tables are then organized into smaller clusters based on statistical similarities. Similarity scores are calculated for tables within the same cluster. A relatedness graph is created from the similarity scores, in which similar tables are represented by nodes connected by edges. If the similarity score for a pair of tables exceeds a threshold, one table of the pair is deleted.
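The following is a minimal sketch of the described pipeline under loudly stated assumptions. It is not the patented implementation. A bag of column-title tokens stands in for a learned embedding. Clustering is a greedy cosine-threshold pass. The "statistical similarity" split simply groups tables whose row counts are within a factor of two. The pairwise similarity score averages title cosine with the Jaccard overlap of cell values. Each table is modeled as a dict mapping column title to a list of values. All function names and thresholds (`relate`, `cluster_t`, `edge_t`, `delete_t`) are hypothetical; the abstract specifies none of them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def embed(table):
    """Hypothetical stand-in for a learned embedding: bag of column-title tokens."""
    return Counter(tok for col in table for tok in col.lower().split("_"))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def greedy_groups(names, same_group):
    """Put each name in the first group whose first member satisfies same_group."""
    groups = []
    for name in names:
        for group in groups:
            if same_group(group[0], name):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

def value_jaccard(t1, t2):
    v1 = {v for col in t1.values() for v in col}
    v2 = {v for col in t2.values() for v in col}
    union = v1 | v2
    return len(v1 & v2) / len(union) if union else 0.0

def relate(tables, cluster_t=0.5, edge_t=0.5, delete_t=0.9):
    """tables: name -> {column title: list of values}. Returns (graph, deleted names)."""
    emb = {name: embed(t) for name, t in tables.items()}
    rows = {name: len(next(iter(t.values()), [])) for name, t in tables.items()}

    def similar_size(a, b):
        hi = max(rows[a], rows[b])
        return hi == 0 or min(rows[a], rows[b]) / hi >= 0.5

    graph = {name: set() for name in tables}
    deleted = set()
    # 1) Coarse clusters from column-title embeddings.
    for cluster in greedy_groups(list(tables), lambda a, b: cosine(emb[a], emb[b]) >= cluster_t):
        # 2) Smaller clusters from a hypothetical statistical criterion (row counts).
        for sub in greedy_groups(cluster, similar_size):
            # 3) Pairwise similarity scores within the same small cluster.
            for a, b in combinations(sub, 2):
                score = 0.5 * cosine(emb[a], emb[b]) + 0.5 * value_jaccard(tables[a], tables[b])
                if score >= edge_t:  # 4) Edge in the relatedness graph.
                    graph[a].add(b)
                    graph[b].add(a)
                if score > delete_t and a not in deleted and b not in deleted:
                    deleted.add(b)  # 5) Drop one table of a near-duplicate pair.
    return graph, deleted

tables = {
    "orders_2023": {"order_id": [1, 2, 3], "customer_name": ["ann", "bob", "cy"]},
    "orders_copy": {"order_id": [1, 2, 3], "customer_name": ["ann", "bob", "cy"]},
    "orders_2024": {"order_id": [4, 5, 6], "customer_name": ["ann", "dee", "eve"]},
    "weather": {"city": ["x", "y"], "temp_c": [20, 25]},
}
graph, deleted = relate(tables)
print(sorted(deleted))  # ['orders_copy']
```

Deleting the later table of each near-duplicate pair is an arbitrary choice made for this sketch; the abstract only states that a table is deleted when the pair's score exceeds a threshold.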
