RELATING DATA IN DATA LAKES
    Invention application

    Publication No.: US20240386002A1

    Publication date: 2024-11-21

    Application No.: US18319748

    Filing date: 2023-05-18

    Applicant: Adobe Inc.

    Abstract: A dataset comprising tables is received. Embeddings are generated for column titles of a table. Based on the embeddings, similar tables are clustered. The tables are organized into smaller clusters based on statistical similarities. Similarity scores are calculated for tables within the same cluster. A relatedness graph is created based on the similarity scores; similar tables are represented by nodes connected by edges. If the similarity score for a pair of tables exceeds a threshold, a table is deleted.
