Publication No.: US20250028686A1
Publication Date: 2025-01-23
Application No.: US18224981
Application Date: 2023-07-21
Applicant: Databricks, Inc.
Inventor: Pranav Anand, Praveen Gattu, Anish Shrigondekar, Huanli Wang
IPC: G06F16/174, G06F16/14, G06F16/16
Abstract: A system for using message identifiers for publish/subscribe messaging deduplication is described. The system may fetch one or more sets of data records from a data source, where each data record is associated with a message identifier. The system may store the one or more sets of data records in a data file, which is associated with first metadata comprising the message identifier, a file path, and a row number for each data record. The system may determine whether one or more of the data records are duplicated based on the associated message identifiers. In response to determining that one or more data records are duplicated, the system may generate second metadata comprising the file paths and row numbers associated with the duplicated data records.
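
The deduplication step described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `RecordMeta` and `find_duplicates`, the sample file paths, and the choice to keep the first occurrence of each message identifier and flag later ones are all assumptions made for illustration, since the abstract does not say which copy is retained.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RecordMeta:
    """Hypothetical per-record metadata: message identifier, file path, row number."""
    message_id: str
    file_path: str
    row_number: int


def find_duplicates(metadata: Iterable[RecordMeta]) -> list[tuple[str, int]]:
    """Return (file_path, row_number) for every record whose message_id was already seen.

    Assumption: the first record carrying a given message_id is kept, and
    every later record with the same identifier is reported as a duplicate.
    """
    seen: set[str] = set()
    duplicates: list[tuple[str, int]] = []
    for entry in metadata:
        if entry.message_id in seen:
            duplicates.append((entry.file_path, entry.row_number))
        else:
            seen.add(entry.message_id)
    return duplicates


# Hypothetical first metadata for two data files; "m1" was delivered twice.
metadata = [
    RecordMeta("m1", "batch-0.parquet", 0),
    RecordMeta("m2", "batch-0.parquet", 1),
    RecordMeta("m1", "batch-1.parquet", 0),
]

# Second metadata: locations of the duplicated records only.
duplicate_locations = find_duplicates(metadata)
print(duplicate_locations)
```

In this sketch the returned list plays the role of the second metadata: it records where the duplicated rows live without rewriting the data files themselves, so a reader could skip those rows at query time.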