Method and apparatus for screening TB-scale incremental data

    Publication number: US11789639B1

    Publication date: 2023-10-17

    Application number: US18193623

    Application date: 2023-03-30

    Applicant: ZHEJIANG LAB

    CPC classification number: G06F3/0652 G06F3/0608 G06F3/0673 G06F16/215

    Abstract: A method and an apparatus for screening TB-scale incremental data. The raw data is divided into a plurality of raw data blocks according to the memory capacity of the device, and the data is cleaned. A single-block index-sorting algorithm de-duplicates and sorts the records within each data block without a drop-to-disk (disk write) operation; the processed data blocks and a matrix hash index table are then generated and saved as the initial data. For subsequent incremental data, an inter-block index-sorting algorithm is adopted, and the processed data blocks and the matrix hash index table are loaded in turn. The incremental data is first coarsely screened against the matrix hash index table and then finely screened with an incremental binary search algorithm, which completes the indexing and de-duplication screening of all data.
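
The two-stage screening described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the layout of the matrix hash index table, so a flat per-block bucket table stands in for it, and `NUM_BUCKETS`, `bucket_of`, `build_block`, `split_into_blocks`, `contains`, and `screen_increment` are hypothetical names introduced only for illustration. For brevity the sketch de-duplicates within each initial block only, not across initial blocks.

```python
from bisect import bisect_left

NUM_BUCKETS = 1024  # hypothetical size of the per-block hash index


def bucket_of(record: str) -> int:
    # Deterministic stand-in hash (Python's built-in hash() is salted per process).
    h = 0
    for ch in record:
        h = (h * 31 + ord(ch)) % NUM_BUCKETS
    return h


def build_block(records):
    """De-duplicate and sort one in-memory block; return (sorted_block, index)."""
    block = sorted(set(records))
    index = [False] * NUM_BUCKETS  # assumed stand-in for the matrix hash index table
    for r in block:
        index[bucket_of(r)] = True
    return block, index


def split_into_blocks(records, block_size):
    """Cut the raw data into chunks small enough to fit in memory."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def contains(block, record):
    """Fine screening: binary search in a sorted block."""
    i = bisect_left(block, record)
    return i < len(block) and block[i] == record


def screen_increment(blocks, increment):
    """Return the sorted records of `increment` absent from every stored block."""
    candidates = sorted(set(increment))
    for block, index in blocks:  # blocks are visited one at a time
        survivors = []
        for r in candidates:
            # Coarse screening: an empty bucket proves the record is not in this block.
            if index[bucket_of(r)] and contains(block, r):
                continue  # duplicate of stored data, drop it
            survivors.append(r)
        candidates = survivors
    return candidates


raw = ["b", "a", "b", "d", "c", "a"]
blocks = [build_block(chunk) for chunk in split_into_blocks(raw, 3)]
new_records = screen_increment(blocks, ["e", "a", "c", "e", "f"])
print(new_records)
```

In this sketch the binary search runs only when the hash bucket is occupied, so most non-duplicate records are rejected by a single table lookup per block.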
