Large scale data join service within a service provider network

    公开(公告)号:US10467191B1

    公开(公告)日:2019-11-05

    申请号:US15391714

    申请日:2016-12-27

    Abstract: Technologies are disclosed for providing a large scale data join service within a service provider network. A data set includes first and second sets of files that correspond to each other. Each file includes a first identifier (ID) and a second ID. The first set of files is partitioned based at least in part upon the first ID into a plurality of first subsets of files and the second set of files is partitioned based at least in part upon the first ID into a plurality of second subsets of files. Files within a first group of the plurality of first subsets and files within a second group of the plurality of second subsets are encoded into first and second bitsets, respectively, based at least in part upon the second IDs. An exclusive-or operation is performed on the first and second bitsets to find discrepancies between the data files.

Patent Agency Ranking