-
Publication No.: US10467191B1
Publication Date: 2019-11-05
Application No.: US15391714
Filing Date: 2016-12-27
Applicant: Amazon Technologies, Inc.
Inventor: Wei Yu, Nengwu Zhu, Hyen Vui Chung, Qihui Lee
IPC: G06F16/00, G06F16/13, G06F16/2455
Abstract: Technologies are disclosed for providing a large scale data join service within a service provider network. A data set includes first and second sets of files that correspond to each other. Each file includes a first identifier (ID) and a second ID. The first set of files is partitioned based at least in part upon the first ID into a plurality of first subsets of files and the second set of files is partitioned based at least in part upon the first ID into a plurality of second subsets of files. Files within a first group of the plurality of first subsets and files within a second group of the plurality of second subsets are encoded into first and second bitsets, respectively, based at least in part upon the second IDs. An exclusive-or operation is performed on the first and second bitsets to find discrepancies between the data files.
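The partition, encode, and exclusive-or steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each file reduces to a `(first_id, second_id)` pair of small non-negative integers, that partitioning is a simple modulo on the first ID, and that a Python `int` serves as the bitset. The function names and `num_partitions` are hypothetical.

```python
from collections import defaultdict


def partition(records, num_partitions):
    """Group second IDs into partitions keyed by first_id modulo num_partitions."""
    parts = defaultdict(list)
    for first_id, second_id in records:
        parts[first_id % num_partitions].append(second_id)
    return parts


def to_bitset(second_ids):
    """Encode non-negative integer IDs as a bitset stored in a Python int."""
    bits = 0
    for sid in second_ids:
        bits |= 1 << sid
    return bits


def bit_positions(bits):
    """Return the positions of the set bits in ascending order."""
    positions = []
    pos = 0
    while bits:
        if bits & 1:
            positions.append(pos)
        bits >>= 1
        pos += 1
    return positions


def find_discrepancies(first_records, second_records, num_partitions=4):
    """XOR the per-partition bitsets; set bits mark IDs present on one side only."""
    first_parts = partition(first_records, num_partitions)
    second_parts = partition(second_records, num_partitions)
    result = {}
    for key in sorted(set(first_parts) | set(second_parts)):
        diff = to_bitset(first_parts.get(key, [])) ^ to_bitset(second_parts.get(key, []))
        if diff:
            result[key] = bit_positions(diff)
    return result
```

Because both sides are partitioned on the same first ID, each pair of partitions can be compared independently. The XOR leaves a bit set only where a second ID appears in exactly one of the two sets.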
-
Publication No.: US10511484B1
Publication Date: 2019-12-17
Application No.: US15469416
Filing Date: 2017-03-24
Applicant: Amazon Technologies, Inc.
Inventor: Wei Yu, Dmytro Ivashchenko, Qihui Li, Nengwu Zhu, Bhavesh Anil Doshi, Joshua Stephen Ullom, Nathan Manning, Michael Christopher Wenneman, Yubai Di, Hyen Vui Chung
Abstract: In large distributed computing environments, application execution may be distributed between a plurality of groups, the plurality of groups containing a set of host computer systems responsible for the execution of one or more operations of the application. Group membership may be determined by generating configuration information based at least in part on the plurality of groups. The configuration information may be provided to a plurality of host computer systems and each host computer system of the plurality of host computer systems may determine membership to a particular group of the plurality of groups based at least in part on the configuration information.
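One way a host might derive its own group from shared configuration information is sketched below in Python. The abstract does not say how membership is computed, so everything here is an assumption for illustration: the configuration is taken to be a set of weighted slot ranges, one per group, and each host hashes its own identifier into a slot. The names `generate_config`, `determine_group`, and the group names are hypothetical.

```python
import hashlib


def generate_config(group_weights):
    """Build configuration info: each group gets a cumulative upper slot bound."""
    total = sum(group_weights.values())
    if total <= 0:
        raise ValueError("at least one group must have a positive weight")
    ranges = []
    upper = 0
    for name in sorted(group_weights):
        upper += group_weights[name]
        ranges.append((name, upper))
    return {"total": total, "ranges": ranges}


def determine_group(host_id, config):
    """Run on each host: hash the host's own ID into a slot and find its group."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % config["total"]
    for name, upper in config["ranges"]:
        if slot < upper:
            return name
    raise ValueError("configuration does not cover the computed slot")


# Every host receives the same configuration and decides locally.
config = generate_config({"ingest": 1, "query": 3})
group_for_host = determine_group("host-a.example.internal", config)
```

In this sketch no central service assigns hosts to groups. Each host reaches the same answer for a given ID because the hash and the configuration are both deterministic.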
-