Equi-joins between split tables
    Granted invention patent

    Publication No.: US09846709B2

    Publication date: 2017-12-19

    Application No.: US14823943

    Filing date: 2015-08-11

    Applicant: SAP SE

    IPC classification: G06F17/30

    Abstract: A join operation between split data tables includes providing value IDs. For each of the value IDs, a unique global ID may be associated with the value ID when the actual value represented by the value ID occurs among actual values comprising the second attribute of the second partition. For each identified unique global ID, the identified unique global ID may be paired with a document ID of a data record contained in a second partition stored at the second server in which the actual value in the data record is represented by the value ID associated with the identified unique global ID.
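
    The idea in this abstract can be illustrated with a minimal single-process sketch. It is not the patented method: it assumes each partition stores its join attribute dictionary-encoded (actual value to local value ID), that a coordinator assigns one global ID per distinct actual value, and that servers are modeled as plain objects. The names Partition, build_global_ids, global_pairs and equi_join are hypothetical.

```python
from collections import defaultdict


class Partition:
    """A table partition whose join attribute is dictionary-encoded (assumed layout)."""

    def __init__(self, values):
        # Local dictionary: sorted distinct actual values; the index is the local value ID.
        self.dictionary = sorted(set(values))
        local_id = {value: vid for vid, value in enumerate(self.dictionary)}
        # The column holds value IDs; the list position serves as the document ID.
        self.column = [local_id[value] for value in values]


def build_global_ids(partitions):
    """Assign one global ID to each distinct actual value seen in any partition."""
    distinct = sorted({value for p in partitions for value in p.dictionary})
    return {value: gid for gid, value in enumerate(distinct)}


def global_pairs(partition, global_ids):
    """Translate local value IDs to global IDs and pair each with its document ID."""
    local_to_global = [global_ids[value] for value in partition.dictionary]
    return [(local_to_global[vid], doc_id)
            for doc_id, vid in enumerate(partition.column)]


def equi_join(left, right):
    """Join two partitions by comparing global IDs instead of actual values."""
    global_ids = build_global_ids([left, right])
    right_docs = defaultdict(list)
    for gid, doc_id in global_pairs(right, global_ids):
        right_docs[gid].append(doc_id)
    return [(left_doc, right_doc)
            for gid, left_doc in global_pairs(left, global_ids)
            for right_doc in right_docs.get(gid, [])]


left = Partition(["berlin", "paris", "berlin"])
right = Partition(["paris", "rome", "berlin"])
print(equi_join(left, right))  # [(0, 2), (1, 0), (2, 2)]
```

    Each result pair is (document ID in the left partition, document ID in the right partition). In a distributed setting only the (global ID, document ID) pairs would need to travel between servers, which is the reason for introducing global IDs in this sketch.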

    Variable sized partitioning for distributed hash tables

    Publication No.: US09836492B1

    Publication date: 2017-12-05

    Application No.: US13666549

    Filing date: 2012-11-01

    IPC classification: G06F17/30 G06F3/06

    Abstract: A distributed hash table (“DHT”) is created with partitions that have different sizes. A hash function allocates data to the partitions in the DHT at approximately equal rates. When the data stored on a partition approaches the storage capacity of the partition, the partition is split by adding a new partition to the DHT that has a size that is different than the sizes of the other partitions in the DHT. A portion of the data stored on the split partition is then reallocated to the new partition. A portion of a keyspace previously assigned to the split partition is also allocated to the new partition. Once the keyspace is reallocated, the hash function can allocate data to the new partition in the DHT.
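
    The split-and-reallocate behavior described here can be sketched in a few lines. This is an illustration under stated assumptions, not the patented design: capacity is counted in items rather than bytes, a split is triggered when the item count reaches the capacity, the keyspace is cut at its midpoint, and each new partition gets twice the capacity of the one it was split from. KEYSPACE, key_hash, Partition and VariableDHT are hypothetical names.

```python
import hashlib
from bisect import bisect_right

KEYSPACE = 2 ** 32  # hypothetical keyspace size


def key_hash(key):
    """Map a string key to a position in the keyspace."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Partition:
    """Owns the half-open keyspace range [start, end) and holds up to `capacity` items."""

    def __init__(self, start, end, capacity):
        self.start = start
        self.end = end
        self.capacity = capacity
        self.data = {}


class VariableDHT:
    def __init__(self, initial_capacity, growth=2):
        self.growth = growth  # assumed sizing rule for new partitions
        self.partitions = [Partition(0, KEYSPACE, initial_capacity)]

    def _locate(self, position):
        # Partitions are kept sorted by start, so a binary search finds the owner.
        starts = [p.start for p in self.partitions]
        return bisect_right(starts, position) - 1

    def put(self, key, value):
        index = self._locate(key_hash(key))
        partition = self.partitions[index]
        partition.data[key] = value
        if len(partition.data) >= partition.capacity:
            self._split(index)

    def get(self, key):
        return self.partitions[self._locate(key_hash(key))].data.get(key)

    def _split(self, index):
        old = self.partitions[index]
        if old.end - old.start < 2:
            return  # a one-position range cannot be divided further
        mid = (old.start + old.end) // 2
        # The new partition takes the upper half of the keyspace and a different size.
        new = Partition(mid, old.end, old.capacity * self.growth)
        old.end = mid
        # Reallocate the items whose hash now falls in the new partition's range.
        for key in [k for k in old.data if key_hash(k) >= mid]:
            new.data[key] = old.data.pop(key)
        self.partitions.insert(index + 1, new)
```

    After inserting a few dozen keys into VariableDHT(initial_capacity=4), the table holds several partitions with differing capacities, and every key is still found through the same hash lookup. A split here happens at most once per put, so a partition can briefly sit at or above its limit until the next write reaches it.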

    Large distributed database clustering systems and methods

    Publication No.: US09805108B2

    Publication date: 2017-10-31

    Application No.: US13929109

    Filing date: 2013-06-27

    Applicant: MongoDB, Inc.

    IPC classification: G06F17/30

    CPC classification: G06F17/30584 G06F17/30578

    Abstract: Systems and methods are provided for managing asynchronous replication in a distributed database environment, while providing for scaling of the distributed database. A cluster of nodes can be assigned roles for managing partitions of data within the database and processing database requests. In one embodiment, each cluster includes a node with a primary role to process write operations and manage asynchronous replication of the operations to at least one secondary node. Each cluster or set of nodes can host one or more partitions of database data. Collectively, the cluster or set of nodes define a shard cluster that hosts all the data of the distributed database. Each shard cluster, individual nodes, or sets of nodes can be configured to manage the size of any hosted partitions, splitting database partitions, migrating partitions, and/or managing expansion of shard clusters to encompass new systems.
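
    The primary/secondary roles in this abstract can be pictured with a small in-memory sketch. It covers only the asynchronous replication part and is not MongoDB's implementation or API: it assumes the primary applies each write immediately and queues the operation for every secondary, and that secondaries apply queued operations later, in order. Node, ReplicaSet, write and replicate are hypothetical names.

```python
from collections import deque


class Node:
    """A database node holding a key-value copy of the hosted partition."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicaSet:
    """One primary that accepts writes and secondaries that catch up asynchronously."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        # One queue of not-yet-applied operations per secondary (assumed mechanism).
        self.pending = {s.name: deque() for s in secondaries}

    def write(self, key, value):
        # The write is applied on the primary only; secondaries see it later.
        self.primary.data[key] = value
        for queue in self.pending.values():
            queue.append((key, value))

    def replicate(self, max_ops=None):
        # Apply up to max_ops queued operations on each secondary, oldest first.
        for secondary in self.secondaries:
            queue = self.pending[secondary.name]
            count = len(queue) if max_ops is None else min(max_ops, len(queue))
            for _ in range(count):
                key, value = queue.popleft()
                secondary.data[key] = value
```

    Between write and replicate a secondary may return stale data, which is the trade-off asynchronous replication accepts in exchange for not blocking writes on the primary.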