Low RAM space, high-throughput persistent key-value store using secondary memory

    Publication No.: US10558705B2

    Publication date: 2020-02-11

    Application No.: US12908153

    Filing date: 2010-10-20

    Abstract: Described is using flash memory (or other secondary storage), RAM-based data structures and mechanisms to access key-value pairs stored in the flash memory using only a low RAM space footprint. A mapping (e.g., hash) function maps key-value pairs to a slot in a RAM-based index. The slot includes a pointer that points to a bucket of records on flash memory, each of which has a key that mapped to the slot. The bucket of records is arranged as a linear-chained linked list, e.g., with pointers from the most-recently written record to the earliest written record. Also described are compacting non-contiguous records of a bucket onto a single flash page, and garbage collection. Still further described are load balancing to reduce variation in bucket sizes, using a Bloom filter per slot to avoid unnecessary searching, and splitting a slot into sub-slots.
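
    The slot-and-chain layout in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ChainedFlashStore`, the use of CRC32 as the mapping function, and the in-memory list standing in for an append-only flash log are all assumptions made for illustration. Compaction, garbage collection, Bloom filters and sub-slots are omitted.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    key: bytes
    value: bytes
    prev: Optional[int]  # log offset of the next-older record in the same bucket


class ChainedFlashStore:
    """Hypothetical sketch: a RAM index of slot pointers over an append-only log."""

    def __init__(self, num_slots: int = 8) -> None:
        self.log: List[Record] = []  # stand-in for flash (append-only)
        self.slots: List[Optional[int]] = [None] * num_slots  # RAM: one pointer per slot

    def _slot(self, key: bytes) -> int:
        # Assumed mapping function; any hash would do.
        return zlib.crc32(key) % len(self.slots)

    def put(self, key: bytes, value: bytes) -> None:
        s = self._slot(key)
        # The new record points back at the previous head of the bucket chain.
        self.log.append(Record(key, value, self.slots[s]))
        self.slots[s] = len(self.log) - 1

    def get(self, key: bytes) -> Optional[bytes]:
        ptr = self.slots[self._slot(key)]
        # Walk from the most recently written record toward the earliest one.
        while ptr is not None:
            rec = self.log[ptr]
            if rec.key == key:
                return rec.value
            ptr = rec.prev
        return None
```

    Because a lookup walks the chain from newest to oldest, an updated key returns its latest value without the older record being rewritten. RAM holds only one pointer per slot, however many records the bucket contains.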

    43. Structuring storage based on latch-free B-trees
    Granted patent (in force)

    Publication No.: US09003162B2

    Publication date: 2015-04-07

    Application No.: US13527880

    Filing date: 2012-06-20

    IPC classification: G06F12/10

    Abstract: A request to modify an object in storage that is associated with one or more computing devices may be obtained, the storage organized based on a latch-free B-tree structure. A storage address of the object may be determined, based on accessing a mapping table that includes map indicators mapping logical object identifiers to physical storage addresses. A prepending of a first delta record to a prior object state of the object may be initiated, the first delta record indicating an object modification associated with the obtained request. Installation of a first state change associated with the object modification may be initiated via a first atomic operation on a mapping table entry that indicates the prior object state of the object. For example, the latch-free B-tree structure may include a B-tree-like index structure over records as the objects, and logical page identifiers as the logical object identifiers.
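
    The delta-prepend step can be sketched as follows. This is a hedged illustration, not the patent's implementation: Python exposes no hardware compare-and-swap, so the hypothetical `MappingTable.compare_and_swap` below emulates one with a lock, whereas a real latch-free structure would use a single atomic instruction. The names `Delta`, `update` and `lookup`, and the use of a plain dict as the base page, are likewise assumptions.

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Delta:
    key: str
    value: Any
    prior: Optional[object]  # prior page state: another Delta or a base page (dict)


class MappingTable:
    """Maps logical page ids to the current page state (hypothetical sketch)."""

    def __init__(self) -> None:
        self._entries = {}
        self._lock = threading.Lock()  # only emulates the atomicity of a hardware CAS

    def read(self, pid: int) -> Optional[object]:
        return self._entries.get(pid)

    def compare_and_swap(self, pid: int, expected: Optional[object], new: object) -> bool:
        with self._lock:
            if self._entries.get(pid) is expected:
                self._entries[pid] = new
                return True
            return False


def update(table: MappingTable, pid: int, key: str, value: Any) -> None:
    # Prepend a delta to the prior state and install it with one atomic swap;
    # if another writer got there first, re-read the state and retry.
    while True:
        prior = table.read(pid)
        if table.compare_and_swap(pid, prior, Delta(key, value, prior)):
            return


def lookup(table: MappingTable, pid: int, key: str) -> Any:
    state = table.read(pid)
    while isinstance(state, Delta):  # newest deltas are seen first
        if state.key == key:
            return state.value
        state = state.prior
    return state.get(key) if state is not None else None
```

    The prior state is never modified in place. A writer that loses the race discards its delta and retries against the new state, which is what lets readers proceed without latches.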

    44. Content aware chunking for achieving an improved chunk size distribution
    Granted patent (in force)

    Publication No.: US08918375B2

    Publication date: 2014-12-23

    Application No.: US13222198

    Filing date: 2011-08-31

    Abstract: The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or to set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if there are no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).
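
    A simplified two-rank version of this boundary selection is sketched below. Several things are assumptions: CRC32 over the window stands in for the window fingerprint, the masks `top_mask` and `fallback_mask` are hypothetical target patterns, and when several fallback candidates exist the most recent one is kept. The abstract describes more ranks and a pattern-detection path that are not modeled here.

```python
import zlib
from typing import List


def chunk_boundaries(data: bytes, min_size: int = 64, max_size: int = 512,
                     window: int = 16, top_mask: int = 0xFF,
                     fallback_mask: int = 0x3F) -> List[int]:
    """Return the end offset of each chunk (hypothetical two-rank sketch)."""
    boundaries = []
    start = 0
    while start < len(data):
        limit = min(start + max_size, len(data))
        cut = None
        candidate = None
        # Only positions that respect the minimum and maximum sizes are examined.
        for pos in range(start + min_size, limit + 1):
            sig = zlib.crc32(data[max(0, pos - window):pos])
            if sig & top_mask == 0:       # highest-ranked signature: cut here
                cut = pos
                break
            if sig & fallback_mask == 0:  # lower-ranked signature: remember it
                candidate = pos
        if cut is None:
            if limit < start + max_size:
                cut = len(data)           # ran out of data: final chunk
            else:
                # Maximum reached: regress to the candidate, else cut at the maximum.
                cut = candidate if candidate is not None else limit
        boundaries.append(cut)
        start = cut
    return boundaries
```

    Falling back to a content-defined candidate rather than cutting at the maximum keeps more boundaries aligned with the data, so an insertion early in a file is less likely to shift every later chunk.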

    45. STRUCTURING STORAGE BASED ON LATCH-FREE B-TREES
    Patent application (in force)

    Publication No.: US20130346725A1

    Publication date: 2013-12-26

    Application No.: US13527880

    Filing date: 2012-06-20

    IPC classification: G06F12/10

    Abstract: A request to modify an object in storage that is associated with one or more computing devices may be obtained, the storage organized based on a latch-free B-tree structure. A storage address of the object may be determined, based on accessing a mapping table that includes map indicators mapping logical object identifiers to physical storage addresses. A prepending of a first delta record to a prior object state of the object may be initiated, the first delta record indicating an object modification associated with the obtained request. Installation of a first state change associated with the object modification may be initiated via a first atomic operation on a mapping table entry that indicates the prior object state of the object. For example, the latch-free B-tree structure may include a B-tree-like index structure over records as the objects, and logical page identifiers as the logical object identifiers.

    46. FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE
    Patent application (in force)

    Publication No.: US20130282965A1

    Publication date: 2013-10-24

    Application No.: US13919738

    Filing date: 2013-06-17

    IPC classification: G06F12/02, G11C7/10

    Abstract: Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.
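
    The interplay of write buffer, index and recycling can be sketched roughly as below. The class `FlashCache`, its parameters, and the rule "recently read items are reinserted, unread items are destaged" are illustrative assumptions. The abstract says only that the decision is based on the access pattern. Dicts stand in for flash pages and for the hard disk.

```python
from collections import deque


class FlashCache:
    """Hypothetical sketch of a page-oriented flash cache with a RAM write buffer."""

    def __init__(self, page_items=2, max_pages=2):
        self.page_items = page_items
        self.max_pages = max_pages
        self.write_buffer = {}  # RAM: items waiting for a full-page write
        self.index = {}         # RAM: key -> id of the flash page holding it
        self.pages = {}         # stand-in for flash: page id -> {key: value}
        self.order = deque()    # page ids, oldest first
        self.accessed = set()   # keys read since they were written to flash
        self.disk = {}          # stand-in for the hard disk
        self.next_page = 0

    def put(self, key, value):
        self.write_buffer[key] = value
        if len(self.write_buffer) >= self.page_items:
            self._flush_page()

    def get(self, key):
        if key in self.write_buffer:
            return self.write_buffer[key]
        pid = self.index.get(key)
        if pid is not None:
            self.accessed.add(key)
            return self.pages[pid][key]
        return self.disk.get(key)

    def _flush_page(self):
        keys = list(self.write_buffer)[: self.page_items]
        page = {k: self.write_buffer.pop(k) for k in keys}
        pid = self.next_page
        self.next_page += 1
        self.pages[pid] = page
        self.order.append(pid)
        for k in page:
            self.index[k] = pid
            self.accessed.discard(k)
        if len(self.pages) > self.max_pages:
            self._recycle_oldest()

    def _recycle_oldest(self):
        pid = self.order.popleft()
        for k, v in self.pages.pop(pid).items():
            if self.index.get(k) != pid:
                continue                  # a newer copy lives on a later page
            del self.index[k]
            if k in self.write_buffer:
                continue                  # a newer value is already buffered
            if k in self.accessed:
                self.accessed.discard(k)
                self.write_buffer[k] = v  # hot item: reinsert into the buffer
            else:
                self.disk[k] = v          # cold item: destage to disk
```

    Writing whole pages at a time matches flash's page-granular writes, and recycling the oldest page first keeps the flash store behaving like a log.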

    48. Optimized transport protocol for delay-sensitive data
    Granted patent (in force)

    Publication No.: US08228800B2

    Publication date: 2012-07-24

    Application No.: US12364520

    Filing date: 2009-02-03

    IPC classification: G06F11/00

    Abstract: Transmission delays are minimized when packets are transmitted from a source computer over a network to a destination computer. The source computer measures the network's available bandwidth, forms a sequence of output packets from a sequence of data packets, and transmits the output packets over the network to the destination computer, where the transmission rate is ramped up to the measured bandwidth. In conjunction with the transmission, the source computer monitors a transmission delay indicator which it computes using acknowledgement packets it receives from the destination computer. Whenever the indicator specifies that the transmission delay is increasing, the source computer reduces the transmission rate until the indicator specifies that the delay is unchanged. The source computer dynamically decides whether each output packet will be a forward error correction packet or a single data packet, where the decision is based on minimizing the expected transmission delays.
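
    The rate-adjustment rule can be illustrated with a very small sketch. The function name `next_rate`, the multiplicative factors `ramp` and `backoff`, and the tolerance `eps` are hypothetical. The abstract does not state how the rate is ramped or reduced, nor how the delay indicator is computed from acknowledgements, and the forward error correction decision is not modeled.

```python
def next_rate(rate: float, bandwidth: float, prev_delay: float, delay: float,
              ramp: float = 1.25, backoff: float = 0.8, eps: float = 1e-3) -> float:
    """Return the sending rate for the next interval (hypothetical sketch)."""
    if delay - prev_delay > eps:
        # The delay indicator is increasing: reduce the transmission rate.
        return rate * backoff
    # The delay is unchanged (or falling): ramp up, capped at the measured bandwidth.
    return min(bandwidth, rate * ramp)
```

    Called once per acknowledgement interval, this keeps reducing the rate for as long as the delay keeps growing, then resumes ramping toward the measured bandwidth once the delay stabilizes.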

    49. Using Index Partitioning and Reconciliation for Data Deduplication
    Patent application (in force)

    Publication No.: US20120166401A1

    Publication date: 2012-06-28

    Application No.: US12979748

    Filing date: 2010-12-28

    IPC classification: G06F17/30

    Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be selected based on similarity, including via similarity of signatures that each compactly represent a subspace's hashes.
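
    Per-subspace deduplication and a later reconciliation pass might look like the sketch below. The function names `add_chunk` and `reconcile`, the use of SHA-256 as the chunk hash, and the dict-based index and chunk store are assumptions for illustration. Choosing which subspaces to reconcile by signature similarity is not modeled.

```python
import hashlib
from typing import Dict


def add_chunk(subspace: Dict[str, str], store: Dict[str, bytes],
              chunk: bytes, location: str) -> str:
    """Index a chunk within one subspace; return the location that holds it."""
    digest = hashlib.sha256(chunk).hexdigest()
    if digest in subspace:
        return subspace[digest]  # duplicate within this subspace: reuse it
    subspace[digest] = location
    store[location] = chunk
    return location


def reconcile(keep: Dict[str, str], other: Dict[str, str],
              store: Dict[str, bytes]) -> Dict[str, str]:
    """Drop entries of `other` that duplicate `keep`; return old -> surviving locations."""
    redirects = {}
    for digest in list(other):
        if digest in keep:
            dup_location = other.pop(digest)
            store.pop(dup_location, None)  # remove the redundant chunk
            redirects[dup_location] = keep[digest]
    return redirects
```

    Only the subspace being written to needs to be in memory during ingestion. Duplicates that span subspaces are tolerated until reconciliation, which processes one hash at a time and can therefore be stopped between entries if resources are needed.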
