Content aware chunking for achieving an improved chunk size distribution
    21.
    Type: Granted patent
    Status: In force

    Publication No.: US08918375B2

    Publication date: 2014-12-23

    Application No.: US13222198

    Filing date: 2011-08-31

    Abstract: The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, with a chunk boundary candidate identified if matched. Other signatures and patterns are then checked to determine a highest ranking signature (corresponding to a lowest numbered Rule) to associate with that chunk boundary candidate, or set an actual boundary if the highest ranked signature is matched. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).
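
    The ranked-rule chunking with regression described in this abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the CRC32 window fingerprint (a stand-in for a rolling hash), the "low bits are zero" target patterns, the parameter values, and the choice to keep the earliest candidate among equally ranked ones are all assumptions.

```python
import random
import zlib

def match_rank(fingerprint, bits, num_rules):
    """Return the lowest-numbered (highest-ranked) rule matched, or None."""
    for rule in range(1, num_rules + 1):
        # Assumed patterns: rule 1 needs `bits` zero low bits, rule 2 one fewer, etc.
        mask = (1 << (bits - rule + 1)) - 1
        if fingerprint & mask == 0:
            return rule
    return None

def chunk_boundaries(data, min_size=64, max_size=512, window=16,
                     bits=7, num_rules=3):
    """Return the end offset of every chunk in `data`."""
    assert window <= min_size <= max_size and num_rules <= bits
    boundaries, start, n = [], 0, len(data)
    while start < n:
        if n - start <= min_size:          # short tail becomes the last chunk
            boundaries.append(n)
            break
        limit = min(start + max_size, n)
        cut, best_rank, best_pos = None, None, None
        # Only positions inside the [min_size, max_size] range are examined.
        for pos in range(start + min_size, limit + 1):
            fp = zlib.crc32(data[pos - window:pos])   # window fingerprint
            rank = match_rank(fp, bits, num_rules)
            if rank is None:
                continue
            if rank == 1:                  # highest-ranked signature: cut now
                cut = pos
                break
            if best_rank is None or rank < best_rank:
                best_rank, best_pos = rank, pos       # remember best candidate
        if cut is None:
            reached_max = (limit - start == max_size)
            # Regress to the best candidate; with none, cut at the maximum.
            cut = best_pos if (reached_max and best_pos is not None) else limit
        boundaries.append(cut)
        start = cut
    return boundaries

data = bytes(random.Random(0).randrange(256) for _ in range(20000))
cuts = chunk_boundaries(data)
sizes = [b - a for a, b in zip([0] + cuts, cuts)]
print(len(cuts), "chunks; largest:", max(sizes))
```

    Every chunk except possibly the last stays within the minimum and maximum sizes, and a boundary at the maximum is used only when no lower-ranked candidate was seen.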

    FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE
    22.
    Type: Patent application
    Status: In force

    Publication No.: US20130282965A1

    Publication date: 2013-10-24

    Application No.: US13919738

    Filing date: 2013-06-17

    IPC classification: G06F12/02 G11C7/10

    Abstract: Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.
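
    The interplay of the RAM index, the write buffer, and the recycle mechanism can be sketched as below. This is a toy model under stated assumptions: flash pages and the hard disk are plain dictionaries, the class and attribute names are hypothetical, and "access pattern" is reduced to "was the item read from flash since it was written".

```python
class FlashCache:
    """Toy model: RAM write buffer + RAM index over simulated flash pages."""

    def __init__(self, page_items=2, max_pages=2):
        self.page_items = page_items   # items that fill one flash page
        self.max_pages = max_pages     # flash capacity in pages
        self.buffer = {}               # RAM write buffer
        self.index = {}                # RAM index: key -> flash page number
        self.pages = {}                # simulated flash: page number -> items
        self.disk = {}                 # simulated hard disk
        self.hits = set()              # keys read from flash since written
        self.next_page = 0

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.page_items:   # a full page can be written
            self._flush()

    def get(self, key):
        if key in self.buffer:
            return self.buffer[key]
        page = self.index.get(key)
        if page is not None:
            self.hits.add(key)                    # record the access
            return self.pages[page][key]
        return self.disk.get(key)

    def _flush(self):
        items = list(self.buffer.items())[:self.page_items]
        page_no, self.next_page = self.next_page, self.next_page + 1
        self.pages[page_no] = dict(items)
        for key, _ in items:
            self.index[key] = page_no
            del self.buffer[key]
        if len(self.pages) > self.max_pages:
            self._recycle()

    def _recycle(self):
        oldest = min(self.pages)                  # reclaim the oldest page
        for key, value in self.pages.pop(oldest).items():
            if self.index.get(key) != oldest:
                continue                          # superseded by a newer page
            del self.index[key]
            if key in self.buffer:
                continue                          # newer value already pending
            if key in self.hits:
                self.hits.discard(key)
                self.buffer[key] = value          # hot item: reinsert
            else:
                self.disk[key] = value            # cold item: destage to disk

cache = FlashCache(page_items=2, max_pages=1)
cache.put("a", 1)
cache.put("b", 2)      # first page written
cache.get("a")         # "a" becomes hot
cache.put("c", 3)
cache.put("d", 4)      # second page written, first page recycled
```

    In this run the recycled page sends the unread item "b" to the simulated disk, while the recently read item "a" goes back into the write buffer.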

    Optimized transport protocol for delay-sensitive data
    23.
    Type: Granted patent
    Status: In force

    Publication No.: US08228800B2

    Publication date: 2012-07-24

    Application No.: US12364520

    Filing date: 2009-02-03

    IPC classification: G06F11/00

    Abstract: Transmission delays are minimized when packets are transmitted from a source computer over a network to a destination computer. The source computer measures the network's available bandwidth, forms a sequence of output packets from a sequence of data packets, and transmits the output packets over the network to the destination computer, where the transmission rate is ramped up to the measured bandwidth. In conjunction with the transmission, the source computer monitors a transmission delay indicator which it computes using acknowledgement packets it receives from the destination computer. Whenever the indicator specifies that the transmission delay is increasing, the source computer reduces the transmission rate until the indicator specifies that the delay is unchanged. The source computer dynamically decides whether each output packet will be a forward error correction packet or a single data packet, where the decision is based on minimizing the expected transmission delays.
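
    The delay-driven rate adjustment can be illustrated with a small sketch. The function name, the step sizes, and the use of the difference between the last two delay samples as the "indicator" are assumptions made for illustration; the abstract does not specify them, and the forward error correction decision is not modeled.

```python
def next_rate(rate, measured_bw, delay_samples,
              ramp_step=0.1, backoff=0.85, tolerance=0.001):
    """Return the next sending rate (same units as measured_bw).

    delay_samples: delays (seconds) derived from acknowledgement packets,
    oldest first. All tuning constants are hypothetical.
    """
    increasing = (len(delay_samples) >= 2 and
                  delay_samples[-1] - delay_samples[-2] > tolerance)
    if increasing:
        return rate * backoff               # delay is growing: slow down
    # Delay unchanged (or falling): ramp toward the measured bandwidth.
    return min(measured_bw, rate + ramp_step * measured_bw)

rate = 1.0
for delays in ([0.050, 0.050], [0.050, 0.050], [0.050, 0.080], [0.080, 0.080]):
    rate = next_rate(rate, 10.0, delays)
    print(round(rate, 3))
```

    The rate climbs while the delay is flat, is cut when the delay rises, and resumes climbing once the delay stops increasing.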

    Using Index Partitioning and Reconciliation for Data Deduplication
    24.
    Type: Patent application
    Status: In force

    Publication No.: US20120166401A1

    Publication date: 2012-06-28

    Application No.: US12979748

    Filing date: 2010-12-28

    IPC classification: G06F17/30

    Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
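
    A minimal sketch of per-subspace lookup followed by reconciliation, assuming in-memory dictionaries as subspace indexes, SHA-256 chunk hashes, and made-up chunk locations. The class and method names are hypothetical, and neither caching nor similarity-based selection of subspaces is modeled.

```python
import hashlib

class SubspaceIndex:
    """Hash index split into independent subspaces (e.g., one per file type)."""

    def __init__(self):
        self.subspaces = {}                      # name -> {chunk hash: location}

    def add_chunk(self, subspace, chunk):
        """Index the chunk; return True if it was new within this subspace."""
        digest = hashlib.sha256(chunk).digest()
        index = self.subspaces.setdefault(subspace, {})
        if digest in index:
            return False                         # duplicate inside the subspace
        index[digest] = f"{subspace}/{len(index)}"   # hypothetical location
        return True

    def reconcile(self, keep, other):
        """Drop entries of `other` that also exist in `keep`.

        Returns the locations whose chunks can now be removed.
        """
        kept = self.subspaces.get(keep, {})
        target = self.subspaces.get(other, {})
        duplicates = [h for h in target if h in kept]
        return [target.pop(h) for h in duplicates]

idx = SubspaceIndex()
first = idx.add_chunk("docs", b"hello")      # new chunk
again = idx.add_chunk("docs", b"hello")      # found in the same subspace
cross = idx.add_chunk("media", b"hello")     # other subspace: not detected yet
freed = idx.reconcile("docs", "media")       # detected during reconciliation
```

    Because each lookup touches only one subspace, a chunk stored under two subspaces goes unnoticed until the later reconciliation pass compares them.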

    OPTIMIZED TRANSPORT PROTOCOL FOR DELAY-SENSITIVE DATA
    25.
    Type: Patent application
    Status: In force

    Publication No.: US20100195488A1

    Publication date: 2010-08-05

    Application No.: US12364520

    Filing date: 2009-02-03

    IPC classification: H04J1/16 H04L12/56 H04L1/00

    Abstract: Transmission delays are minimized when packets are transmitted from a source computer over a network to a destination computer. The source computer measures the network's available bandwidth, forms a sequence of output packets from a sequence of data packets, and transmits the output packets over the network to the destination computer, where the transmission rate is ramped up to the measured bandwidth. In conjunction with the transmission, the source computer monitors a transmission delay indicator which it computes using acknowledgement packets it receives from the destination computer. Whenever the indicator specifies that the transmission delay is increasing, the source computer reduces the transmission rate until the indicator specifies that the delay is unchanged. The source computer dynamically decides whether each output packet will be a forward error correction packet or a single data packet, where the decision is based on minimizing the expected transmission delays.

    Low RAM Space, High-Throughput Persistent Key-Value Store using Secondary Memory
    26.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20120102298A1

    Publication date: 2012-04-26

    Application No.: US12908153

    Filing date: 2010-10-20

    IPC classification: G06F12/10 G06F12/00

    Abstract: Described is using flash memory (or other secondary storage), RAM-based data structures and mechanisms to access key-value pairs stored in the flash memory using only a low RAM space footprint. A mapping (e.g., hash) function maps key-value pairs to a slot in a RAM-based index. The slot includes a pointer that points to a bucket of records on flash memory that each had keys that mapped to the slot. The bucket of records is arranged as a linear-chained linked list, e.g., with pointers from the most-recently written record to the earliest written record. Also described are compacting non-contiguous records of a bucket onto a single flash page, and garbage collection. Still further described is load balancing to reduce variation in bucket sizes, using a bloom filter per slot to avoid unnecessary searching, and splitting a slot into sub-slots.
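
    The slot-to-bucket chaining can be sketched as follows, assuming an append-only Python list as a stand-in for flash and CRC32 as the mapping function. The class name and layout are hypothetical; compaction, garbage collection, bloom filters, and sub-slots are not modeled.

```python
import zlib

class FlashChainStore:
    """RAM holds one pointer per slot; records are chained on 'flash'."""

    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots   # RAM index: pointer to newest record
        self.log = []                     # simulated flash: append-only records

    def _slot(self, key):
        return zlib.crc32(key) % len(self.slots)   # assumed mapping function

    def put(self, key, value):
        slot = self._slot(key)
        # Each record points back to the previously written record of its bucket.
        self.log.append((key, value, self.slots[slot]))
        self.slots[slot] = len(self.log) - 1

    def get(self, key):
        ptr = self.slots[self._slot(key)]
        while ptr is not None:            # walk from newest to earliest record
            rec_key, rec_value, prev_ptr = self.log[ptr]
            if rec_key == key:
                return rec_value          # newest matching record wins
            ptr = prev_ptr
        return None

store = FlashChainStore(num_slots=1)      # one slot forces a single chain
store.put(b"a", b"1")
store.put(b"b", b"2")
store.put(b"a", b"3")                     # update is just a newer record
print(store.get(b"a"), store.get(b"b"), store.get(b"c"))
```

    RAM usage is one pointer per slot regardless of how many records a bucket holds; the cost is that a lookup may read several records along the chain.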

    FLASH MEMORY CACHE INCLUDING FOR USE WITH PERSISTENT KEY-VALUE STORE
    28.
    Type: Patent application
    Status: Under examination (published)

    Publication No.: US20110276744A1

    Publication date: 2011-11-10

    Application No.: US12773859

    Filing date: 2010-05-05

    IPC classification: G06F12/02 G06F12/10 G06F12/00

    Abstract: Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.

    Models for routing tree selection in peer-to-peer communications
    29.
    Type: Granted patent
    Status: In force

    Publication No.: US07738406B2

    Publication date: 2010-06-15

    Application No.: US12247431

    Filing date: 2008-10-08

    IPC classification: H04L12/28

    CPC classification: H04L45/00 H04L45/48

    Abstract: Peer-to-peer communications sessions involve the transmission of one or more data streams from a source to a set of receivers that may redistribute portions of the data stream via a set of routing trees. Achieving a comparatively high, sustainable data rate throughput of the data stream(s) may be difficult due to the large number of available routing trees, as well as pertinent variations in the nature of the communications session (e.g., upload communications caps, network link caps, the presence or absence of helpers, and the full or partial interconnectedness of the network). The selection of routing trees may be facilitated through the representation of the node set according to a linear programming model, such as a primal model or a linear programming dual model, and iterative processes for applying such models and identifying low-cost routing trees during an iteration.

    MODELS FOR ROUTING TREE SELECTION IN PEER-TO-PEER COMMUNICATIONS
    30.
    Type: Patent application
    Status: In force

    Publication No.: US20100085979A1

    Publication date: 2010-04-08

    Application No.: US12247431

    Filing date: 2008-10-08

    IPC classification: H04L12/56

    CPC classification: H04L45/00 H04L45/48

    Abstract: Peer-to-peer communications sessions involve the transmission of one or more data streams from a source to a set of receivers that may redistribute portions of the data stream via a set of routing trees. Achieving a comparatively high, sustainable data rate throughput of the data stream(s) may be difficult due to the large number of available routing trees, as well as pertinent variations in the nature of the communications session (e.g., upload communications caps, network link caps, the presence or absence of helpers, and the full or partial interconnectedness of the network). The selection of routing trees may be facilitated through the representation of the node set according to a linear programming model, such as a primal model or a linear programming dual model, and iterative processes for applying such models and identifying low-cost routing trees during an iteration.