Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index and/or indexing operations are adaptable to balance deduplication performance savings, throughput and resource consumption. The indexing service may employ hierarchical chunking using different levels of granularity corresponding to chunk size, a sampled compact index table that contains compact signatures for less than all of the hash index's (or subspace's) hash values, and/or selective subspace indexing based on similarity of a subspace's data to another subspace's data and/or to incoming data chunks.
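As a rough illustration of the sampled compact index table mentioned above, the following Python sketch keeps short signatures in RAM for only a sample of the chunk hashes. The class name, the sampling rule (last digest byte modulo `sample_mod`) and the 2-byte signature are assumptions made for this example and are not taken from the disclosure.

```python
import hashlib


class SampledCompactIndex:
    """RAM table holding short signatures for a sample of chunk hashes (hypothetical sketch)."""

    def __init__(self, sample_mod=4, signature_bytes=2):
        self.sample_mod = sample_mod            # keep roughly 1 in sample_mod hashes (assumed rule)
        self.signature_bytes = signature_bytes  # length of the compact signature (assumed)
        self.table = {}                         # compact signature -> positions in the full index
        self.full_index = []                    # stand-in for the full hash index on secondary storage

    def _is_sampled(self, digest):
        return digest[-1] % self.sample_mod == 0

    def _signature(self, digest):
        return digest[:self.signature_bytes]

    def add(self, digest):
        """Record a chunk hash; only sampled hashes get a compact-table entry."""
        self.full_index.append(digest)
        if self._is_sampled(digest):
            position = len(self.full_index) - 1
            self.table.setdefault(self._signature(digest), []).append(position)

    def lookup(self, digest):
        """Return the full-index position of a known duplicate, else None."""
        if not self._is_sampled(digest):
            # Unsampled hashes are never found: savings are traded for RAM.
            return None
        for position in self.table.get(self._signature(digest), []):
            if self.full_index[position] == digest:  # verify, since signatures can collide
                return position
        return None


if __name__ == "__main__":
    index = SampledCompactIndex(sample_mod=4)
    for i in range(1000):
        index.add(hashlib.sha256(str(i).encode()).digest())
    entries = sum(len(v) for v in index.table.values())
    print(f"{entries} compact entries for {len(index.full_index)} hashes")
```

Raising `sample_mod` shrinks the RAM table at the cost of missing more duplicates, which is the kind of savings-versus-resources balance the abstract describes.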
Abstract:
The described implementations relate to distributed network management and more particularly to enhancing distributed network utility. One technique selects multiple trees to distribute content to multiple receivers in a session where individual receivers can receive the distributed content at one of a plurality of rates. The technique further adjustably allocates content distribution across the multiple trees to increase a sum of utilities of the multiple receivers.
Abstract:
Techniques are described for sharing content among peers. Locality domains are treated as first order network units. Content is located at the level of a locality domain using a hierarchical DHT in which nodes correspond to locality domains. A peer searches for a given piece of content in a proximity guided manner and terminates at the earliest locality domain (in the hierarchy) which has the content. Locality domains are organized into hierarchical clusters based on their proximity.
Abstract:
Difficulties associated with choosing advantageous network routes between a server and clients are mitigated by a routing system that is devised to use many routing path sets, where each set comprises a number of routing paths covering all of the clients, including paths through other clients. A server may then apportion a data stream among all of the routing path sets. The server may also detect the performance of the computer network while sending the data stream between clients, and may adjust the apportionment of the routing path sets including the route. The clients may also be configured to operate as servers of other data streams, such as in a videoconferencing session, and may be configured to send detected route performance information along with the portions of the various data streams.
Abstract:
Described is the use of flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system in which the data items comprise chunk-identifier, metadata pairs, and each chunk-identifier corresponds to a hash of a chunk of data. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.
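The interplay of the RAM index, the write buffer and page-sized writes might be sketched as follows. This is a minimal in-memory model, assuming a Python list as a stand-in for flash pages; the class and method names are hypothetical, and the recycle mechanism (destaging to hard disk or reinsertion) is deliberately left out.

```python
class FlashStore:
    """Toy model of a flash store with a RAM index and a RAM write buffer (hypothetical)."""

    def __init__(self, items_per_page=4):
        self.items_per_page = items_per_page  # assumed page capacity, in items
        self.pages = []          # stand-in for flash: each entry is one written page
        self.ram_index = {}      # key -> page number holding that key
        self.write_buffer = {}   # items waiting until a full page can be written

    def put(self, key, value):
        """Buffer an item and write a page once the buffer holds a full page."""
        self.write_buffer[key] = value
        if len(self.write_buffer) >= self.items_per_page:
            self._flush()

    def _flush(self):
        page_number = len(self.pages)
        self.pages.append(dict(self.write_buffer))  # one full-page write
        for key in self.write_buffer:
            self.ram_index[key] = page_number
        self.write_buffer.clear()

    def get(self, key):
        """Check the write buffer first, then follow the RAM index to a flash page."""
        if key in self.write_buffer:
            return self.write_buffer[key]
        page_number = self.ram_index.get(key)
        if page_number is None:
            return None
        return self.pages[page_number][key]

    def is_duplicate(self, chunk_id):
        """A chunk is a duplicate if its identifier is already buffered or indexed."""
        return chunk_id in self.write_buffer or chunk_id in self.ram_index


if __name__ == "__main__":
    store = FlashStore(items_per_page=4)
    for i in range(5):
        store.put(f"chunk{i}", {"size": i})
    print(len(store.pages), "page(s) written;", len(store.write_buffer), "item(s) buffered")
```

In a deduplication setting the key would be the chunk-identifier (the chunk hash) and the value its metadata, so `is_duplicate` answers the new-versus-duplicate question without touching the hard disk.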
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and a look-ahead cache in RAM that operate to reduce the I/O needed to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
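One way to picture how a compact index table and a look-ahead cache could cut secondary-storage I/O is the sketch below. It assumes the on-disk hash index is kept in stream order, that the compact table stores a 2-byte signature per hash, and that each disk read also pulls the next few hashes into the cache; all names and parameters are hypothetical.

```python
class LookAheadIndex:
    """Toy hash index with a RAM compact table and look-ahead cache (hypothetical)."""

    def __init__(self, stored_hashes, lookahead=3, signature_bytes=2):
        self.disk_index = list(stored_hashes)  # stand-in for the index on secondary storage
        self.lookahead = lookahead             # neighbours prefetched per successful read (assumed)
        self.signature_bytes = signature_bytes
        self.compact = {}                      # RAM: short signature -> disk positions
        for position, h in enumerate(self.disk_index):
            self.compact.setdefault(h[:signature_bytes], []).append(position)
        self.cache = set()                     # RAM: look-ahead cache of full hashes
        self.disk_reads = 0                    # counts simulated secondary-storage accesses

    def contains(self, h):
        """Return True if h is a known chunk hash, touching 'disk' as rarely as possible."""
        if h in self.cache:
            return True                        # served from RAM, no disk access
        for position in self.compact.get(h[:self.signature_bytes], []):
            self.disk_reads += 1               # signature matched: verify against the disk entry
            if self.disk_index[position] == h:
                # Prefetch the entries that follow, betting that duplicates arrive in runs.
                end = position + 1 + self.lookahead
                self.cache.update(self.disk_index[position:end])
                return True
        return False                           # no signature match means no disk access at all


if __name__ == "__main__":
    stored = [bytes([i, 0]) + b"x" * 6 for i in range(10)]
    index = LookAheadIndex(stored)
    hits = sum(index.contains(h) for h in stored)
    print(hits, "hits using", index.disk_reads, "disk reads")
```

The sketch omits the session cache and the read-only encoding of the compact table mentioned in the abstract.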
Abstract:
The subject disclosure is directed towards partitioning a file into chunks that satisfy a chunk size restriction, such as maximum and minimum chunk sizes, using a sliding window. For file positions within the chunk size restriction, a signature representative of a window fingerprint is compared with a target pattern, and a chunk boundary candidate is identified on a match. Other signatures and patterns are then checked to determine the highest ranking signature (corresponding to the lowest numbered Rule) to associate with that chunk boundary candidate; if the highest ranked signature is matched, an actual boundary is set. If the maximum chunk size is reached without matching the highest ranked signature, the chunking mechanism regresses to set the boundary based on the candidate with the next highest ranked signature (if there are no candidates, the boundary is set at the maximum). Also described is setting chunk boundaries based upon pattern detection (e.g., runs of zeros).
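The regression behaviour described above can be sketched in Python as follows. The sketch assumes a CRC32 over the window as the fingerprint (recomputed at each position rather than rolled), nested bit masks as the ranked Rules, and a match defined as the masked bits being zero; none of these specifics come from the disclosure, and pattern detection such as runs of zeros is not modelled.

```python
import zlib


def chunk_boundaries(data, min_size=64, max_size=256, window=16,
                     masks=(0x3F, 0x1F, 0x0F)):
    """Return chunk end offsets for data.

    masks are ordered from Rule 1 (highest rank, hardest to match) downwards;
    the mask values and the CRC32 fingerprint are assumptions of this sketch.
    """
    boundaries = []
    start = 0
    total = len(data)
    while start < total:
        limit = min(start + max_size, total)   # furthest allowed end of this chunk
        candidates = {}                        # rule index -> latest candidate offset
        cut = None
        for pos in range(start + min_size, limit + 1):
            fingerprint = zlib.crc32(data[max(0, pos - window):pos])
            for rule, mask in enumerate(masks):
                if fingerprint & mask == 0:
                    if rule == 0:
                        cut = pos              # highest ranked signature: actual boundary
                    else:
                        candidates[rule] = pos # lower rank: remember as a candidate
                    break                      # only the best matching rule matters
            if cut is not None:
                break
        if cut is None:
            if limit == total:
                cut = total                    # end of data closes the last chunk
            elif candidates:
                cut = candidates[min(candidates)]  # regress to the best ranked candidate
            else:
                cut = limit                    # no candidates: cut at the maximum size
        boundaries.append(cut)
        start = cut
    return boundaries


if __name__ == "__main__":
    sample = bytes((i * i * 31 + i * 7) % 251 for i in range(5000))
    ends = chunk_boundaries(sample)
    print(len(ends), "chunks; last boundary at", ends[-1])
```

Every chunk except possibly the last falls between `min_size` and `max_size`, because boundaries and candidates are only considered at positions inside that range.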