Abstract:
Transmission delays are minimized when packets are transmitted from a source computer over a network to a destination computer. The source computer measures the network's available bandwidth, forms a sequence of output packets from a sequence of data packets, and transmits the output packets over the network to the destination computer, where the transmission rate is ramped up to the measured bandwidth. In conjunction with the transmission, the source computer monitors a transmission delay indicator which it computes using acknowledgement packets it receives from the destination computer. Whenever the indicator specifies that the transmission delay is increasing, the source computer reduces the transmission rate until the indicator specifies that the delay is unchanged. The source computer dynamically decides whether each output packet will be a forward error correction packet or a single data packet, where the decision is based on minimizing the expected transmission delays.
Abstract:
The subject disclosure is directed towards a multi-tiered cache having cache tiers with different access properties. Objects are written to a selected a tier of the cache based upon object-related properties and/or cache-related properties. In one aspect, objects are stored in an active log among a plurality of logs. The active log is sealed upon reaching a target size, with a new active log opened. Garbage collecting is performed on a sealed log, such as the sealed log with the most garbage therein.
Abstract:
Described is using flash memory (or other secondary storage), RAM-based data structures and mechanisms to access key-value pairs stored in the flash memory using only a low RAM space footprint. A mapping (e.g. hash) function maps key-value pairs to a slot in a RAM-based index. The slot includes a pointer that points to a bucket of records on flash memory that each had keys that mapped to the slot. The bucket of records is arranged as a linear-chained linked list, e.g., with pointers from the most-recently written record to the earliest written record. Also described are compacting non-contiguous records of a bucket onto a single flash page, and garbage collection. Still further described is load balancing to reduce variation in bucket sizes, using a bloom filter per slot to avoid unnecessary searching, and splitting a slot into sub-slots.
Abstract:
An ISP-friendly rate allocation system and method that reduces network traffic across ISP boundaries in a peer-to-peer (P2P) network, Embodiments of the system and method continuously solve a global optimization problem and dictate accordingly how much bandwidth is allocated on each connection. Embodiments of the system and method minimize load on a server in communication with the P2P network, minimize ISP-unfriendly traffic while keeping the minimum server load unaffected, and maximize peer prefetching. Two different techniques are used to compute rate allocation, including a utility function optimization technique and a minimum cost flow formulation technique. The utility function optimization technique constructs a utility function and optimizes that utility function. The minimum cost flow formulation technique generates a minimum cost flow formulation using a bipartite graph have a vertices set and an edges set. A distributed minimum cost flow formulation is solved using Lagrangian multipliers.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
Abstract:
Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.
Abstract:
Peer-to-peer communications sessions involve the transmission of one or more data streams from a source to a set of receivers that may redistribute portions of the data stream via a set of routing trees. Achieving a comparatively high, sustainable data rate throughput of the data stream(s) may be difficult due to the large number of available routing trees, as well as pertinent variations in the nature of the communications session (e.g., upload communications caps, network link caps, the presence or absence of helpers, and the full or partial interconnectedness of the network.) The selection of routing trees may be facilitated through the representation of the node set according to a linear programming model, such as a primal model or a linear programming dual model, and iterative processes for applying such models and identifying low-cost routing trees during an iteration.
Abstract:
Peer-to-peer communications sessions involve the transmission of one or more data streams from a source to a set of receivers that may redistribute portions of the data stream via a set of routing trees. Achieving a comparatively high, sustainable data rate throughput of the data stream(s) may be difficult due to the large number of available routing trees, as well as pertinent variations in the nature of the communications session (e.g., upload communications caps, network link caps, the presence or absence of helpers, and the full or partial interconnectedness of the network.) The selection of routing trees may be facilitated through the representation of the node set according to a linear programming model, such as a primal model or a linear programming dual model, and iterative processes for applying such models and identifying low-cost routing trees during an iteration.
Abstract:
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
Abstract:
Described is using flash memory, RAM-based data structures and mechanisms to provide a flash store for caching data items (e.g., key-value pairs) in flash pages. A RAM-based index maps data items to flash pages, and a RAM-based write buffer maintains data items to be written to the flash store, e.g., when a full page can be written. A recycle mechanism makes used pages in the flash store available by destaging a data item to a hard disk or reinserting it into the write buffer, based on its access pattern. The flash store may be used in a data deduplication system, in which the data items comprise chunk-identifier, metadata pairs, in which each chunk-identifier corresponds to a hash of a chunk of data that indicates. The RAM and flash are accessed with the chunk-identifier (e.g., as a key) to determine whether a chunk is a new chunk or a duplicate.