Abstract:
Techniques for using a bloom filter in deduplication are described herein. A change log comprising a plurality of data blocks may be received. Values associated with the data blocks may be hashed and compared with a bloom filter. The comparison with the bloom filter identifies data blocks from the change log as unique data blocks or potential duplicate data blocks. A bit-by-bit comparison of the potential duplicate data blocks and previous data blocks may be performed to determine whether any of the potential duplicate data blocks are identical to any of the previous data blocks. Data blocks of the change log that are identified as identical may be deduplicated.
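The flow in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `BloomFilter` class, its sizing parameters, the use of SHA-256, and the `deduplicate` helper are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter (hypothetical sizing; one byte per bit for clarity)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, data):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, data):
        for pos in self._positions(data):
            self.bits[pos] = 1

    def might_contain(self, data):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos] for pos in self._positions(data))


def deduplicate(change_log, previous_blocks):
    """Split change-log blocks into (unique, duplicates)."""
    bloom = BloomFilter()
    for block in previous_blocks:
        bloom.add(block)

    unique, duplicates = [], []
    for block in change_log:
        if not bloom.might_contain(block):
            unique.append(block)  # filter miss: certainly unique
        elif any(block == prev for prev in previous_blocks):
            duplicates.append(block)  # confirmed by full comparison
        else:
            unique.append(block)  # bloom filter false positive
    return unique, duplicates
```

The full comparison runs only on blocks the filter flags as potential duplicates, so a filter miss avoids the expensive check entirely.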
Abstract:
Embodiments of the systems and techniques described here can leverage several insights into the nature of workload access patterns and working-set behavior to reduce memory overhead. As a result, various embodiments make it feasible to maintain running estimates of a workload's cacheability in current storage systems with limited resources. For example, some embodiments provide for a method comprising estimating the cacheability of a workload based on a first working-set size estimate generated from the workload over a first monitoring interval. Then, based on the cacheability of the workload, a workload cache size can be determined. A cache can then be dynamically allocated, within a storage system for example, in accordance with the workload cache size (e.g., by changing the cache allocation for the workload, possibly frequently, whenever the current allocation and the desired workload cache size differ).
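One way to picture the estimate-then-allocate loop is the sketch below. The abstract does not define how cacheability is computed, so the reuse-ratio formula, the `min_cacheability` threshold, and every function name here are assumptions chosen only for illustration. An exact set is used for the working set, whereas a memory-constrained system would likely use an approximate counter.

```python
def working_set_size(accesses):
    """Number of distinct blocks touched during one monitoring interval."""
    return len(set(accesses))


def estimate_cacheability(accesses):
    """Assumed metric: fraction of accesses that re-reference a block."""
    if not accesses:
        return 0.0
    return 1.0 - working_set_size(accesses) / len(accesses)


def desired_cache_size(accesses, min_cacheability=0.5):
    """Size the cache to the working set only if the workload reuses data."""
    if estimate_cacheability(accesses) < min_cacheability:
        return 0  # e.g. a sequential scan gains little from caching
    return working_set_size(accesses)


def reallocate(current_allocation, accesses):
    """Return the new allocation, changing it only when it differs."""
    desired = desired_cache_size(accesses)
    return desired if desired != current_allocation else current_allocation
```

Calling `reallocate` once per monitoring interval gives the dynamic behavior described above: the allocation follows the most recent working-set estimate.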
Abstract:
A computer program product having a computer readable medium tangibly recording computer program logic for providing feedback in a network, the computer program product including code to receive first data and second data over the network at a receiving device, code to increment a first counter and a second counter in response to the first data and second data, respectively, code to generate a plurality of feedback signals reflecting states of the first and second counters using at least three bits, the bits defining a set of code points mapped to the states of the first and second counters so that each individual code point represents a different one of the states and each one of the states is represented by one code point, and code to transmit the plurality of feedback signals to a sending device in the network.
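The one-to-one mapping between code points and counter states can be illustrated with a small sketch. The abstract does not say how many states each counter has, so the split used here (first counter tracked modulo 4, second counter modulo 2, giving eight states for eight 3-bit code points) is purely an assumption, as are the function names.

```python
FIRST_STATES = 4   # assumed: first counter tracked modulo 4
SECOND_STATES = 2  # assumed: second counter tracked modulo 2


def encode(first_counter, second_counter):
    """Map the pair of counter states to a single 3-bit code point."""
    first = first_counter % FIRST_STATES
    second = second_counter % SECOND_STATES
    return (first << 1) | second


def decode(code_point):
    """Recover (first state, second state) from a 3-bit code point."""
    return code_point >> 1, code_point & 1


class Receiver:
    """Counts two kinds of arriving data and reports the combined state."""

    def __init__(self):
        self.first_counter = 0
        self.second_counter = 0

    def on_first_data(self):
        self.first_counter += 1

    def on_second_data(self):
        self.second_counter += 1

    def feedback(self):
        return encode(self.first_counter, self.second_counter)
```

Because the mapping is a bijection over the eight states, the sender can decode each feedback signal unambiguously.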
Abstract:
A method performed in a system that has a plurality of volumes stored to storage hardware, the method including generating, for each of the volumes, a respective space saving potential iteratively over time and scheduling space saving operations among the plurality of volumes by analyzing each of the volumes for space saving potential and assigning priority of resources based at least in part on space saving potential.
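A rough sketch of potential-driven scheduling follows. The abstract does not specify how the potential is refined over time or how resources are divided, so the exponential smoothing, the `alpha` weight, the fixed number of `slots`, and the function names are assumptions for illustration only.

```python
def update_potential(potentials, volume, sample, alpha=0.5):
    """Fold a new space-saving measurement into the volume's running estimate."""
    previous = potentials.get(volume, sample)
    potentials[volume] = alpha * sample + (1 - alpha) * previous


def schedule(potentials, slots):
    """Pick the volumes with the highest estimated potential, best first."""
    ranked = sorted(potentials, key=lambda v: potentials[v], reverse=True)
    return ranked[:slots]
```

For example, re-running `update_potential` as new measurements arrive and then calling `schedule(potentials, slots=2)` would direct the two available workers to the volumes currently expected to yield the most savings.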
Abstract:
A storage system comprises a cache for caching data blocks and storage devices for storing blocks. A storage operating system may deduplicate sets of redundant blocks on the storage devices based on a deduplication requirement. Blocks in cache are typically deduplicated based on the deduplication performed on the storage devices. Sets of redundant blocks that have not met the deduplication requirement for the storage devices, and therefore have not been deduplicated on the storage devices or in cache, are targeted for further deduplication processing. Such sets may be further deduplicated based on their popularity (number of accesses) in cache. If a set of redundant blocks in cache has a combined number of accesses greater than a predetermined threshold, the set is determined to be “popular.” Popular sets of redundant blocks are selected for deduplication in cache and on the storage devices.
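The popularity test can be sketched as below. The tuple layout `(block_id, content, access_count)` and the function name are hypothetical; grouping by raw content stands in for whatever redundancy detection the storage operating system actually uses.

```python
from collections import defaultdict


def popular_redundant_sets(cached_blocks, threshold):
    """Return lists of block ids whose combined access count exceeds threshold.

    cached_blocks: iterable of (block_id, content, access_count) tuples.
    """
    groups = defaultdict(list)
    for block_id, content, accesses in cached_blocks:
        groups[content].append((block_id, accesses))

    popular = []
    for members in groups.values():
        if len(members) < 2:
            continue  # a single block is not a redundant set
        combined = sum(accesses for _, accesses in members)
        if combined > threshold:
            popular.append([block_id for block_id, _ in members])
    return popular
```

Note that the threshold applies to the set as a whole, so two moderately accessed copies can qualify together even though neither would alone.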
Abstract:
Described herein are systems and methods for providing data policy management over application objects in a storage system environment. An application object may comprise non-virtual or virtual objects (e.g., non-virtual-based applications, virtual-based applications, or virtual storage components). An application object manager may represent application objects by producing mapping graphs and/or application object data that represent application objects in a standardized manner. A mapping graph for an application object may describe a mapping between the application object and its underlying storage objects on a storage system. Application object data may describe a mapping graph in a standardized format. Application object data representing application objects may be received by an application policy manager that manages data policies on the application objects (including virtual applications and virtual storage components) based on the received application object data. Data policies may include policies for backup, service level objectives, recovery, monitoring and/or reporting.
Abstract:
A namespace and storage management (NSM) application includes an infrastructure configured to enable efficient management of resources in a storage system environment. The NSM application executes on an NSM console and interacts with an NSM server to integrate namespace management and storage management in the storage system environment. The NSM server, in turn, interacts with one or more remote agents installed on host machines in the environment to convey application programming interface (API) function calls that enable remote management of the resources.
Abstract:
Example embodiments provide various techniques for distributing connections within a connectional parallelism architecture. In one embodiment, a method is provided where resource utilizations of connection groups are measured. Here, each connection group is assigned to one of multiple processors. A probability distribution is accessed that maps probabilities assigned to relative resource utilizations. A relative resource utilization of one of the connection groups is determined based on a resource utilization of the one connection group relative to other resource utilizations of other connection groups. A probability from the probability distribution is identified based on the determined relative resource utilization, and based on the identified probability, a connection is assigned to this connection group for execution by one of the processors assigned to this connection group.
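The assignment step can be sketched as a weighted random choice. The contents of `PROBABILITY_TABLE`, the definition of relative utilization as a share of the total, and the function names are assumptions made for this example; the abstract only states that a distribution maps relative utilizations to probabilities.

```python
import random

# Hypothetical distribution: (upper bound on relative utilization, probability).
# Lightly loaded groups receive a higher chance of taking the next connection.
PROBABILITY_TABLE = [(0.25, 0.6), (0.50, 0.3), (1.00, 0.1)]


def lookup_probability(relative_utilization):
    for upper_bound, probability in PROBABILITY_TABLE:
        if relative_utilization <= upper_bound:
            return probability
    return PROBABILITY_TABLE[-1][1]


def assignment_weights(utilizations):
    """Map each connection group to a probability from the table."""
    total = sum(utilizations.values())
    weights = {}
    for group, utilization in utilizations.items():
        relative = utilization / total if total else 0.0
        weights[group] = lookup_probability(relative)
    return weights


def assign_connection(utilizations, rng=random):
    """Choose the connection group that receives a new connection."""
    weights = assignment_weights(utilizations)
    groups = list(weights)
    return rng.choices(groups, weights=[weights[g] for g in groups], k=1)[0]
```

Choosing probabilistically rather than always picking the least-loaded group avoids sending every new connection to the same group between measurements.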
Abstract:
Systems and methods for centralizing database manipulation for a plurality of heterogeneous databases are disclosed. A single central server, or a limited number of central servers, can be used to manage a plurality of hosted client systems. With such a technique, database-consistent backups can be performed without requiring alteration of the central server, even when different database engines are used across the hosted client systems.
Abstract:
In one embodiment, a node coupled to one or more storage devices executes a storage input/output (I/O) stack having a volume layer. The volume layer manages volume metadata embodied as mappings from offsets of a logical unit (LUN) to extent keys associated with storage locations for extents on the one or more storage devices. Volume metadata is maintained as a dense tree metadata structure representing successive points in time. The dense tree metadata structure has multiple levels, wherein a top level of the dense tree metadata structure represents newer volume metadata changes and descending levels of the dense tree metadata structure represent older volume metadata changes. The node accesses a latest version of changes to the volume metadata by searching from the top level to the descending levels in the dense tree metadata structure.
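The top-down lookup can be illustrated with a toy structure. Each level is modeled here as a plain dictionary from LUN offset to extent key, and the `merge_down` step is an assumption about how newer entries might migrate to lower levels; neither detail comes from the abstract.

```python
class DenseTree:
    """Toy multi-level mapping: level 0 holds the newest changes."""

    def __init__(self, num_levels=3):
        self.levels = [dict() for _ in range(num_levels)]

    def insert(self, offset, extent_key):
        # New volume metadata always lands in the top level.
        self.levels[0][offset] = extent_key

    def merge_down(self, level):
        # Assumed maintenance step: push a level into the one below it.
        # Entries from the newer level overwrite older ones.
        self.levels[level + 1].update(self.levels[level])
        self.levels[level].clear()

    def lookup(self, offset):
        # Search from the top level downward; the first hit is the latest.
        for level in self.levels:
            if offset in level:
                return level[offset]
        return None
```

Because the search stops at the first level containing the offset, an older mapping left in a lower level never shadows a newer one above it.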