Abstract:
A data chunking system divides data into predominantly fixed-size chunks so that duplicate data can be identified. The system may be used to reduce data storage requirements and save network bandwidth by allowing storage or transmission of primarily unique data chunks. It may also be used to increase reliability in data storage and network transmission by allowing an error affecting a data chunk to be repaired with an identified duplicate chunk. The system chunks data by selecting a chunk of fixed size and then moving a window along the data until a match to existing data is found; as the window moves across the data, unique chunks of predominantly fixed size are formed from the data passed over. Several embodiments provide alternate methods of determining whether a selected chunk matches existing data and methods by which the window is moved through the data. To locate duplicate data, the system remembers data by computing a mathematical function of each data chunk and inserting the computed value into a hash table.
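As a rough illustration of this loop (the chunk size, SHA-1 digest, and function names below are assumptions, not details taken from the abstract), a Python sketch of the window-and-hash-table approach might look like:

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size

def chunk_data(data: bytes, seen: dict) -> list:
    """Split data into mostly fixed-size chunks, reusing chunks remembered in `seen`.

    `seen` maps a chunk digest to the offset where that chunk was first observed.
    The window slides one byte at a time until the bytes under it match a
    previously remembered chunk; the bytes passed over are emitted as unique
    chunks of (at most) the fixed size.
    """
    chunks = []        # (offset, length, is_duplicate)
    unique_start = 0   # start of data not yet emitted as a chunk
    pos = 0
    while pos + CHUNK_SIZE <= len(data):
        digest = hashlib.sha1(data[pos:pos + CHUNK_SIZE]).digest()
        if digest in seen:
            # Emit the unique data passed over, in fixed-size pieces.
            for off in range(unique_start, pos, CHUNK_SIZE):
                piece = data[off:min(off + CHUNK_SIZE, pos)]
                seen.setdefault(hashlib.sha1(piece).digest(), off)
                chunks.append((off, len(piece), False))
            chunks.append((pos, CHUNK_SIZE, True))  # duplicate chunk
            pos += CHUNK_SIZE
            unique_start = pos
        else:
            pos += 1  # slide the window forward one byte
    # Emit whatever remains as unique chunks.
    for off in range(unique_start, len(data), CHUNK_SIZE):
        piece = data[off:off + CHUNK_SIZE]
        seen.setdefault(hashlib.sha1(piece).digest(), off)
        chunks.append((off, len(piece), False))
    return chunks
```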
Abstract:
A cost-adaptive cache dynamically maximizes performance in a caching system by preferentially caching data according to the cost of replacing that data. The cost-adaptive cache includes a partitioned real cache, wherein data is stored in each of the real cache partitions according to its replacement cost. The cost-adaptive cache also includes a partitioned phantom cache that provides a directory of information pertaining to blocks of data that do not qualify for inclusion in the real cache; the partitions in the phantom cache correspond to the partitions in the real cache. The cost-adaptive cache thus maximizes performance by preferentially caching data that is more costly to replace. In one embodiment, the cost of replacing a block of data is estimated by the previous cost incurred to fetch that block.
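A minimal sketch of the partitioned real and phantom caches, assuming two illustrative cost classes and simple LRU eviction (neither of which is specified by the abstract):

```python
from collections import OrderedDict

class CostAdaptiveCache:
    """Illustrative sketch: blocks are placed in a real cache partition chosen by
    their replacement cost; blocks evicted from the real cache leave only a
    directory entry in the corresponding phantom partition."""

    def __init__(self, cost_classes, capacity_per_partition):
        self.real = {c: OrderedDict() for c in cost_classes}     # block -> (data, cost)
        self.phantom = {c: OrderedDict() for c in cost_classes}  # block -> cost only
        self.capacity = capacity_per_partition

    def insert(self, block_id, data, fetch_cost):
        # The previous cost to fetch the block estimates its replacement cost.
        c = self._cost_class(fetch_cost)
        part = self.real[c]
        part[block_id] = (data, fetch_cost)
        part.move_to_end(block_id)
        if len(part) > self.capacity:
            old_id, (_, old_cost) = part.popitem(last=False)  # evict LRU block
            self.phantom[c][old_id] = old_cost                # keep directory info only

    def _cost_class(self, fetch_cost):
        return "high" if fetch_cost > 5.0 else "low"          # arbitrary threshold
```

A hit in a phantom partition could then serve as a signal to grow the corresponding real partition, which is one way such a directory can support adaptation.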
Abstract:
A system and method of optimizing storage of common data blocks within a networked storage system comprises: receiving a data block to be stored in the networked storage system; analyzing the contents of the received data block to determine how many copies of the data block the system is entrusted to store and how many copies of the data block exist within the system, and to identify the location of each copy of the data block within the system; identifying performance and reliability requirements of the system; determining an optimal number of copies of the received data block to store in the system, wherein the determination is made according to the number of copies of the data block the system is entrusted to store together with the identified performance and reliability requirements of the system; and maintaining the optimal number of copies of the received data block within the system.
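One way the copy-count decision and the maintenance step could be expressed, with invented parameter names standing in for the entrusted copy count and the performance and reliability requirements:

```python
def optimal_copies(entrusted_copies: int, reliability_min: int, performance_max: int) -> int:
    # One possible policy: keep at least enough copies for the reliability
    # requirement, and otherwise no more than the performance budget allows
    # or than the system was entrusted to store.
    return max(reliability_min, min(entrusted_copies, performance_max))

def maintain_copies(block_id, existing_locations, target, place_copy, remove_copy):
    """Bring the number of stored copies of block_id up or down to `target` (sketch)."""
    while len(existing_locations) < target:
        existing_locations.append(place_copy(block_id))   # create an additional copy
    for loc in existing_locations[target:]:
        remove_copy(block_id, loc)                        # drop surplus copies
    del existing_locations[target:]
```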
Abstract:
A data storage system and method for reorganizing data to improve the effectiveness of data prefetching and reduce the data seek distance. A data reorganization region is allocated in which data is reorganized to service future requests for data. Sequences of data units that have been repeatedly requested are determined from a request stream, preferably using a graph where each vertex represents a requested data unit and each edge represents that a destination unit is requested shortly after a source unit, together with the frequency of this occurrence. The most frequently requested data units are also determined from the request stream. The determined data is copied into the reorganization region and reorganized according to the determined sequences and most frequently requested units. The reorganized data may then be used to service future requests for data.
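A small Python sketch of the access graph described above (the window length and names are illustrative):

```python
from collections import Counter, defaultdict

def build_access_graph(request_stream, window=4):
    """Each vertex is a requested data unit; edge u -> v is weighted by how often
    v is requested within `window` requests after u. Per-unit frequency is also counted."""
    edges = defaultdict(Counter)
    freq = Counter()
    for i, unit in enumerate(request_stream):
        freq[unit] += 1
        for later in request_stream[i + 1:i + 1 + window]:
            edges[unit][later] += 1
    return edges, freq

# A reorganization pass could then copy the heaviest sequences and the most
# frequently requested units into the reorganization region, e.g.:
#   edges, freq = build_access_graph(trace)
#   hot_units = [u for u, _ in freq.most_common(1000)]
```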
Abstract:
The present invention is a system and method for determining information to be prefetched in a multi-stream environment, in which multiple data-accessing streams perform sequential accesses to information independently of each other. The system can detect sequential streams from among the aggregate reference stream yet requires relatively little memory to operate. A reference address referencing stored information is received. A matching run is found. A count corresponding to the run is updated. If the count exceeds a predetermined threshold, an amount of information to prefetch is determined. If a predetermined fraction of the determined amount of information to prefetch must still be retrieved, the determined amount of information is retrieved. A matching run may be found by searching a stack comprising a plurality of entries to find an entry corresponding to the reference address. Each of the plurality of entries may be associated with a maximum accessed address, a forward range, and a backward range, and the searching step may comprise searching the plurality of stack entries in one direction starting at an end of the stack and determining whether the reference address is between (maximum accessed address − backward range) and (maximum accessed address + forward range) for each stack entry until a matching stack entry is found.
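The matching test at the end of the abstract maps directly onto a range check per stack entry; a simplified sketch follows (the threshold, range values, and the omitted already-retrieved-fraction check are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Run:
    max_addr: int        # maximum accessed address of this run
    backward_range: int
    forward_range: int
    count: int = 0       # references matched to this run

def find_matching_run(stack, ref_addr):
    """Search the stack entries in one direction, starting at one end, until
    max_addr - backward_range <= ref_addr <= max_addr + forward_range."""
    for run in stack:
        if run.max_addr - run.backward_range <= ref_addr <= run.max_addr + run.forward_range:
            return run
    return None

THRESHOLD = 3  # hypothetical sequentiality threshold

def on_reference(stack, ref_addr, prefetch):
    run = find_matching_run(stack, ref_addr)
    if run is None:
        stack.insert(0, Run(ref_addr, backward_range=8, forward_range=64))
        return
    run.count += 1
    run.max_addr = max(run.max_addr, ref_addr)
    if run.count > THRESHOLD:
        prefetch(start=run.max_addr + 1, amount=64)  # amount is illustrative
```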
Abstract:
A method to manage data located on networked devices is provided. The method includes replicating objects residing on the devices and collecting information about at least one of the objects or the devices. The method further includes receiving input on desired information governance policies and outcomes and analyzing the replicated objects, collected information and received input to determine an information governance action.
Abstract:
A trustworthy inverted index system processes records to identify features for indexing, generates posting lists corresponding to features in a dictionary, maintains in a storage cache the tail of at least one of the posting lists to minimize random I/Os to the index, determines a desired number of posting lists based on a desired level of insertion performance, a desired level of query performance, or the size of the storage cache, and reads the posting list corresponding to a search feature in a query to identify records that contain the search feature. The system maps the features in the dictionary to the desired number of posting lists. The system uses a jump pointer to point from one entry to the next in the posting lists based on increasing values of entries in the posting lists.
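A loose illustration of the tail-caching and feature-to-list mapping ideas (hashing features onto a fixed number of lists is an assumption, and the stable posting list with jump pointers is stubbed out):

```python
from collections import defaultdict

class InvertedIndexSketch:
    def __init__(self, num_posting_lists, tail_capacity):
        self.num_lists = num_posting_lists      # chosen from insertion/query/cache trade-offs
        self.tails = defaultdict(list)          # list id -> cached tail of postings
        self.tail_capacity = tail_capacity

    def insert(self, record_id, features):
        for feature in features:
            list_id = hash(feature) % self.num_lists    # map dictionary feature -> posting list
            self.tails[list_id].append((feature, record_id))
            if len(self.tails[list_id]) >= self.tail_capacity:
                self._flush(list_id)                     # one sequential append, no random I/O

    def _flush(self, list_id):
        # Append the cached tail to the stable posting list (omitted here); entries
        # would be kept in increasing order so jump pointers can skip ahead.
        self.tails[list_id].clear()

    def query(self, feature):
        list_id = hash(feature) % self.num_lists
        # A real lookup would also scan the stored list via its jump pointers;
        # only the cached tail is scanned in this sketch.
        return [rid for f, rid in self.tails[list_id] if f == feature]
```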
Abstract:
A method for reducing data loss and unavailability by integrating multiple levels of a storage hierarchy is provided. The method includes receiving a read request. In addition, the method includes recognizing a data failure in response to the read request. The method further includes locating an alternate source of the data to be read in response to recognizing the data failure. The alternate source includes data cached at devices in the storage hierarchy, data in a backup system, and cumulative changes to the data since the last backup. Moreover, the method includes responding to the read request with data from the alternate source.
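A compact sketch of the recovery path, with the alternate sources represented as a list of lookup callables (names are illustrative):

```python
def read_with_recovery(block_id, primary_read, alternate_sources):
    """On a data failure, return the block from the first alternate source that can
    supply it, e.g. a cache elsewhere in the hierarchy or the last backup combined
    with the cumulative changes recorded since that backup."""
    try:
        return primary_read(block_id)
    except IOError:
        for source in alternate_sources:
            data = source(block_id)
            if data is not None:
                return data          # respond to the read with recovered data
        raise                        # no alternate source held the block
```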
Abstract:
A method to manage objects in an information lifecycle management system is provided. The method includes determining a score for each of the objects based on the score of at least one feature within each respective object, where the score of the at least one feature is associated with a valuation of that feature. The method also includes managing each of the objects based on its score, wherein higher-scored objects are managed preferentially.
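As a toy illustration of feature-based scoring and preferential management (the additive scoring and the tiering action are assumptions):

```python
def object_score(features, feature_value):
    # An object's score is derived from the scores of its features,
    # each reflecting that feature's valuation.
    return sum(feature_value.get(f, 0.0) for f in features)

def manage(objects, feature_value, fast_tier_slots):
    """Rank objects by score; higher-scored objects get preferential treatment,
    here modeled as placement on a faster storage tier."""
    ranked = sorted(objects, key=lambda o: object_score(o["features"], feature_value),
                    reverse=True)
    return ranked[:fast_tier_slots], ranked[fast_tier_slots:]
```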
Abstract:
The present invention discloses a method, apparatus, and article of manufacture for implementing a locking structure that supports parity protection in a RAID clustered environment. When updating parity, the parity is locked so that other nodes cannot access or modify it. Accordingly, the parity is locked, read, generated, written, and unlocked by a node. An enhanced protocol may combine the lock and read functions and the write and unlock functions. Further, the SCSI RESERVE and RELEASE commands may be utilized to lock and unlock the parity data. By locking the parity in this manner, overhead is minimized and does not increase as the number of nodes increases.
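The lock/read/generate/write/unlock sequence can be sketched as follows, with a local lock standing in for the cluster-wide reservation (e.g. SCSI RESERVE/RELEASE) and XOR used as the usual RAID parity update:

```python
import threading

parity_lock = threading.Lock()  # stand-in for a cluster-wide lock on the parity

def update_parity(read_parity, write_parity, parity_addr, old_data, new_data):
    """Sketch of a node's parity update: lock, read, generate, write, unlock.
    An enhanced protocol could fold the lock into the read and the unlock into the write."""
    with parity_lock:                                  # lock the parity
        parity = read_parity(parity_addr)              # read current parity
        new_parity = bytes(p ^ o ^ n                   # generate new parity
                           for p, o, n in zip(parity, old_data, new_data))
        write_parity(parity_addr, new_parity)          # write new parity
    # lock released on exit (unlock)
```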