Abstract:
A data processing apparatus comprises: a chunk store containing specimen data chunks; a manifest store containing at least one manifest that represents at least a part of a data set and that comprises at least one reference to at least one of said specimen data chunks; and a sparse chunk index containing information on only those specimen data chunks having a predetermined characteristic. The processing apparatus is operable to process input data into input data chunks and to use the sparse chunk index to identify at least one manifest that includes at least one reference to a specimen data chunk corresponding to an input data chunk having the predetermined characteristic.
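The lookup described in this abstract can be sketched as follows. This is a minimal illustration, not the claimed implementation: the fixed-size chunking, the SHA-1 chunk identifiers, the rule that a chunk has the "predetermined characteristic" when the first byte of its hash is divisible by `SAMPLE_MOD`, and every name (`Store`, `candidate_manifests`, and so on) are assumptions made for the example.

```python
import hashlib
from collections import Counter, defaultdict

CHUNK_SIZE = 4   # assumption: fixed-size chunks, kept tiny for illustration
SAMPLE_MOD = 4   # assumption: roughly one chunk in four is indexed


def chunk(data):
    """Split input data into fixed-size input data chunks."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def chunk_id(c):
    return hashlib.sha1(c).hexdigest()


def has_characteristic(cid):
    """Hypothetical 'predetermined characteristic': first hash byte % SAMPLE_MOD == 0."""
    return int(cid[:2], 16) % SAMPLE_MOD == 0


class Store:
    def __init__(self):
        self.chunks = {}                      # chunk store: id -> bytes
        self.manifests = {}                   # manifest store: name -> list of chunk ids
        self.sparse_index = defaultdict(set)  # only chunks with the characteristic

    def add_data_set(self, name, data):
        ids = []
        for c in chunk(data):
            cid = chunk_id(c)
            self.chunks.setdefault(cid, c)
            ids.append(cid)
            if has_characteristic(cid):
                self.sparse_index[cid].add(name)
        self.manifests[name] = ids

    def candidate_manifests(self, data):
        """Return manifests that reference indexed chunks matching the input, best first."""
        votes = Counter()
        for c in chunk(data):
            cid = chunk_id(c)
            if has_characteristic(cid) and cid in self.sparse_index:
                for name in self.sparse_index[cid]:
                    votes[name] += 1
        return [name for name, _ in votes.most_common()]
```

Because only a fraction of chunks is indexed, the index stays small, and a manifest it identifies can then be compared in full against the remaining input chunks.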
Abstract:
A computer system that includes a graphical user interface used to organize a group of documents is provided. The system includes a processor that is adapted to execute machine-readable instructions. The system also includes a storage device that is adapted to store data. The data includes a plurality of documents and instructions that are executable by the processor to generate the graphical user interface. The graphical user interface includes a cluster map that includes the results of a clustering algorithm applied to the documents. The graphical user interface also includes a principal documents screen that includes a principal document that is identified by weighting each of the documents in a cluster based, at least in part, on an occurrence of representative terms in the document. The representative terms are terms that have been identified by the clustering algorithm as being more effective for distinguishing between documents that belong to different clusters.
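The weighting step that selects a principal document might look like the sketch below. It assumes the clustering algorithm has already produced the cluster membership and the list of representative terms, and it uses a plain occurrence count as the weight; the abstract does not specify the weighting formula, so that choice and the function name are hypothetical.

```python
from collections import Counter


def principal_document(cluster_docs, representative_terms):
    """Pick the document in a cluster with the most representative-term occurrences.

    cluster_docs: dict mapping document name -> text (assumed input shape).
    representative_terms: terms the clustering step found most discriminative.
    """
    terms = set(representative_terms)

    def weight(text):
        counts = Counter(text.lower().split())
        return sum(counts[t] for t in terms)

    return max(cluster_docs, key=lambda name: weight(cluster_docs[name]))


docs = {
    "a": "cache miss cache hit",
    "b": "cache line",
    "c": "network packet",
}
best = principal_document(docs, ["cache", "hit"])
```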
Abstract:
Data objects are selectively stored across a plurality of differential data stores, where selection of the differential data stores for storing respective data objects is according to a criterion relating to compression of the data objects in each of the data stores, and where the differential data stores are stored in persistent storage media. Plural requests for accessing the differential data stores are batched, and one of the differential data stores is selected to page into temporary storage from the persistent storage media. The batched plural requests for accessing the selected differential data store that has been paged into the temporary storage are executed.
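The batching and paging behaviour can be illustrated with a small sketch. Here in-memory dictionaries stand in for the persistent storage media, and the store with the most pending requests is the one paged in; the abstract does not state the selection rule, so that policy and all names are assumptions.

```python
from collections import defaultdict


class BatchedStores:
    def __init__(self, persistent):
        self.persistent = persistent       # store id -> dict, stands in for disk
        self.pending = defaultdict(list)   # store id -> batched request keys
        self.page_ins = 0                  # counts how often a store is paged in

    def submit(self, store_id, key):
        """Batch a request instead of executing it immediately."""
        self.pending[store_id].append(key)

    def run_one_batch(self):
        """Page in one store and execute every batched request against it."""
        if not self.pending:
            return {}
        # Assumed policy: choose the store with the most waiting requests.
        store_id = max(self.pending, key=lambda s: len(self.pending[s]))
        paged = dict(self.persistent[store_id])  # copy into temporary storage
        self.page_ins += 1
        keys = self.pending.pop(store_id)
        return {(store_id, k): paged.get(k) for k in keys}
```

Three requests against two stores then cost two page-ins rather than three, which is the saving the batching is meant to provide.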
Abstract:
To identify similar files in an environment having multiple client computers, a first client computer receives, from a coordinator computer, a request to find files located at the first client computer that are similar to at least one comparison file, wherein the request has also been sent to other client computers by the coordinator computer to request that the other client computers also find files that are similar to the at least one comparison file. In response to the request, the first client computer compares signatures of the files located at the first client computer with a signature of the at least one comparison file to identify at least a subset of the files located at the first client computer that are similar to the at least one comparison file according to a comparison metric. The first client computer sends, to the coordinator computer, a response relating to the comparing.
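A client-side handler for such a request could be sketched as below. The abstract does not define the signature or the comparison metric, so this example assumes a signature made of three-word shingles and Jaccard similarity with a threshold; `signature`, `jaccard`, and `handle_request` are illustrative names.

```python
def signature(text, k=3):
    """Assumed signature: the set of k-word shingles in the text."""
    words = text.lower().split()
    return frozenset(
        " ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))
    )


def jaccard(a, b):
    """Assumed comparison metric: Jaccard similarity of two signatures."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def handle_request(local_files, comparison_signature, threshold=0.5):
    """Run on each client: return (name, score) for sufficiently similar local files."""
    matches = []
    for name, text in sorted(local_files.items()):
        score = jaccard(signature(text), comparison_signature)
        if score >= threshold:
            matches.append((name, score))
    return matches


local = {
    "a.txt": "the quick brown fox jumps",
    "b.txt": "completely different content here now",
}
response = handle_request(local, signature("the quick brown fox jumps over"))
```

The coordinator would send the same comparison signature to every client and merge the responses it receives.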
Abstract:
The present invention provides a system for and a method of data cache management. In accordance with an embodiment of the present invention, a method of cache management is provided. A request for access to data is received. A sample value is assigned to the request, the sample value being randomly selected according to a probability distribution. The sample value is compared to another value. The data is selectively stored in the cache based on the results of the comparison.
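The admission decision described here can be sketched in a few lines. The example assumes a uniform distribution on [0, 1) and a fixed threshold as the "other value"; the abstract leaves both open, and the class and parameter names are hypothetical.

```python
import random


class SampledCache:
    def __init__(self, threshold, rng=None):
        self.threshold = threshold          # the value each sample is compared to
        self.rng = rng or random.Random()
        self.data = {}

    def access(self, key, load):
        """Serve a request; admit the data to the cache only if the sample passes."""
        if key in self.data:
            return self.data[key]
        value = load(key)
        sample = self.rng.random()          # assumed: uniform on [0, 1)
        if sample < self.threshold:
            self.data[key] = value
        return value
```

With a threshold of 0.1, for instance, an item requested once is unlikely to be cached, while an item requested many times is likely to be admitted on one of those requests.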
Abstract:
To manage storing of data in a data structure, a particular data value is represented as a group of segments stored in corresponding entries of the data structure. Additional data values represented by corresponding groups of segments are written into the data structure. The probability of overwriting segments representing the particular data value increases as the number of additional data values increases. A correct version of the particular data value is retrieved even though one or more segments representing the particular data value have been overwritten.
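One simple way to obtain this behaviour is sketched below. It substitutes plain replication for whatever segment encoding the source intends: each "segment" is a full copy of the value tagged with its key, placed in consecutive slots starting at a hashed position. The table layout, the tagging, and all names are assumptions for illustration only.

```python
import hashlib


class OverwritableTable:
    def __init__(self, size=64, replicas=5):
        self.slots = [None] * size
        self.replicas = replicas   # must not exceed size, so positions stay distinct

    def positions(self, key):
        """Slots that hold the segments for a key (assumed placement rule)."""
        digest = hashlib.sha256(key.encode()).digest()
        base = int.from_bytes(digest[:8], "big") % len(self.slots)
        return [(base + i) % len(self.slots) for i in range(self.replicas)]

    def put(self, key, value):
        # Writing may silently overwrite segments that belong to other keys.
        for pos in self.positions(key):
            self.slots[pos] = (key, value)

    def get(self, key):
        """Return the value from any segment still tagged with this key."""
        for pos in self.positions(key):
            slot = self.slots[pos]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None   # every segment has been overwritten
```

The value survives as long as at least one of its segments is intact, so the chance of losing it grows as more values are written into the table.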
Abstract:
An integrity checking system includes a tag programming device that generates a plurality of identifiers. Each identifier is associated with either a storage item or an item to be stored by the storage item. The programming device stores each of the identifiers in a plurality of readable tags, each readable tag being adapted to be attached to a corresponding item. A tag reading device reads the identifiers stored in the readable tags and, using only information from the read tags, provides information indicating whether any item supposed to be stored on the storage item is missing from the storage item. Also, methods for storing and reading the identifiers are disclosed along with storing additional information about the items in the tags, such as physical information like weight and/or volume of the items, and then using this information to determine whether any items have been altered.
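The missing-item check can be sketched as follows, using only the data read from the tags and no external database. The tag layout (a `kind` field, an `expected` list on the storage item's tag) is an assumption; the abstract does not describe how identifiers are encoded.

```python
def missing_items(read_tags):
    """Report, per storage item, which expected items were not read.

    read_tags: list of dicts as returned by a hypothetical tag reader.
    Storage tags carry the identifiers of the items they should hold.
    """
    present = {t["id"] for t in read_tags if t["kind"] == "item"}
    return {
        t["id"]: sorted(set(t["expected"]) - present)
        for t in read_tags
        if t["kind"] == "storage"
    }


tags = [
    {"kind": "storage", "id": "pallet-1", "expected": ["box-1", "box-2", "box-3"]},
    {"kind": "item", "id": "box-1"},
    {"kind": "item", "id": "box-3"},
]
report = missing_items(tags)
```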
Abstract:
A method and apparatus for generating an error correction code used in communicating over a channel include generating a set of candidate circulant blocks corresponding to a parity check matrix and a Hamming code, wherein the Hamming code is initially unable to detect a predetermined error pattern without ambiguity due to one or more redundancies, and eliminating columns of the parity check matrix, together with the related redundancies in detecting the predetermined error pattern, to yield the resulting Hamming code.
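The idea of removing parity check columns until an error pattern has an unambiguous syndrome can be illustrated on a small case. The sketch below does not use circulant blocks; it takes the ordinary Hamming(7,4) parity check matrix, with columns written as the integers 1 to 7, and greedily drops trailing columns until every placement of the pattern yields a distinct nonzero syndrome. The greedy rule and the function names are assumptions, not the construction in the source.

```python
def pattern_syndromes(columns, pattern):
    """Syndrome of the error pattern at each start position.

    columns: parity check matrix columns, each encoded as an integer.
    pattern: tuple of bit offsets that are in error, e.g. (0, 1) for
             two adjacent bit errors.
    """
    span = max(pattern) + 1
    syndromes = []
    for start in range(len(columns) - span + 1):
        s = 0
        for offset in pattern:
            s ^= columns[start + offset]   # syndromes add over GF(2)
        syndromes.append(s)
    return syndromes


def is_unambiguous(columns, pattern):
    """True if every placement gives a distinct, nonzero syndrome."""
    syn = pattern_syndromes(columns, pattern)
    return 0 not in syn and len(set(syn)) == len(syn)


def shorten_until_unambiguous(columns, pattern):
    """Assumed greedy rule: drop trailing columns until the pattern is unambiguous."""
    cols = list(columns)
    while cols and not is_unambiguous(cols, pattern):
        cols.pop()
    return cols


hamming_7_4 = [1, 2, 3, 4, 5, 6, 7]
adjacent_pair = (0, 1)
shortened = shorten_until_unambiguous(hamming_7_4, adjacent_pair)
```

For the full matrix, the adjacent-pair syndromes are 3, 1, 7, 1, 3, 1, so several placements are indistinguishable; after shortening to columns 1 to 4 they are 3, 1, 7, which are all different.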