Abstract:
Systems, computer-readable media storing instructions, and methods can infer a state of behavior. Such a method can include constructing a graph including nodes representing hosts and domains based on an event dataset. The graph can be seeded with information external to the event dataset. A belief whether each of the nodes is in a particular state of behavior can be calculated based on marginal probability estimation.
Abstract:
A system and method for transparently compressing file system data using compression group descriptors is provided. When data contained within a compression group be compressed beyond a predefined threshold value, a compression group descriptor is included in the compression group that signifies that the data for the group of level 0 blocks is compressed into a lesser number of physical data blocks. When performing a read operation, the file system first determines the appropriate compression group that contains the desired data and determines whether the compression group has been compressed. If so, the file system decompresses the data in the compression group before returning the decompressed data. If the magic value is not the first pointer position, then the data within the compression group was previously stored in an uncompressed format, and the data may be returned without performing a decompression operation.
Abstract:
A method and system for eliminating the redundant allocation and deallocation of special data on disk, wherein the redundant allocation and deallocation of special data on disk is eliminated by providing an innovate technique for specially allocating special data of a storage system. Specially allocated data is data that is pre-allocated on disk and stored in memory of the storage system. “Special data” may include any pre-decided data, one or more portions of data that exceed a pre-defined sharing threshold, and/or one or more portions of data that have been identified by a user as special. For example, in some embodiments, a zero-filled data block is specially allocated by a storage system. As another example, in some embodiments, a data block whose contents correspond to a particular type document header is specially allocated.
Abstract:
The present invention provides a system and method for estimating duplicate data in a storage system. A duplicate estimation application executes on a client of a storage system selects an element from an intended destination such as, e.g., a data store of the storage system. If the element is a file (or other data container), the application reads data from the file and computes a fingerprint of the read data. The computed fingerprint is then logged in a fingerprint database, which is illustratively stored on a storage device connected to the client executing the application. This process repeats until the entire file (or other data container) has been read and fingerprinted. Once all elements have been scanned, fingerprinted and recorded, the application identifies any unique entries within the fingerprint database. Utilizing this information, the application computes an estimated space savings that may be realized by employing a data de-duplication technique.
Abstract:
A request is received to remove duplicate data. A log data container associated with a storage volume in a storage server is accessed. The log data container includes a plurality of entries. Each entry is identified by an extent identifier in a data structures stored in a volume associated with the storage server. For each entry in the log data container, a determination is made if the entry matches another entry in the log data container. If the entry matches another entry in the log data container, a determination is made of a donor extent and a recipient extent. If an external reference count associated with the recipient extent equals a first predetermined value, block sharing is performed for the donor extent and the recipient extent. A determination is made if the reference count of the donor extent equals a second predetermined value. If the reference count of the donor extent equals the second predetermined value, the donor extent is freed.
Abstract:
A method and system for estimating space in a compressed volume to enable a storage server to respond to write requests before actually compressing and/or allocating data on disk. In some embodiments, in response to receiving a request to store data, the storage server estimates the amount of storage space required to store the data on disk. The storage server compares the estimated amount with the amount of available disk space. When the amount of available disk space is less than the estimated space, the storage server sends a response indicating that the request failed. Otherwise, when the amount of available disk space is greater than or equal to the estimate space, the storage server sends a response indicating that the request succeeded. The response is sent before the storage server allocates any disk space in connection with the request.
Abstract:
An extent-based storage architecture is implemented by a storage server receiving a read request for an extent from a client, wherein the extent includes a group of contiguous blocks and the read request includes a file block number. The storage server retrieves an extent identifier from a first sorted data structure, wherein the storage server uses the received file block number to traverse the first sorted data structure to the extent identifier. The storage server retrieves a reference to the extent from a second sorted data structure, wherein the storage server uses the retrieved extent identifier to traverse the second sorted data structure to the reference, and wherein the second sorted data structure is global across a plurality of volumes. The storage server retrieves the extent from a storage device using the reference and returns the extent to the client.
Abstract:
A system and method for transparently compressing file system data using compression group descriptors is provided. When data contained within a compression group be compressed beyond a predefined threshold value, a compression group descriptor is included in the compression group that signifies that the data for the group of level 0 blocks is compressed into a lesser number of physical data blocks. When performing a read operation, the file system first determines the appropriate compression group that contains the desired data and determines whether the compression group has been compressed. If so, the file system decompresses the data in the compression group before returning the decompressed data. If the magic value is not the first pointer position, then the data within the compression group was previously stored in an uncompressed format, and the data may be returned without performing a decompression operation.
Abstract:
An extent-based storage architecture is implemented by a storage server receiving a read request for an extent from a client, wherein the extent includes a group of contiguous blocks and the read request includes a file block number. The storage server retrieves an extent identifier from a first sorted data structure, wherein the storage server uses the received file block number to traverse the first sorted data structure to the extent identifier. The storage server retrieves a reference to the extent from a second sorted data structure, wherein the storage server uses the retrieved extent identifier to traverse the second sorted data structure to the reference, and wherein the second sorted data structure is global across a plurality of volumes. The storage server retrieves the extent from a storage device using the reference and returns the extent to the client.
Abstract:
A system and method for transparently compressing file system data using compression group descriptors is provided. When data contained within a compression group be compressed beyond a predefined threshold value, a compression group descriptor is included in the compression group that signifies that the data for the group of level 0 blocks is compressed into a lesser number of physical data blocks. When performing a read operation, the file system first determines the appropriate compression group that contains the desired data and determines whether the compression group has been compressed. If so, the file system decompresses the data in the compression group before returning the decompressed data. If the magic value is not the first pointer position, then the data within the compression group was previously stored in an uncompressed format, and the data may be returned without performing a decompression operation.