Abstract:
Embodiments described herein provide an object store that efficiently manages and services objects for use by clients of a distributed data processing system. Illustratively, the object store may be embodied as a quasi-shared storage system that interacts with nodes of the distributed data processing system to service the objects as blocks of data stored on a plurality of storage devices, such as disks, of the storage system. To that end, an architecture of the object store may include an on-disk layout, e.g., of the storage system, and an incore layout, e.g., of the nodes, that cooperate to illustratively convert the blocks to objects for access by the clients.
Abstract:
In addition to caching I/O operations at a host, at least some data management can migrate to the host. With host-side caching, data sharing or deduplication can be applied to the cached writes before those writes are supplied to front-end storage elements. When a trigger to flush the host cache to distributed storage is detected, the host deduplicates the cached writes. The host aggregates data based on the deduplication into a “change set file” (i.e., a file that contains the aggregated unique data from the cached writes). The host supplies the change set file to the distributed storage system and then sends commands to that system. Each of the commands identifies a part of the change set file to be used for a target of the cached writes.
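The flush path described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `build_change_set`, the tuple layout of a cached write, the command fields, and the use of SHA-256 digests to detect duplicate data are all hypothetical and are not taken from the abstract.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical shape of a cached write: (target name, offset in target, data).
CachedWrite = Tuple[str, int, bytes]


def build_change_set(cached_writes: List[CachedWrite]):
    """Deduplicate cached writes into one change set plus per-write commands.

    Assumption: two writes carry the same data when their SHA-256 digests match.
    """
    change_set = bytearray()
    # digest -> (offset, length) of the unique data inside the change set
    seen: Dict[str, Tuple[int, int]] = {}
    commands = []
    for target, target_offset, data in cached_writes:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            # First time this data appears: append it to the change set.
            seen[digest] = (len(change_set), len(data))
            change_set.extend(data)
        src_offset, length = seen[digest]
        # Each command names the part of the change set to use for a target.
        commands.append({
            "target": target,
            "target_offset": target_offset,
            "change_set_offset": src_offset,
            "length": length,
        })
    return bytes(change_set), commands
```

In this sketch, two writes with identical data produce two commands that point at the same region of the change set, so the duplicate data is sent to the storage system only once.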
Abstract:
Technology is disclosed for managing data in a distributed file system (“the technology”). The technology can gather metadata information associated with the data stored within a first file system, store the metadata information in association with a data identifier within a second file system, retrieve the stored metadata information from the second file system using the data identifier, and locate and retrieve the data associated with the metadata information from within the first file system.
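As a rough illustration of the two-file-system lookup described here, the sketch below models both file systems as in-memory dictionaries. The `Metadata` fields, the `id-N` identifier scheme, and the function names are assumptions made for the example and do not come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metadata:
    path: str  # where the data lives in the first file system
    size: int  # size of the data in bytes


def gather_metadata(first_fs: Dict[str, bytes]) -> Dict[str, Metadata]:
    """Build the second file system: data identifier -> metadata."""
    second_fs: Dict[str, Metadata] = {}
    for index, (path, data) in enumerate(sorted(first_fs.items())):
        # Hypothetical identifier scheme; any stable unique key would do.
        second_fs[f"id-{index}"] = Metadata(path=path, size=len(data))
    return second_fs


def fetch(data_id: str, first_fs: Dict[str, bytes],
          second_fs: Dict[str, Metadata]) -> bytes:
    """Look up metadata by identifier, then read the data it points to."""
    meta = second_fs[data_id]
    return first_fs[meta.path]
```

A caller would run `gather_metadata` once over the first file system and afterwards resolve identifiers through `fetch`, touching the first file system only to read the located data.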
Abstract:
A system and method for planning and configuring the components of a modular computing system is provided. In some embodiments, the method for planning an implementation of a modular computing system comprises presenting a user interface at a display device, the user interface including a plurality of user-selectable objects, each of the user-selectable objects representing a component of the modular computing system. A user selection is received via a user input device. The user selection is from among the user-selectable objects and specifies one of an enclosure, an existing component, and a future component of the modular computing system. A representation of the specified one of an enclosure, an existing component, and a future component is displayed at a display device. The user selection is verified with respect to an implementation guideline. An indicator of whether the user selection meets the implementation guideline is displayed at the display device.
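The verification step in this abstract might look something like the sketch below. The abstract does not state what an implementation guideline contains, so the slot-capacity rule used here is purely an assumed example, as are the `Kind`, `Selection`, and `meets_guideline` names.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    ENCLOSURE = "enclosure"
    EXISTING = "existing component"
    FUTURE = "future component"


@dataclass
class Selection:
    kind: Kind
    name: str
    slots_needed: int = 0  # hypothetical attribute used by the example rule


def meets_guideline(selection: Selection, free_slots: int) -> bool:
    """Return the indicator that a user interface would display.

    Assumed guideline: a component must fit in the enclosure's free slots.
    """
    if selection.kind is Kind.ENCLOSURE:
        return True  # an enclosure consumes no slots in this toy rule
    return selection.slots_needed <= free_slots
```

The boolean result stands in for the indicator that is displayed after the user selection is checked against the guideline.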
Abstract:
A network storage server system includes a distributed object store and a metadata subsystem. The metadata subsystem stores metadata relating to the stored data objects and allows data objects to be located and retrieved easily via user-specified search queries. It manages and allows searches on at least three categories of metadata via the same user interface and technique. These categories include user-specified metadata, inferred metadata and system-defined metadata. Search queries for the metadata can include multi-predicate queries.
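A multi-predicate search over the three metadata categories could be sketched as below. The dictionary layout, the category keys (`system`, `inferred`, `user`), and the rule that user-specified values override the others on a key clash are assumptions for illustration only.

```python
from typing import Any, Callable, Dict, List

Predicate = Callable[[Dict[str, Any]], bool]


def merged_metadata(obj: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten the three categories so one query technique covers them all."""
    merged: Dict[str, Any] = {}
    # Assumed precedence: later categories override earlier ones.
    for category in ("system", "inferred", "user"):
        merged.update(obj.get(category, {}))
    return merged


def search(objects: Dict[str, Dict[str, Dict[str, Any]]],
           predicates: List[Predicate]) -> List[str]:
    """Return the IDs of objects whose metadata satisfies every predicate."""
    return [
        object_id
        for object_id, obj in objects.items()
        if all(predicate(merged_metadata(obj)) for predicate in predicates)
    ]
```

For example, passing one predicate on a system-defined size and another on a user-specified owner yields only the objects that satisfy both conditions.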
Abstract:
Technology is disclosed for managing data in a distributed processing system (“the technology”). In various embodiments, the technology pushes “cold” data from a primary storage of the distributed processing system to a backup storage, thereby maximizing the space available on the primary storage for “hot” data, on which most data processing activities in the distributed processing system are performed. The cold data is retrieved from the backup storage into the primary storage on demand, for example, upon receiving an access request from a client. While the primary storage stores the data in a format specific to the distributed processing system, the backup storage stores the data in a different format, for example, a format corresponding to the type of backup storage.
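The hot/cold movement described here can be sketched with a small in-memory model. Several things in it are assumptions rather than details from the abstract: least-recently-used order as the definition of "cold", a fixed item capacity for the primary storage, and JSON text as the "different format" used by the backup storage.

```python
import json
from collections import OrderedDict
from typing import Any, Dict


class TieredStore:
    """Toy primary/backup store; LRU eviction is an assumed coldness policy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.primary: "OrderedDict[str, Any]" = OrderedDict()  # coldest first
        self.backup: Dict[str, str] = {}  # values held as JSON text

    def _evict_cold(self) -> None:
        # Push the least recently used entries to backup in its own format.
        while len(self.primary) > self.capacity:
            key, value = self.primary.popitem(last=False)
            self.backup[key] = json.dumps(value)

    def put(self, key: str, value: Any) -> None:
        self.primary[key] = value
        self.primary.move_to_end(key)
        self._evict_cold()

    def get(self, key: str) -> Any:
        if key not in self.primary:
            # On-demand retrieval: convert back to the primary format.
            self.primary[key] = json.loads(self.backup.pop(key))
        self.primary.move_to_end(key)
        value = self.primary[key]
        self._evict_cold()
        return value
```

Reading a key that was pushed to backup brings it back into the primary store and may in turn push another, colder key out.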
Abstract:
Technology is disclosed for managing data in a distributed file system (“the technology”). The technology can gather metadata information associated with the data stored within the distributed file system, create a secondary namespace within a local file system of a local host using the gathered metadata information, and store the gathered metadata information as files within the secondary namespace. Further, when a request to create a persistent point-in-time image (PPI) of the distributed file system is received, the technology can create a PPI of the secondary namespace using a PPI creation feature of the local file system.
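A minimal sketch of the secondary-namespace idea follows. It assumes the gathered metadata is a mapping from distributed-file-system paths to JSON-serializable records, and it uses a plain directory copy as a stand-in for the local file system's PPI creation feature, which in practice would be a native snapshot mechanism. All function names are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict


def build_secondary_namespace(metadata: Dict[str, Dict[str, Any]],
                              root: Path) -> None:
    """Store each gathered metadata record as a file under `root`."""
    root.mkdir(parents=True, exist_ok=True)
    for dfs_path, record in metadata.items():
        # Mirror the distributed path inside the local namespace.
        target = root / dfs_path.lstrip("/")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(record))


def create_ppi(root: Path, snapshot_dir: Path) -> Path:
    """Stand-in for a local snapshot feature: copy the namespace as it is now.

    Assumption: a real local file system would snapshot without copying.
    """
    shutil.copytree(root, snapshot_dir)
    return snapshot_dir
```

Changes made to the secondary namespace after `create_ppi` returns do not affect the copy, which is the property a point-in-time image is meant to provide.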