Abstract:
Generating a transformed dataset for use by a machine learning model in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, including: storing, within one or more storage systems, a transformed dataset generated by applying, to a dataset, one or more transformations that are identified based on one or more expected input formats of data to be received as input data by one or more machine learning models to be executed on one or more servers; and transmitting, from the one or more storage systems to the one or more servers without reapplying the one or more transformations to the dataset, the transformed dataset, which includes data in the one or more expected input formats of data to be received as input data by the one or more machine learning models.
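A minimal Python sketch of the flow this abstract describes, assuming a simple in-memory dataset; the format names (‘float32_scaled’, ‘unpadded’), the transformation functions, and the registry are illustrative assumptions, not the claimed implementation.

```python
from typing import Callable, Dict, List, Sequence

# Illustrative transformations keyed by the input format a model expects.
def to_float32_scaled(record: Sequence[int]) -> List[float]:
    """Scale raw 8-bit values into the [0, 1] float range."""
    return [value / 255.0 for value in record]

def drop_padding(record: Sequence[float]) -> List[float]:
    """Strip trailing zero padding from a record."""
    out = list(record)
    while out and out[-1] == 0:
        out.pop()
    return out

TRANSFORMS: Dict[str, Callable] = {
    "float32_scaled": to_float32_scaled,
    "unpadded": drop_padding,
}

def identify_transformations(expected_formats: List[str]) -> List[Callable]:
    """Identify transformations based on the input formats the models expect."""
    return [TRANSFORMS[fmt] for fmt in expected_formats]

def build_transformed_dataset(dataset: List, expected_formats: List[str]) -> List:
    """Apply the identified transformations once; the result is what gets stored
    on the storage system and later transmitted to the servers as-is."""
    transforms = identify_transformations(expected_formats)
    transformed = []
    for record in dataset:
        for transform in transforms:
            record = transform(record)
        transformed.append(record)
    return transformed
```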
Abstract:
Data transformation caching in an artificial intelligence infrastructure that includes one or more storage systems and one or more graphical processing unit (‘GPU’) servers, including: identifying, in dependence upon one or more machine learning models to be executed on the GPU servers, one or more transformations to apply to a dataset; generating, in dependence upon the one or more transformations, a transformed dataset; storing, within one or more of the storage systems, the transformed dataset; receiving a plurality of requests to transmit the transformed dataset to one or more of the GPU servers; and responsive to each request, transmitting, from the one or more storage systems to the one or more GPU servers without re-performing the one or more transformations on the dataset, the transformed dataset.
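A minimal sketch of the caching behavior, assuming the transformed dataset is persisted as a file on the storage system; the TransformCache class, the hash-based key scheme, and pickle serialization are illustrative assumptions rather than the claimed mechanism.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Callable, List, Sequence

class TransformCache:
    """Store the transformed dataset once, then serve every later request for it
    without re-performing the transformations."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, dataset_id: str, transforms: Sequence[Callable]) -> Path:
        # Key the cached result by the dataset and the transformations applied.
        key = dataset_id + "|" + ",".join(t.__name__ for t in transforms)
        return self.root / (hashlib.sha256(key.encode()).hexdigest() + ".pkl")

    def get_or_build(self, dataset_id: str, dataset: List,
                     transforms: Sequence[Callable]) -> List:
        path = self._path(dataset_id, transforms)
        if path.exists():
            # Later requests are served from the stored result; the
            # transformations are not re-performed.
            return pickle.loads(path.read_bytes())
        transformed = []
        for record in dataset:
            for transform in transforms:
                record = transform(record)
            transformed.append(record)
        path.write_bytes(pickle.dumps(transformed))
        return transformed
```

A request handler on the GPU-server side would call get_or_build with the same dataset identifier and transformation list for each of the plurality of requests; only the first call pays the transformation cost.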
Abstract:
Batch building for artificial intelligence workflows, including: issuing, responsive to a request for a batch of data objects, requests to a data repository for multiple data objects stored among one or more directories; selecting, in accordance with a batch building policy, a subset of data objects based on one or more responses to the requests; and providing, to the artificial intelligence workflow, a batch of data objects that includes the subset of data objects.
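One way the batch building loop could look, sketched in Python with an ordinary directory tree standing in for the data repository; the build_batch signature and the ‘random’ policy name are assumptions for illustration.

```python
import os
import random
from typing import List

def build_batch(repo_root: str, batch_size: int, policy: str = "random") -> List[str]:
    """Request the data objects stored among the repository's directories, select
    a subset of them according to a batch building policy, and return that
    subset as the batch handed to the AI workflow."""
    # Issue requests for data objects stored among one or more directories;
    # here the 'repository' is just a directory tree on the local filesystem.
    candidates: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(repo_root):
        candidates.extend(os.path.join(dirpath, name) for name in filenames)

    # Select a subset of the responses in accordance with the policy.
    if policy == "random":
        return random.sample(candidates, min(batch_size, len(candidates)))
    # Fallback policy: take the first responses in listing order.
    return candidates[:batch_size]
```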
Abstract:
A plurality of storage nodes is provided. Each of the plurality of storage nodes includes nonvolatile solid-state memory for user data storage. The plurality of storage nodes is configured to distribute the user data and metadata associated with the user data throughout the plurality of storage nodes such that the plurality of storage nodes maintains the ability to read the user data, using erasure coding, despite a loss of two of the plurality of storage nodes. The plurality of storage nodes is configured to initiate an action based on the redundant copies of the metadata, responsive to achieving a level of redundancy for the redundant copies of the metadata. A method for accessing user data in a plurality of storage nodes having nonvolatile solid-state memory is also provided.
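A sketch of the distribution and metadata-redundancy bookkeeping described above; the shard placement, the redundancy target of three copies, and the acknowledge_write() action are illustrative assumptions. A real system would erasure code the user data shards (for example, k data shards plus two parity shards) so the data remains readable after the loss of two nodes.

```python
from collections import defaultdict
from typing import Dict, List, Set

class StorageCluster:
    """Tracks where shards and redundant metadata copies land across nodes."""

    def __init__(self, node_ids: List[str], metadata_redundancy: int = 3):
        self.node_ids = node_ids
        self.metadata_redundancy = metadata_redundancy
        self.metadata_copies: Dict[str, Set[str]] = defaultdict(set)

    def place_shards(self, shard_count: int) -> List[str]:
        """Spread the shards of one piece of user data across the nodes
        (distinct nodes as long as shard_count does not exceed the node count)."""
        return [self.node_ids[i % len(self.node_ids)] for i in range(shard_count)]

    def record_metadata_copy(self, key: str, node_id: str) -> None:
        """Track which nodes hold a redundant copy of this metadata entry and
        initiate an action once the desired level of redundancy is achieved."""
        self.metadata_copies[key].add(node_id)
        if len(self.metadata_copies[key]) >= self.metadata_redundancy:
            self.acknowledge_write(key)

    def acknowledge_write(self, key: str) -> None:
        # Illustrative action; the abstract does not specify which action is taken.
        print(f"metadata for {key} is redundant on "
              f"{len(self.metadata_copies[key])} nodes; write acknowledged")
```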
Abstract:
A method of applying a data format in a direct memory access transfer is provided. The method includes distributing user data throughout a plurality of storage nodes through erasure coding, wherein the plurality of storage nodes are housed within a single chassis that couples the storage nodes as a cluster, each of the plurality of storage nodes having nonvolatile solid-state memory for user data storage. The method includes reading a self-describing data portion from a first memory of the nonvolatile solid-state memory and extracting a destination from the self-describing data portion. The method includes writing data, from the self-describing data portion, to a second memory of the nonvolatile solid-state memory according to the destination.
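A small Python sketch of a self-describing data portion and the resulting transfer, using two byte buffers to stand in for the first and second memories; the 12-byte header layout (destination offset plus payload length) is an assumption for illustration, not the claimed on-media format.

```python
import struct

# Illustrative self-describing layout: a 64-bit destination offset and a 32-bit
# payload length, followed by the payload itself.
HEADER = struct.Struct("<QI")

def pack_self_describing(destination: int, payload: bytes) -> bytes:
    """Build a self-describing data portion that carries its own destination."""
    return HEADER.pack(destination, len(payload)) + payload

def dma_style_copy(first_memory: bytes, second_memory: bytearray, offset: int = 0) -> None:
    """Read the self-describing portion from the first memory, extract the
    destination, and write the payload into the second memory at that destination."""
    destination, length = HEADER.unpack_from(first_memory, offset)
    start = offset + HEADER.size
    payload = first_memory[start:start + length]
    second_memory[destination:destination + length] = payload

# Usage: the data portion tells the receiver where its payload belongs.
src = pack_self_describing(destination=16, payload=b"user-data-block")
dst = bytearray(64)
dma_style_copy(src, dst)
assert dst[16:16 + len(b"user-data-block")] == b"user-data-block"
```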
Abstract:
A method for storing data in a storage system having solid-state memory is provided. The method includes determining portions of the solid-state memory that have a faster access rate and portions of the solid-state memory that have a slower access rate, relative to each other or to a threshold. The method includes writing data bits of erasure coded data to the portions of the solid-state memory having the faster access rate, and writing one or more parity bits of the erasure coded data to the portions of the solid-state memory having the slower access rate. A storage system is also provided.
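A sketch of the placement policy, assuming per-region access latencies measured up front and compared against a threshold; the region names, latency values, and threshold are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def place_erasure_coded_stripe(
    data_shards: List[bytes],
    parity_shards: List[bytes],
    region_latencies_us: Dict[str, float],
    threshold_us: float,
) -> List[Tuple[str, bytes]]:
    """Regions at or below the latency threshold are treated as 'faster' and
    receive the data bits; the remaining 'slower' regions receive the parity bits."""
    fast = [r for r, lat in region_latencies_us.items() if lat <= threshold_us]
    slow = [r for r, lat in region_latencies_us.items() if lat > threshold_us]

    placement = []
    for shard, region in zip(data_shards, fast):
        placement.append((region, shard))      # data bits -> faster portions
    for shard, region in zip(parity_shards, slow):
        placement.append((region, shard))      # parity bits -> slower portions
    return placement

# Usage with made-up latencies: two fast regions get data, one slow region gets parity.
layout = place_erasure_coded_stripe(
    data_shards=[b"d0", b"d1"],
    parity_shards=[b"p0"],
    region_latencies_us={"plane0": 60.0, "plane1": 70.0, "plane2": 120.0},
    threshold_us=100.0,
)
```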
Abstract:
A method of applying scheduling policies is provided. The method includes distributing user data throughout a plurality of storage nodes through erasure coding, wherein the plurality of storage nodes are housed within a single chassis coupling the storage nodes as a cluster. The method includes receiving operations relating to a non-volatile memory of one of the plurality of storage nodes into a plurality of operation queues. The method includes evaluating each of the operations in the plurality of operation queues as to its benefit to the non-volatile memory according to a plurality of policies. For each channel of a plurality of channels coupling the operation queues to the non-volatile memory, the method includes iterating a selection and an execution of a next operation from the plurality of operation queues, with each next operation having a greater benefit than at least a subset of operations remaining in the operation queues.
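A sketch of the per-channel scheduling loop, with a few hypothetical policies that score each queued operation's benefit; the operation kinds, policy weights, and benefit function are illustrative assumptions, and 'executing' an operation is reduced to removing it from its queue.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Operation:
    kind: str        # e.g. "read", "write", "erase"; illustrative kinds
    age_ms: float    # how long the operation has been waiting

# Hypothetical policies, each scoring an operation's benefit to the non-volatile
# memory; the weights are illustrative assumptions.
POLICIES: List[Callable[[Operation], float]] = [
    lambda op: 10.0 if op.kind == "read" else 0.0,   # favor latency-sensitive reads
    lambda op: op.age_ms * 0.1,                      # avoid starving old operations
    lambda op: 5.0 if op.kind == "erase" else 0.0,   # reclaim space for new writes
]

def benefit(op: Operation) -> float:
    """Evaluate one operation against the plurality of policies."""
    return sum(policy(op) for policy in POLICIES)

def run_channel(queues: List[List[Operation]]) -> List[Operation]:
    """For one channel, iterate selecting and executing the queued operation with
    the greatest evaluated benefit among all operations remaining in the queues."""
    executed: List[Operation] = []
    while any(queues):
        best_q, best_i = max(
            ((qi, i) for qi, q in enumerate(queues) for i in range(len(q))),
            key=lambda pos: benefit(queues[pos[0]][pos[1]]),
        )
        executed.append(queues[best_q].pop(best_i))
    return executed

# Usage: two operation queues feeding one channel.
order = run_channel([
    [Operation("write", 30.0), Operation("read", 5.0)],
    [Operation("erase", 80.0)],
])
```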