Abstract:
Techniques herein store database blocks (DBBs) in byte-addressable persistent memory (PMEM) and prevent tearing without deadlocking or waiting. In an embodiment, a computer hosts a DBMS. A reader process of the DBMS obtains, without locking and from metadata in PMEM, a first memory address for directly accessing a current version, which is a particular version, of a DBB in PMEM. Concurrently and without locking: a) the reader process reads the particular version of the DBB in PMEM, and b) a writer process of the DBMS replaces, in the metadata in PMEM, the first memory address with a second memory address for directly accessing a new version of the DBB in PMEM. In an embodiment, a computer performs without locking: a) storing, in PMEM, a DBB, b) copying into volatile memory, or reading, an image of the DBB, and c) detecting whether the image of the DBB is torn.
Abstract:
Techniques are provided to allow more sophisticated operations to be performed remotely by machines that are not fully functional. Operations that can be performed reliably by a machine that has experienced a hardware and/or software error are referred to herein as Remote Direct Memory Operations or “RDMOs”. Unlike RDMAs, which typically involve trivially simple operations such as the retrieval of a single value from the memory of a remote machine, RDMOs may be arbitrarily complex. The techniques described herein can help applications run without interruption when there are software faults or glitches on a remote system with which they interact.
Abstract:
Techniques are provided to allow more sophisticated operations to be performed remotely by machines that are not fully functional. Operations that can be performed reliably by a machine that has experienced a hardware and/or software error are referred to herein as Remote Direct Memory Operations or “RDMOs”. Unlike RDMAs, which typically involve trivially simple operations such as the retrieval of a single value from the memory of a remote machine, RDMOs may be arbitrarily complex. The techniques described herein can help applications run without interruption when there are software faults or glitches on a remote system with which they interact.
Abstract:
Distributed ledgered data is stored within a distributed persistent storage system comprising multiple persistent storage systems as distributed ledgered participants. In various embodiments, the distributed ledgered data is maintained using the native capabilities of a persistent storage system. The distributed ledgered data is replicated as persistent data objects in a “ledgered repository of objects” that are replicated at each of the persistent storage systems. Changes at one persistent storage system are recorded within a block in a distributed blockchain that is distributed across each of the other distributed ledgered participants. The other distributed ledgered participants read the changes from the blockchain and apply the changes to the respective replicas at each of the other distributed ledgered participants. Hence, this approach is referred to as blockchain apply. Blockchain apply may be used to replicate the repository objects of various forms of PSSs. In a DBMS, a repository of objects is a table, where each record or row is an object in the repository. In a file system, a repository of objects is a directory, where each directory and file therein is an object in the repository. In a document storage system (DOCS), a repository of objects is a collection of documents, where each document is an object in the repository.
Abstract:
A method and apparatus for implementing a buffer cache for a persistent file system in non-volatile memory is provided. A set of data is maintained in one or more extents in non-volatile random-access memory (NVRAM) of a computing device. At least one buffer header is allocated in dynamic random-access memory (DRAM) of the computing device. In response to a read request by a first process executing on the computing device to access one or more first data blocks in a first extent of the one or more extents, the first process is granted direct read access of the first extent in NVRAM. A reference to the first extent in NVRAM is stored in a first buffer header. The first buffer header is associated with the first process. The first process uses the first buffer header to directly access the one or more first data blocks in NVRAM.
Abstract:
Data blocks are cached in a persistent cache (“NV cache”) allocated from as non-volatile RAM (“NVRAM”). The data blocks may be accessed in place in the NV cache of a “source” computing element by another “remote” computing element over a network using remote direct memory access (“RMDA”). In order for a remote computing element to access the data block in NV cache on a source computing element, the remote computing element needs the memory address of the data block within the NV cache. For this purpose, a hash table is stored and maintained in RAM on the source computing element. The hash table identifies the data blocks in the NV cache and specifies a location of the cached data block within the NV cache.
Abstract:
Techniques related to efficient data storage and retrieval using a heterogeneous main memory are disclosed. A database includes a set of persistent format (PF) data that is stored on persistent storage in a persistent format. The database is maintained on the persistent storage and is accessible to a database server. The database server converts the set of PF data to sets of mirror format (MF) data and stores the MF data in a hierarchy of random-access memories (RAMs). Each RAM in the hierarchy has an associated latency that is different from a latency associated with any other RAM in the hierarchy. Storing the sets of MF data in the hierarchy of RAMs includes (1) selecting, based on one or more criteria, a respective RAM in the hierarchy to store each set of MF data and (2) storing said each set of MF data in the respective RAM.
Abstract:
Techniques are provided for using an intermediate cache between the shared cache of an application and the non-volatile storage of a storage system. The application may be any type of application that uses a storage system to persistently store data. The intermediate cache may be local to the machine upon which the application is executing, or may be implemented within the storage system. In one embodiment where the application is a database server, the database system includes both a DB server-side intermediate cache, and a storage-side intermediate cache. The caching policies used to populate the intermediate cache are intelligent, taking into account factors that may include which object an item belongs to, the item type of the item, a characteristic of the item, or the type of operation in which the item is involved.
Abstract:
Techniques are provided for managing, within a storage system, the sequence in which I/O requests are processed by the storage system based, at least in part, on one or more logical characteristics of the I/O requests. The logical characteristics may include, for example, the identity of the user for whom the I/O request was submitted, the service that submitted the I/O request, the database targeted by the I/O request, an indication of a consumer group to which the I/O request maps, the reason why the I/O request was issued, a priority category of the I/O request, etc. Techniques are also provided for automatically establishing a scheduling policy within a storage system, and for dynamically changing the scheduling policy in response to changes in workload.
Abstract:
Techniques are provided for using an intermediate cache to provide some of the items involved in a scan operation, while other items involved in the scan operation are provided from primary storage. Techniques are also provided for determining whether to service an I/O request for an item with a copy of the item that resides in the intermediate cache based on factors such as a) an identity of the user for whom the I/O request was submitted, b) an identity of a service that submitted the I/O request, c) an indication of a consumer group to which the I/O request maps, or d) whether the intermediate cache is overloaded. Techniques are also provided for determining whether to store items in an intermediate cache in response to the items being retrieved, based on logical characteristics associated with the requests that retrieve the items.