Abstract:
Techniques for optimizing a query with an extrema function are provided. In main memory, a data summary is maintained for a plurality of extents stored by at least one storage server. The data summary includes an extent minimum value and an extent maximum value for one or more columns. A storage server request is received, from a database server, based on a query with an extrema function applied to a particular column of a particular table. The data summaries for a set of relevant extents are processed by maintaining at least one global extrema value corresponding to the extrema function and, for each relevant extent of the set of relevant extents, determining whether to scan records of the relevant extent based on at least one of the global extrema value and an extent summary value of the data summary of the relevant extent.
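The pruning decision described above can be illustrated for a MAX query with a minimal sketch. The `Extent` class, the `query_max` function, and the choice to visit extents in descending order of their summary maximum are all assumptions made for illustration; the abstract does not specify them. The sketch also assumes each summary is a conservative bound (the true maximum never exceeds `summary_max`) and that extents are non-empty.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Extent:
    """Hypothetical extent: stored records plus an in-memory min/max summary."""
    records: List[int]
    summary_min: int  # conservative lower bound on the column values
    summary_max: int  # conservative upper bound on the column values


def query_max(extents: List[Extent]) -> Tuple[Optional[int], int]:
    """Return (maximum value, number of extents whose records were scanned)."""
    global_max: Optional[int] = None
    scanned = 0
    # Assumption of this sketch: visit the most promising extents first,
    # so that later extents are more likely to be skipped.
    for ext in sorted(extents, key=lambda e: e.summary_max, reverse=True):
        if global_max is not None and ext.summary_max <= global_max:
            continue  # no record in this extent can beat the global maximum
        scanned += 1
        local_max = max(ext.records)
        if global_max is None or local_max > global_max:
            global_max = local_max
    return global_max, scanned


extents = [
    Extent(records=[1, 5, 3], summary_min=1, summary_max=5),
    Extent(records=[10, 7], summary_min=7, summary_max=12),
    Extent(records=[2, 4], summary_min=2, summary_max=4),
]
result, scanned_count = query_max(extents)
```

Here only the second extent is scanned: once the global maximum reaches 10, the summaries of the other two extents show that they cannot contain a larger value.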
Abstract:
Techniques herein are for creating a database snapshot by creating a sparse database. A method involves receiving a creation request to create a sparse database. The creation request has an identity of a parent database. The creation request is processed to create a sparse database. The sparse database has the identity of the parent database. The sparse database does not contain data copied from the parent database. A write request to write data into the sparse database is received. The write request is processed by writing the data into the sparse database. The parent database does not receive the data.
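The write isolation described above can be sketched as a small overlay structure. The class names `Database` and `SparseDatabase` and the block-level dictionary are hypothetical. The read path, which falls through to the parent for blocks never written to the snapshot, is an assumption of this sketch and is not stated in the abstract.

```python
class Database:
    """Hypothetical parent database modeled as a block_id -> data mapping."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def read(self, block_id):
        return self.blocks.get(block_id)


class SparseDatabase:
    """Snapshot that records the parent's identity but copies none of its data."""

    def __init__(self, parent):
        self.parent = parent  # identity of the parent only
        self.local = {}       # starts empty: nothing is copied at creation

    def write(self, block_id, data):
        # The write lands in the sparse database; the parent never receives it.
        self.local[block_id] = data

    def read(self, block_id):
        # Assumption: unwritten blocks are resolved through the parent.
        if block_id in self.local:
            return self.local[block_id]
        return self.parent.read(block_id)


parent = Database({1: "a", 2: "b"})
snapshot = SparseDatabase(parent)
snapshot.write(1, "x")
```

After the write, the snapshot holds a single local block while the parent is unchanged.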
Abstract:
A method and apparatus for an intelligent network resource manager for distributed computing systems are provided. A first priority is assigned to a first virtual channel set that includes at least two virtual channels of a plurality of virtual channels associated with a physical communication channel. A second priority is assigned to a second virtual channel set that includes at least one virtual channel of the plurality of virtual channels. The first virtual channel set has more virtual channels than the second virtual channel set. Outbound messages of the first priority are directed to virtual channels of the first virtual channel set. Outbound messages of the second priority are directed to virtual channels of the second virtual channel set. The virtual channels are processed in a round-robin order, where processing includes sending the outbound messages over the physical communication channel.
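The effect of giving one priority more virtual channels than another can be seen in a minimal sketch. The `ChannelManager` class and its method names are hypothetical, and spreading messages of one priority evenly across the channels of its set is an assumption of this sketch. Because every channel gets one turn per round, a set with two channels sends two messages per round while a set with one channel sends one.

```python
from collections import deque


class ChannelManager:
    """Hypothetical manager mapping priorities to sets of virtual channels."""

    def __init__(self, channels_per_priority):
        self.channels = []      # all virtual channels, in round-robin order
        self.sets = {}          # priority -> indices of its virtual channels
        self.next_in_set = {}   # priority -> next slot to use within the set
        for priority, count in channels_per_priority.items():
            indices = []
            for _ in range(count):
                indices.append(len(self.channels))
                self.channels.append(deque())
            self.sets[priority] = indices
            self.next_in_set[priority] = 0

    def enqueue(self, priority, message):
        # Assumption: messages are spread evenly over the channels of the set.
        indices = self.sets[priority]
        slot = self.next_in_set[priority]
        self.channels[indices[slot]].append(message)
        self.next_in_set[priority] = (slot + 1) % len(indices)

    def drain(self):
        """Visit channels round-robin, 'sending' one message per channel per turn."""
        sent = []
        while any(self.channels):
            for channel in self.channels:
                if channel:
                    sent.append(channel.popleft())
        return sent


manager = ChannelManager({"high": 2, "low": 1})
for i in range(4):
    manager.enqueue("high", f"H{i}")
    manager.enqueue("low", f"L{i}")
order = manager.drain()
```

In this run the four high-priority messages are all sent within the first two rounds, while the low-priority messages need four rounds.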
Abstract:
Techniques are provided for using an intermediate cache between the shared cache of an application and the non-volatile storage of a storage system. The application may be any type of application that uses a storage system to persistently store data. The intermediate cache may be local to the machine upon which the application is executing, or may be implemented within the storage system. In one embodiment where the application is a database server, the database system includes both a DB server-side intermediate cache, and a storage-side intermediate cache. The caching policies used to populate the intermediate cache are intelligent, taking into account factors that may include which object an item belongs to, the item type of the item, a characteristic of the item, or the type of operation in which the item is involved.
Abstract:
Techniques are provided for managing, within a storage system, the sequence in which I/O requests are processed by the storage system based, at least in part, on one or more logical characteristics of the I/O requests. The logical characteristics may include, for example, the identity of the user for whom the I/O request was submitted, the service that submitted the I/O request, the database targeted by the I/O request, an indication of a consumer group to which the I/O request maps, the reason why the I/O request was issued, a priority category of the I/O request, etc. Techniques are also provided for automatically establishing a scheduling policy within a storage system, and for dynamically changing the scheduling policy in response to changes in workload.
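Ordering requests by a logical characteristic, and changing that order when the policy changes, can be sketched with a priority queue. The `IORequest` fields, the `IOScheduler` class, and the policy shape (a mapping from consumer group to rank, lower rank served first) are hypothetical choices for illustration. How a policy is established automatically from workload is not modeled here.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class IORequest:
    """Hypothetical request carrying logical characteristics."""
    user: str
    database: str
    consumer_group: str
    reason: str


class IOScheduler:
    def __init__(self, policy):
        self.policy = dict(policy)      # consumer_group -> rank (lower first)
        self._heap = []                 # entries are (rank, arrival_seq, request)
        self._seq = itertools.count()   # preserves arrival order within a rank

    def _rank(self, request):
        # Groups missing from the policy are served after all known groups.
        fallback = max(self.policy.values(), default=0) + 1
        return self.policy.get(request.consumer_group, fallback)

    def submit(self, request):
        heapq.heappush(self._heap, (self._rank(request), next(self._seq), request))

    def next_request(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

    def set_policy(self, policy):
        """Change the policy and re-rank the requests that are still pending."""
        self.policy = dict(policy)
        self._heap = [(self._rank(r), seq, r) for _, seq, r in self._heap]
        heapq.heapify(self._heap)


scheduler = IOScheduler({"oltp": 0, "batch": 1})
a = IORequest("u1", "db1", "batch", "backup")
b = IORequest("u2", "db1", "oltp", "query")
c = IORequest("u3", "db2", "batch", "scan")
for request in (a, b, c):
    scheduler.submit(request)
first = scheduler.next_request()           # the oltp request jumps the queue
scheduler.set_policy({"batch": 0, "oltp": 1})
d = IORequest("u2", "db1", "oltp", "query")
scheduler.submit(d)
rest = [scheduler.next_request() for _ in range(3)]
```

After the policy change, the pending batch requests are served in arrival order before the newly submitted oltp request.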
Abstract:
Techniques are provided for using an intermediate cache to provide some of the items involved in a scan operation, while other items involved in the scan operation are provided from primary storage. Techniques are also provided for determining whether to service an I/O request for an item with a copy of the item that resides in the intermediate cache based on factors such as a) an identity of the user for whom the I/O request was submitted, b) an identity of a service that submitted the I/O request, c) an indication of a consumer group to which the I/O request maps, or d) whether the intermediate cache is overloaded. Techniques are also provided for determining whether to store items in an intermediate cache in response to the items being retrieved, based on logical characteristics associated with the requests that retrieve the items.
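A scan that is served partly from the intermediate cache and partly from primary storage can be sketched as a per-item decision. The function names, the queue-depth threshold used as the overload signal, and the set of cache-eligible consumer groups are hypothetical; the abstract names the factors but not how they are combined.

```python
def should_serve_from_cache(item_id, consumer_group, cache, cache_queue_depth,
                            max_queue_depth=8,
                            eligible_groups=frozenset({"interactive"})):
    """Decide whether one I/O request is answered from the intermediate cache."""
    if item_id not in cache:
        return False                      # no cached copy exists
    if cache_queue_depth > max_queue_depth:
        return False                      # assumed overload signal
    return consumer_group in eligible_groups


def scan(item_ids, consumer_group, cache, primary, cache_queue_depth):
    """Return (item_id, data, source) for each item involved in the scan."""
    results = []
    for item_id in item_ids:
        if should_serve_from_cache(item_id, consumer_group, cache, cache_queue_depth):
            results.append((item_id, cache[item_id], "cache"))
        else:
            results.append((item_id, primary[item_id], "primary"))
    return results


primary = {1: "a", 2: "b", 3: "c"}
cache = {2: "b"}
mixed = [src for _, _, src in scan([1, 2, 3], "interactive", cache, primary, 0)]
overloaded = [src for _, _, src in scan([1, 2, 3], "interactive", cache, primary, 100)]
batch = [src for _, _, src in scan([1, 2, 3], "batch", cache, primary, 0)]
```

The same scan yields a mix of sources in the normal case and falls back entirely to primary storage when the cache is overloaded or the consumer group is not eligible.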
Abstract:
A storage system communicatively coupled to a database management system (DBMS) performs storage-side scanning of data sources that are not stored in the native database storage format of the DBMS. Data sources for external tables are accessible in a storage system referred to as a distributed data access system (DDAS), e.g. a Hadoop Distributed File System. To execute a query that references an external table, a DBMS first generates an execution plan. The DDAS supplies the DBMS with information that specifies each portion of the data source, and specifies which data node to use to access the portion. The DBMS sends a request for each portion to the respective data node, requesting that the data node generate rows from data in the portion. The request may specify scanning criteria, specifying one or more columns to project and/or filter on, and code modules for the data node to execute to generate records.
Abstract:
In a write by-peer-reference, a storage device client writes a data block to a target storage device in the storage system by sending a write request to the target storage device, the write request specifying information used to obtain the data block from a source storage device in the storage system. The target storage device sends a read request to the source storage device for the data block. The source storage device sends the data block to the target storage device, which then writes the data block to the target storage device. The data block is thus written to the target storage device without the storage device client transmitting the data block itself to the target storage device.
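The message flow above can be sketched with two in-process objects standing in for storage devices. The `StorageDevice` class, its method names, and the `transfers` log are hypothetical; a real system would exchange these requests over a network.

```python
class StorageDevice:
    """Hypothetical storage device holding blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.transfers = []  # (sender, block_id) for each data block received

    def read(self, block_id):
        return self.blocks[block_id]

    def write_by_peer_reference(self, target_block_id, source, source_block_id):
        # The request carries only a reference to the data, not the data itself.
        data = source.read(source_block_id)          # target reads from its peer
        self.transfers.append((source.name, source_block_id))
        self.blocks[target_block_id] = data          # target performs the write


source = StorageDevice("source")
target = StorageDevice("target")
source.blocks[7] = b"payload"

# The client names where the block lives; it never handles the bytes.
target.write_by_peer_reference(target_block_id=3, source=source, source_block_id=7)
```

The only data transfer recorded on the target comes from the source device, which is the point of the technique: the client's link is not used to move the block.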
Abstract:
A storage system communicatively coupled to a DBMS performs storage-side scanning of data sources that are not stored in the native database storage format of the DBMS. Data sources for external tables are accessible in a storage system referred to herein as a distributed data access system, e.g. a Hadoop Distributed File System. To execute a query that references an external table, a DBMS first generates an execution plan. The distributed data access system supplies the DBMS with information that specifies each portion of the data source, and specifies which data node to use to access the portion. The DBMS sends a request for each portion to the respective data node, requesting that the data node generate rows from data in the portion. The request may specify scanning criteria, specifying one or more columns to project and/or filter on. The request may also specify code modules for the data node to execute to generate rows or records and columns.
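The data-node side of such a request can be sketched as follows. The request layout (a dictionary with `row_generator`, `columns`, `filter`, and `project` keys) and the CSV parser standing in for a supplied code module are assumptions for illustration only; they do not reflect any actual interface of a DBMS or of the Hadoop Distributed File System.

```python
import csv
import io


def parse_csv(portion_text, column_names):
    """Stand-in 'code module': turn raw portion data into row dictionaries."""
    for fields in csv.reader(io.StringIO(portion_text)):
        yield dict(zip(column_names, fields))


def data_node_scan(portion_text, request):
    """Generate rows from one portion, applying the filter and the projection."""
    rows = request["row_generator"](portion_text, request["columns"])
    for row in rows:
        if request["filter"](row):
            yield tuple(row[name] for name in request["project"])


portion = "1,alice,30\n2,bob,17\n3,carol,45\n"
request = {
    "row_generator": parse_csv,                    # code module to execute
    "columns": ["id", "name", "age"],              # layout of the raw data
    "filter": lambda row: int(row["age"]) >= 18,   # scanning criterion
    "project": ["name"],                           # columns to return
}
rows = list(data_node_scan(portion, request))
```

Only the filtered, projected rows leave the data node, so less data has to travel back to the DBMS than if the raw portion were shipped.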