Abstract:
A system for processing database queries allows for cost- and locale-based distribution of database query execution. The database queries are executed on execution engines that provide flexible configuration and overlapping functionality. The system reduces various costs, including elapsed time, required to perform database queries. The system provides processing of a database query using a database catalog comprising database table locality information, record locality information and execution engine information. A query optimizer receives the query and accesses the database catalog to create a query execution plan comprising locality-based database operations. A central database operation processor providing a first execution engine executes the query execution plan by performing at least a portion of the locality-based database operations and distributing at least a portion of the locality-based database operations as a subplan. A second database operation processor providing a second execution engine executes the subplan received from the central database operation processor. At least one of the database operations can be executed on either the first execution engine or the second execution engine. A storage unit stores at least a portion of database tables and records. A data communications network connects the central database operation processor to the second database operation processor.
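The split between locally executed operations and a distributed subplan can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the catalog layout, the engine names `central` and `storage_node`, the table names, and the rule of preferring the engine that holds the data. The abstract also mentions cost-based distribution, which this sketch does not model.

```python
from dataclasses import dataclass

# Hypothetical catalog: where each table lives and which operations each
# execution engine supports. Note the overlap: scan, filter and aggregate
# can run on either engine.
CATALOG = {
    "table_locality": {"orders": "storage_node", "regions": "central"},
    "engine_ops": {
        "central": {"scan", "filter", "join", "aggregate"},
        "storage_node": {"scan", "filter", "aggregate"},
    },
}


@dataclass(frozen=True)
class Operation:
    kind: str
    table: str


def plan(operations, catalog):
    """Split operations into a central plan and a subplan for the second engine.

    Illustrative rule only: an operation goes to the engine that holds the
    table if that engine supports it; otherwise the central engine runs it.
    """
    local_plan, subplan = [], []
    for op in operations:
        home = catalog["table_locality"][op.table]
        if home != "central" and op.kind in catalog["engine_ops"][home]:
            subplan.append(op)      # distributed to the second engine
        else:
            local_plan.append(op)   # performed by the central engine
    return local_plan, subplan


query = [
    Operation("scan", "orders"),
    Operation("filter", "orders"),
    Operation("join", "orders"),
    Operation("scan", "regions"),
]
local_plan, subplan = plan(query, CATALOG)
```

With this hypothetical catalog, the scan and filter on `orders` land in the subplan, while the join and the scan of `regions` stay with the central engine.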
Abstract:
A large information space is divided into many smaller information extents. These extents are annotated with statistics about the information they contain. When a search for information includes a restriction based on value, the desired value ranges can be compared to the value ranges of each extent. If the desired value range lies outside the range of the extent, then the extent cannot hold the desired value and does not need to be included in the search.
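The range comparison described above can be sketched as follows. This is a minimal illustration, assuming integer values, fixed-size extents, and minimum/maximum as the only statistics; the names `Extent`, `build_extents` and `search` are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A slice of the information space annotated with min/max statistics."""
    extent_id: int
    min_value: int
    max_value: int
    rows: list


def build_extents(values, extent_size):
    """Split values into fixed-size extents and record each extent's value range."""
    extents = []
    for start in range(0, len(values), extent_size):
        chunk = values[start:start + extent_size]
        extents.append(Extent(len(extents), min(chunk), max(chunk), chunk))
    return extents


def search(extents, lo, hi):
    """Return (matching values, ids of the extents that were actually scanned)."""
    matches, scanned = [], []
    for ext in extents:
        # The desired range lies entirely outside the extent's range,
        # so the extent cannot hold a desired value and is skipped.
        if ext.max_value < lo or ext.min_value > hi:
            continue
        scanned.append(ext.extent_id)
        matches.extend(v for v in ext.rows if lo <= v <= hi)
    return matches, scanned


values = [3, 7, 5, 20, 25, 22, 41, 48, 44]
extents = build_extents(values, extent_size=3)
matches, scanned = search(extents, lo=21, hi=30)
```

For the sample data, only the middle extent (range 20 to 25) overlaps the requested range 21 to 30, so the other two extents are never read.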
Abstract:
A system and method for processing database queries allow for cost- and locale-based distribution of database query execution. The database queries are executed on execution engines that provide flexible configuration and overlapping functionality. The system reduces various costs, including elapsed time, required to perform database queries. The method provides processing of a database query using a database catalog comprising database table locality information, record locality information and execution engine information. A query optimizer receives the query and accesses the database catalog to create a query execution plan comprising locality-based database operations. A central database operation processor providing a first execution engine executes the query execution plan by performing at least a portion of the locality-based database operations and distributing at least a portion of the locality-based database operations as a subplan. A second database operation processor providing a second execution engine executes the subplan received from the central database operation processor.
Abstract:
An asymmetric data record processor and method include host computers and Job Processing Units (JPUs) coupled together on a network. Each host computer and JPU forms a node on the network. A plurality of software operators allow each node to process streams of records. For each operator in a given sequence within nodes and across nodes, the output of the operator is the input to the respective succeeding operator. Data processing follows a logical data flow based on the readiness of a record. As soon as a record is ready, it is passed for processing from one part to the next part in the logical data flow. The flow of records during data processing is substantially continuous and streaming.
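The record-at-a-time flow between operators can be sketched with Python generators, where each operator hands a record to its successor as soon as that record is ready. This is a single-process illustration only: the operator names `scan`, `restrict` and `project` and the sample records are assumptions, and the network hop between host and JPU nodes is not modeled.

```python
from typing import Iterable, Iterator


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Source operator: emit records one at a time."""
    for row in rows:
        yield row


def restrict(records: Iterable[dict], min_amount: int) -> Iterator[dict]:
    """Pass a record downstream as soon as it satisfies the predicate."""
    for rec in records:
        if rec["amount"] >= min_amount:
            yield rec


def project(records: Iterable[dict], fields: tuple) -> Iterator[dict]:
    """Keep only the requested fields of each record."""
    for rec in records:
        yield {f: rec[f] for f in fields}


rows = [
    {"id": 1, "amount": 5, "region": "east"},
    {"id": 2, "amount": 50, "region": "west"},
    {"id": 3, "amount": 75, "region": "east"},
]

# The output of each operator is the input of the next; no operator waits
# for its predecessor to finish the whole input before starting.
pipeline = project(restrict(scan(rows), min_amount=10), fields=("id", "amount"))
result = list(pipeline)
```

Because generators are lazy, no intermediate result set is materialized between operators, which mirrors the substantially continuous flow the abstract describes.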
Abstract:
A disk is segmented into a first data segment and a secondary data segment. The secondary data segment stores a logical mirror of the first data segment of another disk. Fast access to data stored on the disk is provided by partitioning the disk such that the first data segment includes the fast tracks of the disk and the secondary data segment includes the slow tracks of the disk and forwarding all data requests to the first data segment. Upon detecting a failure, the logical mirror of data stored in the first data segment of the failed disk is accessible from the secondary data segment of a non-failed disk. The first data segment can be rebuilt quickly on another disk from the logical mirror stored in the secondary data segment.
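The failover and rebuild behavior can be sketched in memory as follows. The ring layout (disk i mirrored on disk i+1, wrapping around) and the names `Disk` and `MirroredArray` are assumptions for illustration; the abstract only states that the secondary segment mirrors the first data segment of another disk. Track placement (fast versus slow tracks) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    primary: dict = field(default_factory=dict)    # first data segment
    secondary: dict = field(default_factory=dict)  # logical mirror of another disk's primary
    failed: bool = False


class MirroredArray:
    def __init__(self, names):
        self.disks = [Disk(n) for n in names]

    def _mirror_of(self, index):
        # Assumption: disk i is mirrored on disk i+1, wrapping around.
        return self.disks[(index + 1) % len(self.disks)]

    def write(self, index, key, value):
        self.disks[index].primary[key] = value
        self._mirror_of(index).secondary[key] = value

    def read(self, index, key):
        disk = self.disks[index]
        if not disk.failed:
            return disk.primary[key]            # normal case: first data segment
        return self._mirror_of(index).secondary[key]  # failover: logical mirror

    def rebuild(self, index, spare_name):
        """Rebuild the failed disk's segments on a spare disk."""
        predecessor = self.disks[(index - 1) % len(self.disks)]
        self.disks[index] = Disk(
            spare_name,
            primary=dict(self._mirror_of(index).secondary),  # from the logical mirror
            secondary=dict(predecessor.primary),             # re-mirror the predecessor
        )


array = MirroredArray(["d0", "d1", "d2"])
array.write(0, "a", 1)
array.write(2, "z", 26)
array.disks[0].failed = True
read_after_failure = array.read(0, "a")  # served from d1's secondary segment
array.rebuild(0, "spare")
```

In this sketch all reads go to the first data segment while the disk is healthy, and the secondary segment is consulted only after a failure or during a rebuild.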
Abstract:
A disk is segmented into a first data segment and a secondary data segment. The secondary data segment stores a logical mirror of the first data segment of another disk. Upon detecting a failure, the logical mirror of data stored in the first data segment of the failed disk is accessible from the secondary data segment of a non-failed disk. The first data segment can be rebuilt quickly on another disk from the logical mirror stored in the secondary data segment. During regeneration, accesses to the first data segment on the disk containing the logical mirror are handled by its own logical mirror, which is not involved in the regeneration process.
Abstract:
A data processing system has two or more groups of data processors whose attributes are optimized for their assigned functions. A first group consists of one or more host computers responsible for interfacing with applications and/or end users to obtain queries and for planning query execution. A second processor group consists of many streaming record-oriented processors called Job Processing Units (JPUs), preferably arranged as an MPP structure. The JPUs typically carry out the bulk of the data processing required to implement the logic of a query. Each of the JPUs typically includes a general-purpose microcomputer, local memory, one or more mass storage devices, and one or more network connections. Each JPU also has a special-purpose programmable processor, referred to herein as a Programmable Streaming Data Processor (PSDP). The PSDP serves as an interface between the CPU of a JPU and the mass storage device, to offload functions from the CPU of the JPU.
Abstract:
A programmable streaming data processor can be programmed to recognize record and field structures of data received from a streaming data source such as a mass storage device. Being programmed with, for example, field information, the unit can locate record and field boundaries and employ logical arithmetic methods to compare fields with one another or with values otherwise supplied by general-purpose processors to precisely determine which records are worth transferring to memory of the more general-purpose distributed processors. The remaining records are either discarded by the streaming data processor as they arrive or tagged with status bits to indicate to the more general-purpose processor that they are to be ignored. In a preferred embodiment, the streaming data processor may analyze and discard records for several reasons. The first reason may be an analysis of the contents of a field. Other reasons for record blocking may have to do with tagging records that are to be visible to particular users depending upon a series of concurrent transactions.
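The filtering behavior can be approximated in software as a sketch. The fixed-width record layout (three little-endian 32-bit integers: id, amount, creating transaction id), the `min_amount` comparison, and the `visible_txns` set are all assumptions chosen for illustration. The abstract describes a hardware unit, and its actual field encoding and visibility rules are not given.

```python
import struct

# Hypothetical record layout: id, amount, creating transaction id.
RECORD = struct.Struct("<iii")


def stream_filter(raw, min_amount, visible_txns, tag_only=False):
    """Yield (record, ignore_flag) pairs from a byte stream of fixed-width records.

    Records that fail the field comparison or the visibility check are either
    dropped (default) or passed along with ignore_flag=True (tag_only=True).
    """
    for offset in range(0, len(raw) - RECORD.size + 1, RECORD.size):
        # Locate record and field boundaries from the programmed layout.
        rec_id, amount, txn = RECORD.unpack_from(raw, offset)
        keep = amount >= min_amount and txn in visible_txns
        if keep:
            yield (rec_id, amount, txn), False
        elif tag_only:
            yield (rec_id, amount, txn), True  # status bit: to be ignored
        # Otherwise the record is discarded and never reaches memory.


raw = b"".join(RECORD.pack(*r) for r in [(1, 10, 100), (2, 90, 100), (3, 95, 200)])
kept = [rec for rec, ignore in stream_filter(raw, 50, {100})]
tagged = list(stream_filter(raw, 50, {100}, tag_only=True))
```

Here record 1 fails the field comparison and record 3 fails the visibility check, so only record 2 is transferred in discard mode. In tagging mode all three records are passed on, with the two rejected ones flagged.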