摘要:
Embodiments of the present invention provide a database system that is optimized by using hardware acceleration. The system may be implemented in several variations to accommodate a wide range of queries and database sizes. In some embodiments, the system may comprise a host system that is coupled to one or more hardware accelerator components. The host system may execute software or provide an interface for receiving queries. The host system analyzes and parses these queries into tasks. The host system may then select some of the tasks and translate them into machine code instructions, which are executed by one or more hardware accelerator components. The tasks executed by hardware accelerators are generally those tasks that may be repetitive or processing intensive. Such tasks may include, for example, indexing, searching, sorting, table scanning, record filtering, and the like.
摘要:
A programmable machine system and method for managing electronic data access among multiple different relational databases in a network distributed database environment. The machine is programmed so that it can construct cost-effective access strategies for any of the participating databases absent any DBMS-specific cost models. The system provides query optimization across different database management systems in a network distributed database environment based on a calibrating database relying only on typical relational database statistics and cost data is developed by running queries in the various databases against the calibrating database. A logical cost model is constructed using the resulting cost data and is used to estimate the cost of a given query based on logical characteristics of the DBMS, the relations, and the query itself. The cost of a complex query is estimated using primitive queries. Optimal query access strategies are thereby designed and used to control execution of the queries across relational databases controlled by two or more different database management systems.
摘要:
Embodiments of the present invention provide a run-time scheduler that schedules tasks for database queries on one or more execution resources in a dataflow fashion. In some embodiments, the run-time scheduler may comprise a task manager, a memory manager, and hardware resource manager. When a query is received by a host database management system, a query plan is created for that query. The query plan splits a query into various fragments. These fragments are further compiled into a directed acyclic graph of tasks. Unlike conventional scheduling, the dependency arc in the directed acyclic graph is based on page resources. Tasks may comprise machine code that may be executed by hardware to perform portions of the query. These tasks may also be performed in software or relate to I/O.
摘要:
Embodiments of the present invention provide one or more hardware-friendly data structures that enable efficient hardware acceleration of database operations. In particular, the present invention employs a column-store format for the database. In the database, column-groups are stored with implicit row ids (RIDs) and a RID-to-primary key column having both column-store and row-store benefits via column hopping and a heap structure for adding new data. Fixed-width column compression allow for easy hardware database processing directly on the compressed data. A global database virtual address space is utilized that allows for arithmetic derivation of any physical address of the data regardless of its location. A word compression dictionary with token compare and sort index is also provided to allow for efficient hardware-based searching of text. A tuple reconstruction process is provided as well that allows hardware to reconstruct a row by stitching together data from multiple column groups.
摘要:
Embodiments of the present invention provide one or more hardware-friendly data structures that enable efficient hardware acceleration of database operations. In particular, the present invention employs a column-store format for the database. In the database, column-groups are stored with implicit row ids (RIDs) and a RID-to-primary key column having both column-store and row-store benefits via column hopping and a heap structure for adding new data. Fixed-width column compression allow for easy hardware database processing directly on the compressed data. A global database virtual address space is utilized that allows for arithmetic derivation of any physical address of the data regardless of its location. A word compression dictionary with token compare and sort index is also provided to allow for efficient hardware-based searching of text. A tuple reconstruction process is provided as well that allows hardware to reconstruct a row by stitching together data from multiple column groups.
摘要:
Embodiments of the present invention provide for batch and incremental loading of data into a database. In the present invention, the loader infrastructure utilizes machine code database instructions and hardware acceleration to parallelize the load operations with the I/O operations. A large, hardware accelerator memory is used as staging cache for the load process. The load process also comprises an index profiling phase that enables balanced partitioning of the created indexes to allow for pipelined load. The online incremental loading process may also be performed while serving queries.
摘要:
Embodiments of the present invention provide fine grain concurrency control for transactions in the presence of database updates. During operations, each transaction is assigned a snapshot version number or SVN. A SVN refers to a historical snapshot of the database that can be created periodically or on demand. Transactions are thus tied to a particular SVN, such as, when the transaction was created. Queries belonging to the transactions can access data that is consistent as of a point in time, for example, corresponding to the latest SVN when the transaction was created. At various times, data from the database stored in a memory can be updated using the snapshot data corresponding to a SVN. When a transaction is committed, a snapshot of the database with a new SVN is created based on the data modified by the transaction and the snapshot is synchronized to the memory. When a transaction query requires data from a version of the database corresponding to a SVN, the data in the memory may be synchronized with the snapshot data corresponding to that SVN.
摘要:
Embodiments of the present invention provide a run-time scheduler that schedules tasks for database queries on one or more execution resources in a dataflow fashion. In some embodiments, the run-time scheduler may comprise a task manager, a memory manager, and hardware resource manager. When a query is received by a host database management system, a query plan is created for that query. The query plan splits a query into various fragments. These fragments are further compiled into a directed acyclic graph of tasks. Unlike conventional scheduling, the dependency arc in the directed acyclic graph is based on page resources. Tasks may comprise machine code that may be executed by hardware to perform portions of the query. These tasks may also be performed in software or relate to I/O.
摘要:
Systems, architectures, and data structures are described which are used to manage distributed design chains, specifically for domains in which data reside in multiple applications and are linked through complex interrelationships. The design chains or design networks integrated by the invention may include multiple companies in multiple sites collaborating to design and develop a new product. The invention is intended to integrate seamlessly and transparently with existing, diverse legacy applications, which include inter-linked data relevant to the design, thereby addressing the needs identified above.
摘要:
Systems, architectures, and data structures are described which are used to manage distributed design chains, specifically for domains in which data reside in multiple applications and are linked through complex interrelationships. The design chains or design networks integrated by the invention may include multiple companies in multiple sites collaborating to design and develop a new product. The invention is intended to integrate seamlessly and transparently with existing, diverse legacy applications, which include inter-linked data relevant to the design, thereby addressing the needs identified above.