摘要:
Embodiments of the present invention provide one or more hardware-friendly data structures that enable efficient hardware acceleration of database operations. In particular, the present invention employs a column-store format for the database. In the database, column-groups are stored with implicit row ids (RIDs) and a RID-to-primary key column having both column-store and row-store benefits via column hopping and a heap structure for adding new data. Fixed-width column compression allow for easy hardware database processing directly on the compressed data. A global database virtual address space is utilized that allows for arithmetic derivation of any physical address of the data regardless of its location. A word compression dictionary with token compare and sort index is also provided to allow for efficient hardware-based searching of text. A tuple reconstruction process is provided as well that allows hardware to reconstruct a row by stitching together data from multiple column groups.
摘要:
Embodiments of the present invention provide one or more hardware-friendly data structures that enable efficient hardware acceleration of database operations. In particular, the present invention employs a column-store format for the database. In the database, column-groups are stored with implicit row ids (RIDs) and a RID-to-primary key column having both column-store and row-store benefits via column hopping and a heap structure for adding new data. Fixed-width column compression allow for easy hardware database processing directly on the compressed data. A global database virtual address space is utilized that allows for arithmetic derivation of any physical address of the data regardless of its location. A word compression dictionary with token compare and sort index is also provided to allow for efficient hardware-based searching of text. A tuple reconstruction process is provided as well that allows hardware to reconstruct a row by stitching together data from multiple column groups.
摘要:
Embodiments of the present invention provide processing elements that are capable of performing high level database operations in hardware based on machine code instructions. These processing elements employ a dataflow architecture that operates on data in hardware without interruption or software. A scanning/indexing processing element may comprise logic that analyze database column groups stored in local memory, perform parallel field extraction and comparison, and generates a list of row pointers (row ids or RIDs) referencing those rows whose value(s) satisfy an applied predicate. The scanning/indexing processing may also be used to project database column groups, search and join index structures, and manipulate in-flight metadata flows, composing, merging, reducing, and modifying multi-dimensional lists of intermediate and final results. Furthermore, a scanning/indexing processing element may be used for joins with indexes, like a Group Index, which involves the association of each input tuple with potentially many related data components, in a one-to-many mapping. An XCAM processing element may comprise logic to perform associative database operations, like accumulation and aggregation, sieving, sorting and associative joins.
摘要:
Systems, architectures, and data structures are described which are used to manage distributed design chains, specifically for domains in which data reside in multiple applications and are linked through complex interrelationships. The design chains or design networks integrated by the invention may include multiple companies in multiple sites collaborating to design and develop a new product. The invention is intended to integrate seamlessly and transparently with existing, diverse legacy applications, which include inter-linked data relevant to the design, thereby addressing the needs identified above.
摘要:
Embodiments of the present invention provide fine grain concurrency control for transactions in the presence of database updates. During operations, each transaction is assigned a snapshot version number or SVN. A SVN refers to a historical snapshot of the database that can be created periodically or on demand. Transactions are thus tied to a particular SVN, such as, when the transaction was created. Queries belonging to the transactions can access data that is consistent as of a point in time, for example, corresponding to the latest SVN when the transaction was created. At various times, data from the database stored in a memory can be updated using the snapshot data corresponding to a SVN. When a transaction is committed, a snapshot of the database with a new SVN is created based on the data modified by the transaction and the snapshot is synchronized to the memory. When a transaction query requires data from a version of the database corresponding to a SVN, the data in the memory may be synchronized with the snapshot data corresponding to that SVN.
摘要:
Embodiments of the present invention generate and optimize query plans that are at least partially executable in hardware. Upon receiving a query, the query is rewritten and optimized with a bias for hardware execution of fragments of the query. A template-based algorithm may be employed for transforming a query into fragments and then into query tasks. The various query tasks can then be routed to either a hardware accelerator, a software module, or sent back to a database management system for execution. For those tasks routed to the hardware accelerator, the query tasks are compiled into machine code database instructions. In order to optimize query execution, query tasks may be broken into subtasks, rearranged based on available resources of the hardware, pipelined, or branched conditionally
摘要:
Embodiments of the present invention provide for batch and incremental loading of data into a database. In the present invention, the loader infrastructure utilizes machine code database instructions and hardware acceleration to parallelize the load operations with the I/O operations. A large, hardware accelerator memory is used as staging cache for the load process. The load process also comprises an index profiling phase that enables balanced partitioning of the created indexes to allow for pipelined load. The online incremental loading process may also be performed while serving queries.
摘要:
Embodiments of the present invention provide hardware-friendly indexing of databases. In particular, forward and reverse indexing are utilized to allow for easy traversal of primary key to foreign key relationships. A novel structure known as a hit list also allows for easy scanning of various indexes in hardware. Group indexing is provided for flexible support of complex group key definition, such as for date range indexing and text indexing. A Replicated Reordered Column (RRC) may also be added to the group index to convert random I/O pattern into sequential I/O of only needed column elements.
摘要:
Embodiments of the present invention provide a database system that is optimized by using hardware acceleration. The system may be implemented in several variations to accommodate a wide range of queries and database sizes. In some embodiments, the system may comprise a host system that is coupled to one or more hardware accelerator components. The host system may execute software or provide an interface for receiving queries. The host system analyzes and parses these queries into tasks. The host system may then select some of the tasks and translate them into machine code instructions, which are executed by one or more hardware accelerator components. The tasks executed by hardware accelerators are generally those tasks that may be repetitive or processing intensive. Such tasks may include, for example, indexing, searching, sorting, table scanning, record filtering, and the like.
摘要:
Embodiments of the present invention provide for batch and incremental loading of data into a database. In the present invention, the loader infrastructure utilizes machine code database instructions and hardware acceleration to parallelize the load operations with the I/O operations. A large, hardware accelerator memory is used as staging cache for the load process. The load process also comprises an index profiling phase that enables balanced partitioning of the created indexes to allow for pipelined load. The online incremental loading process may also be performed while serving queries.