Abstract:
Embodiments of the present invention provide a hardware accelerator that assists a host database system in processing its queries. The hardware accelerator comprises special-purpose processing elements that are capable of receiving database query/operation tasks in the form of machine code database instructions, executing them in hardware without software, and returning the query/operation result to the host system. For example, table and column descriptors are embedded in the machine code database instructions. For ease of installation, the hardware accelerator employs a standard interconnect, such as a PCIe or HyperTransport (HT) interconnect. The processing elements implement a novel dataflow design and Inter Macro-Op Communication (IMC) data structures to execute the machine code database instructions. The hardware accelerator may also comprise a relatively large memory to enhance the hardware execution of the requested query/operation tasks. The hardware accelerator utilizes hardware-friendly memory addressing, which allows a physical address to be derived arithmetically from a global database virtual address based simply on a row identifier. The hardware accelerator minimizes memory reads/writes by keeping most intermediate results flowing through IMCs in a pipelined and parallel fashion. Furthermore, the hardware accelerator may employ task pipelining and pre-fetch pipelining to enhance its performance.
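The hardware-friendly addressing mentioned above can be illustrated with a minimal sketch. The abstract does not give the actual address layout, so the following assumes fixed-width column values stored contiguously from a base address; `ColumnDescriptor`, `base_address`, `width_bytes`, and `physical_address` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDescriptor:
    """Hypothetical descriptor; the field names are assumptions, not from the source."""
    base_address: int  # address of the value for row 0
    width_bytes: int   # fixed size of one column value


def physical_address(column: ColumnDescriptor, row_id: int) -> int:
    """Derive an address arithmetically from a row identifier.

    Assumes fixed-width values laid out contiguously, so no lookup
    structure (such as a page table or index) is consulted.
    """
    if row_id < 0:
        raise ValueError("row_id must be non-negative")
    return column.base_address + row_id * column.width_bytes
```

Under these assumptions the address of any row follows from one multiply and one add, which is the kind of computation that is inexpensive to implement in hardware.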
Abstract:
A multi-level content addressable memory (CAM) architecture compresses out much of the redundancy encountered in the search space of a single CAM, particularly for flow-based lookups in a network. Destination and source addresses may be associated with internal equivalence classes independently in one level of the multi-level CAM architecture, while flow-specific properties linking arbitrary classes of the destination and source addresses may be applied in a later level of the multi-level CAM.
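The two-level lookup described above can be sketched in software. This is only an illustration under stated assumptions: plain dictionaries stand in for the CAM levels (exact match only, no prefix or ternary matching), and the addresses, class names, actions, and the `lookup_flow` function are all hypothetical.

```python
# Level 1: each address field is mapped to an equivalence class independently.
# Plain dicts stand in for CAM lookups (exact match only; an assumption).
DST_CLASS = {
    "10.0.0.1": "servers",
    "10.0.0.2": "servers",
    "10.0.1.9": "printers",
}
SRC_CLASS = {
    "192.168.0.5": "staff",
    "192.168.0.6": "staff",
    "172.16.0.3": "guests",
}

# Level 2: flow-specific properties keyed by (destination class, source class).
FLOW_RULES = {
    ("servers", "staff"): "permit",
    ("servers", "guests"): "deny",
    ("printers", "staff"): "permit",
}


def lookup_flow(dst: str, src: str, default: str = "deny") -> str:
    """Resolve a flow by classifying both addresses, then matching the class pair."""
    dst_class = DST_CLASS.get(dst)
    src_class = SRC_CLASS.get(src)
    if dst_class is None or src_class is None:
        return default
    return FLOW_RULES.get((dst_class, src_class), default)
```

Because rules are written per class pair rather than per address pair, adding another address to an existing class costs one first-level entry and no new flow rules, which is where the reduction in redundancy comes from.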
Abstract:
Divisions by numbers that are not divisible by two (2) can be performed in a computing system based on a summation that approximates the reciprocal of the divisor. By way of example, division by three (3) can be calculated based on a summation that approximates one third (⅓) as the sum of a selected group of inverse powers of two (2) in a pattern, namely ¼ + 1/16 + 1/64 + 1/256 + .... Applications of the division techniques are wide-ranging and include memory mapping of global memory addresses to memory channel addresses by dividing a global memory address by the number of memory channels. This allows memory mapping to be performed efficiently even for large memory spaces when the number of memory channels is not divisible by two, including prime numbers.
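The summation for one third can be sketched with shifts and adds. This is a minimal sketch rather than the patented method: the truncated shifts always underestimate the quotient, so a small correction step (an addition made here, not stated in the abstract) makes the result exact. The names `div3_by_shifts` and `map_to_channel`, the `bits` parameter, and the choice of the remainder as the channel index are assumptions for illustration.

```python
def div3_by_shifts(n: int, bits: int = 32) -> int:
    """Compute n // 3 for non-negative n using n/4 + n/16 + n/64 + ...

    Each term is a right shift by an even amount. Truncation makes the
    partial sum an underestimate, so the remainder is used to correct it.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    q = 0
    for shift in range(2, bits + 1, 2):
        q += n >> shift          # adds floor(n / 4**k)
    r = n - 3 * q                # r >= 0 because q never exceeds n // 3
    while r >= 3:                # correction step (assumption, see lead-in)
        q += 1
        r -= 3
    return q


def map_to_channel(address: int) -> tuple[int, int]:
    """Map a global address to (channel, offset) across three channels.

    Using the remainder as the channel and the quotient as the offset
    is an assumed interleaving, chosen only for illustration.
    """
    offset = div3_by_shifts(address)
    channel = address - 3 * offset
    return channel, offset
```

For example, `map_to_channel(10)` yields channel 1 and offset 3, since 10 = 3 × 3 + 1.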