-
Publication No.: US10732932B2
Publication Date: 2020-08-04
Application No.: US16231170
Application Date: 2018-12-21
Applicant: Intel Corporation
Inventor: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler
IPC: G06F7/523, H03K19/177
Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.
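The last sentence of the abstract above describes computing the most significant bits separately and merging them with the low-order result. The following Python sketch illustrates only the underlying arithmetic identity for an unsigned 9×9 product built on an 8×8 core. The function name `mul9_from_mul8` and the split into a core product plus extension terms are assumptions made for illustration; the abstract does not specify the actual extension or merging circuits, and signed operands are not modelled here.

```python
def mul9_from_mul8(a: int, b: int) -> int:
    """Unsigned 9x9 product from an 8x8 core product plus MSB extension terms.

    Hypothetical helper: illustrates the identity
    (a_hi*2^8 + a_lo) * (b_hi*2^8 + b_lo)
      = a_lo*b_lo + 2^8*(a_hi*b_lo + b_hi*a_lo) + 2^16*a_hi*b_hi
    """
    if not (0 <= a < 512 and 0 <= b < 512):
        raise ValueError("operands must be unsigned 9-bit values")

    a_lo, a_hi = a & 0xFF, a >> 8  # a_hi is the single most significant bit
    b_lo, b_hi = b & 0xFF, b >> 8

    core = a_lo * b_lo  # the part an 8x8 multiplier would produce

    # "Extension" terms: each is a conditional add because a_hi and b_hi are 1 bit wide.
    cross = (a_lo if b_hi else 0) + (b_lo if a_hi else 0)
    top = a_hi & b_hi

    # "Merge": align the extension terms with the core product and add.
    return core + (cross << 8) + (top << 16)
```

Because the high parts are single bits, the extension terms need no multiplier at all, only gated additions.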
-
Publication No.: US20190042198A1
Publication Date: 2019-02-07
Application No.: US16144999
Application Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Martin Langhammer, Gregg William Baeckler, Sergey Gribok, Dmitry N. Denisenko, Bogdan Pasca
Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.
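The abstract above mentions smaller multiplications that share an operand and a correction when the products interfere. The Python sketch below shows one well-known form of that idea for two signed 6-bit multiplications sharing operand `c`. The packing offset of 12 bits, the function names, and the specific correction are assumptions for illustration, not the patent's circuit; the packed operand here needs a 19-bit signed multiplier input.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def packed_shared_multiply(a: int, b: int, c: int, shift: int = 12):
    """Return (a*c, b*c) using a single wide multiplication.

    Assumes a, b, c are signed 6-bit values (-32..31), so each product
    fits in `shift` = 12 signed bits.
    """
    for v in (a, b, c):
        if not -32 <= v <= 31:
            raise ValueError("operands must be signed 6-bit values")

    packed = (a << shift) + b  # a in the high field, b in the low field
    product = packed * c       # equals (a*c << shift) + b*c

    low = to_signed(product, shift)  # recovers b*c

    # A negative b*c borrows 1 from the high field, so add it back.
    high = (product >> shift) + (1 if low < 0 else 0)
    return high, low
```

For example, `packed_shared_multiply(-3, 5, -7)` yields `(21, -35)`; the low product is negative, so the correction adds one to the high field.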
-
Publication No.: US10790829B2
Publication Date: 2020-09-29
Application No.: US16144558
Application Date: 2018-09-27
Applicant: Intel Corporation
Inventor: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
IPC: H03K19/17736, H03K19/173, G06F7/523, G06F7/501, H03K19/17728
Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic elements. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain is used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.
-
Publication No.: US20190288688A1
Publication Date: 2019-09-19
Application No.: US16434088
Application Date: 2019-06-06
Applicant: Intel Corporation
Inventor: Sergey Gribok, Gregg Baeckler, Martin Langhammer
IPC: H03K19/0175, H03K19/177, G06F7/50, H03K19/20
Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic cells. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least as many full adder circuits as there are 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
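As a rough behavioural model of the structure named in the abstract above, the Python sketch below builds a 4-input LUT from two 3-input LUTs and a selector multiplexer, and chains full adders into a carry chain. All function names, the truth-table bit ordering, and the way LUT outputs are fed to the adders are assumptions for illustration; the abstract does not give the actual wiring.

```python
def lut3(table: int, a: int, b: int, c: int) -> int:
    """3-input LUT: `table` is an 8-bit truth table, input a is the LSB of the index."""
    return (table >> (a | (b << 1) | (c << 2))) & 1


def lut4(table: int, a: int, b: int, c: int, d: int) -> int:
    """4-input LUT as two 3-input LUTs plus a multiplexer selected by d."""
    low_half = lut3(table & 0xFF, a, b, c)          # used when d == 0
    high_half = lut3((table >> 8) & 0xFF, a, b, c)  # used when d == 1
    return high_half if d else low_half


def full_adder(x: int, y: int, carry_in: int):
    """One full adder: returns (sum, carry_out)."""
    s = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return s, carry_out


def carry_chain(xs, ys, carry_in: int = 0):
    """Ripple the carry through one full adder per bit pair (LSB first)."""
    sums = []
    carry = carry_in
    for x, y in zip(xs, ys):
        s, carry = full_adder(x, y, carry)
        sums.append(s)
    return sums, carry


# Example: 3-input LUT outputs (here AND gates, table 0x88) feeding two adder inputs.
AND_AB = 0x88
x_bit = lut3(AND_AB, 1, 1, 0)
y_bit = lut3(AND_AB, 1, 1, 0)
sum_bit, carry_bit = full_adder(x_bit, y_bit, 0)
```

In this model the 4-input function and the adder inputs come from the same pair of 3-input tables, which is the sharing the abstract relies on for arithmetic density.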
-
Publication No.: US10102892B1
Publication Date: 2018-10-16
Application No.: US15611070
Application Date: 2017-06-01
Applicant: INTEL CORPORATION
Inventor: Sergey Gribok
Abstract: Unlike prior RAM-based shift register circuits, the presently-disclosed shift register circuit does not require control circuits to generate write and read address signals. Instead, the presently-disclosed shift register circuit utilizes a portion of RAM to store and provide the write and read address signals. The write and read addresses are output from the data output port of the RAM, and received by the write and read address ports of the RAM. Advantageously, the presently-disclosed shift register circuit requires less area to implement because the need for write and read control circuits is eliminated.
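The idea in the abstract above, a RAM that supplies its own next read and write addresses, can be sketched as a small behavioural simulation in Python. The class name, the word layout (a data field plus a preloaded address field), and the fixed `delay` parameter are assumptions for illustration only; the abstract does not describe the actual word format or port timing.

```python
class RamShiftRegister:
    """Behavioural model of a shift register whose addresses live in the RAM itself."""

    def __init__(self, depth: int, delay: int):
        if not 1 <= delay < depth:
            raise ValueError("delay must satisfy 1 <= delay < depth")
        # Data portion of each RAM word.
        self.data = [0] * depth
        # Address portion of each RAM word, preloaded once:
        # word i holds (next read address, next write address).
        self.addr_words = [
            ((i + 1) % depth, (i + 1 + delay) % depth) for i in range(depth)
        ]
        self.read_addr = 0
        self.write_addr = delay % depth

    def step(self, value: int) -> int:
        """Advance one clock: shift `value` in, return the value shifted out."""
        out = self.data[self.read_addr]
        # The word just read also carries the addresses for the next cycle,
        # so no separate address counter is modelled.
        next_read, next_write = self.addr_words[self.read_addr]
        self.data[self.write_addr] = value
        self.read_addr, self.write_addr = next_read, next_write
        return out


sr = RamShiftRegister(depth=8, delay=3)
outputs = [sr.step(v) for v in [1, 2, 3, 4, 5, 6]]
```

With `delay=3`, each input reappears at the output three steps later, and the first three outputs are the initial zeros.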
-
Publication No.: US11436399B2
Publication Date: 2022-09-06
Application No.: US16218179
Application Date: 2018-12-12
Applicant: Intel Corporation
Inventor: Martin Langhammer, Sergey Gribok, Gregg William Baeckler
IPC: G06F7/52, G06F30/331, H03K19/17704, H03K19/17736, H03K19/17756, H03K19/17728
Abstract: A method for implementing a multiplier on a programmable logic device (PLD) is disclosed. Partial product bits of the multiplier are identified, and it is determined how the partial product bits are to be summed to generate a final product from a multiplier and a multiplicand. Chains of PLD cells, and cells within those chains, are assigned for generating and summing the partial product bits. It is determined whether a bit in an assigned cell in an assigned chain of PLD cells is under-utilized. In response to determining that a bit is under-utilized, the assignment of the chains of PLD cells and of the cells for generating and summing the partial product bits is changed to improve overall utilization of the chains of PLD cells and of the cells in those chains.
-
Publication No.: US11210063B2
Publication Date: 2021-12-28
Application No.: US16585857
Application Date: 2019-09-27
Applicant: Intel Corporation
Inventor: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode in conjunction with general-purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to, but different from, BFLOAT16.
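To make the data flow in the abstract above concrete, here is a minimal Python sketch of BFLOAT16 inputs feeding 2-element dot products and an adder tree. It is a numerical illustration only: the function names are hypothetical, inputs are converted by truncation (a simplifying assumption), sums are accumulated in Python floats, and the hard versus hard/soft data path split, the custom format, and the normalization circuitry are not modelled.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Truncate x to BFLOAT16 precision, i.e. keep the top 16 bits of its float32 form."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def dot2(a0: float, b0: float, a1: float, b1: float) -> float:
    """2-element dot product, the leaf operation named in the abstract."""
    return a0 * b0 + a1 * b1


def adder_tree(values) -> float:
    """Sum values pairwise, level by level, as a binary adder tree would."""
    values = list(values)
    if not values:
        return 0.0
    while len(values) > 1:
        if len(values) % 2:
            values.append(0.0)  # pad odd levels with a zero input
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0]


def bf16_dot(a, b) -> float:
    """Dot product of two equal-length vectors after BFLOAT16 conversion."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a = [to_bfloat16(x) for x in a]
    b = [to_bfloat16(x) for x in b]
    if len(a) % 2:
        a.append(0.0)
        b.append(0.0)
    partials = [dot2(a[i], b[i], a[i + 1], b[i + 1]) for i in range(0, len(a), 2)]
    return adder_tree(partials)
```

For inputs that are exactly representable in BFLOAT16, such as `bf16_dot([1.0, 2.0, 0.5, -3.0], [4.0, 0.25, 8.0, 1.0])`, the result matches the exact dot product, 5.5.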
-
Publication No.: US11080019B2
Publication Date: 2021-08-03
Application No.: US16022857
Application Date: 2018-06-29
Applicant: Intel Corporation
Inventor: Martin Langhammer, Gregg William Baeckler, Sergey Gribok
IPC: G06F7/53, G06F30/34, G06F30/327, G06F30/392, G06F30/394, G06F7/544, G06N20/00, G06F111/04, G06F111/20, G06F119/12
Abstract: A method for designing and configuring a system on a field programmable gate array (FPGA) is disclosed. A portion of the system that is implemented more than a predetermined number of times is identified. A structural netlist is generated that describes how to implement the portion of the system a plurality of times on the FPGA and that leverages the repetitive nature of implementing the portion. The identifying and generating are performed prior to synthesizing and placing other portions of the system that are not implemented more than the predetermined number of times. Synthesizing, placing, and routing of the other portions of the system on the FPGA are performed in accordance with the structural netlist. The FPGA is configured with a configuration file that includes a design for the system that reflects the synthesizing, placing, and routing, wherein the configuring physically transforms resources on the FPGA to implement the system.
-
Publication No.: US20230026331A1
Publication Date: 2023-01-26
Application No.: US17952085
Application Date: 2022-09-23
Applicant: Intel Corporation
Inventor: Sergey Gribok, Bogdan Pasca, Martin Langhammer
Abstract: A circuit system for performing modular reduction of a modular multiplication includes multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication. The multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products. Adder circuits add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.
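The reduction step in the abstract above can be illustrated arithmetically in Python. In this sketch the coefficients are the column sums of a limb-wise multiplication, the upper coefficients are multiplied by precomputed constants of the form 2^(w·k) mod M, and everything is added to the lower coefficients. The limb width `w`, the split point `n_low`, and all function names are assumptions for illustration; the bit-segment alignment described in the abstract is not modelled, and the result is only partially reduced (congruent to the product modulo M, not necessarily less than M).

```python
def limbs(x: int, w: int, n: int):
    """Split x into n little-endian limbs of w bits each."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def column_sums(a: int, b: int, w: int, n: int):
    """Coefficients obtained by summing partial products of equal weight."""
    a_limbs, b_limbs = limbs(a, w, n), limbs(b, w, n)
    cols = [0] * (2 * n - 1)
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            cols[i + j] += ai * bj
    return cols


def partially_reduce(cols, w: int, modulus: int, n_low: int) -> int:
    """Fold the upper coefficients onto the lower ones using constant multiplies."""
    total = 0
    for k, coeff in enumerate(cols):
        if k < n_low:
            # Lower subset: added at its natural bit position.
            total += coeff << (w * k)
        else:
            # Upper subset: multiplied by the constant 2^(w*k) mod modulus.
            total += coeff * pow(2, w * k, modulus)
    return total


# Example with hypothetical parameters: 16-bit limbs and the modulus 2^61 - 1.
M = (1 << 61) - 1
a = 0x123456789ABCDEF0
b = 0x0FEDCBA987654321
cols = column_sums(a, b, w=16, n=4)
partial = partially_reduce(cols, w=16, modulus=M, n_low=4)
```

Here `partial % M` equals `(a * b) % M`, while `partial` itself is considerably narrower than the full product, so a final small reduction would still be needed.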
-
Publication No.: US20220107783A1
Publication Date: 2022-04-07
Application No.: US17552436
Application Date: 2021-12-16
Applicant: Intel Corporation
Inventor: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu
Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.