Patent search ap:("Intel Corporation") AND inv:"Sergey Gribok" Page 1

1.

Granted invention patent
Methods for using a multiplier circuit to support multiple sub-multiplications using bit correction and extension [valid]

Publication (announcement) No.: US10732932B2

Publication (announcement) date: 2020-08-04

Application No.: US16231170

Application date: 2018-12-21

Applicant: Intel Corporation

Inventor: Bogdan Pasca, Martin Langhammer, Sergey Gribok, Gregg William Baeckler

IPC: G06F7/523, H03K19/177

Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit such as an 18×18 multiplier circuit may be used to support two or more smaller multiplication operations such as two 8×8 integer multiplications or two 9×9 integer multiplications. To implement the two 8×8 or 9×9 unsigned/signed multiplications, the 18×18 multiplier may be configured to support two 8×8 multiplications with one shared operand, two 6×6 multiplications without any shared operand, or two 7×7 multiplications without any shared operand. Any potential overlap of partial product terms may be subtracted out using correction logic. The multiplication of the remaining most significant bits can be computed using associated multiplier extension logic and appended to the other least significant bits using merging logic.
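The shared-operand case in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: the function name `packed_shared_operand_mul`, the 10-bit packing offset, and the way the overlapping bits are recomputed are assumptions, chosen so that the packed operand fits in 18 bits.

```python
def packed_shared_operand_mul(a1, a2, b, shift=10):
    """Compute a1*b and a2*b (unsigned 8-bit inputs) with one wide multiplication.

    Illustrative only: `shift` and the correction step are assumptions,
    not details taken from the patent.
    """
    assert all(0 <= v < 256 for v in (a1, a2, b))
    packed = a1 | (a2 << shift)          # fits in 18 bits when shift == 10
    wide = packed * b                    # equals a1*b + ((a2*b) << shift)

    # The low `shift` bits of a1*b are not touched by the second product.
    low = wide & ((1 << shift) - 1)

    # Bits of a1*b at or above `shift` overlap with a2*b. A hardware design
    # would derive them with small extra logic; here they are simply
    # recomputed as a stand-in for that correction logic.
    overlap = (a1 * b) >> shift

    p2 = (wide >> shift) - overlap       # subtract the overlap to recover a2*b
    p1 = (overlap << shift) | low        # merge the high and low bits of a1*b
    return p1, p2


print(packed_shared_operand_mul(200, 123, 77))
```

With a 16-bit product and a 10-bit offset, six bits of the first product collide with the second product, which is why a correction step is needed at all.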

2.

Invention application
METHODS FOR USING A MULTIPLIER TO SUPPORT MULTIPLE SUB-MULTIPLICATION OPERATIONS [under examination, published]

Publication (announcement) No.: US20190042198A1

Publication (announcement) date: 2019-02-07

Application No.: US16144999

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Martin Langhammer, Gregg William Baeckler, Sergey Gribok, Dmitry N. Denisenko, Bogdan Pasca

IPC: G06F7/544, G06F7/483

Abstract: Integrated circuits with digital signal processing (DSP) blocks are provided. A DSP block may include one or more large multiplier circuits. A large multiplier circuit (e.g., an 18×18 or 18×19 multiplier circuit) may be used to support two or more smaller multiplication operations sharing one or two sets of multiplier operands, a complex multiplication, and a sum of two multiplications. If the multiplier products overflow and interfere with one another, correction operations can be performed. Partial products from two or more larger multiplier circuits can be used to combine decomposed partial products. A large multiplier circuit can also be used to support two floating-point mantissa multipliers.
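One way a single wide multiplier can yield a sum of two multiplications is sketched below. This is a well-known operand-packing trick shown under stated assumptions, not the arrangement claimed in the application: the name `sum_of_two_products`, the 4-bit operand width, and the field width `2*n + 1` are all illustrative choices that avoid overflow between fields, so no correction step is needed.

```python
def sum_of_two_products(a1, b1, a2, b2, n=4):
    """Return a1*b1 + a2*b2 for unsigned n-bit inputs using one wide multiply.

    Illustrative sketch: operand width and field layout are assumptions.
    """
    limit = 1 << n
    assert all(0 <= v < limit for v in (a1, b1, a2, b2))

    s = 2 * n + 1                 # field width, wide enough that fields never overlap
    x = a1 | (a2 << s)            # a1 in the low field, a2 in the high field
    y = b2 | (b1 << s)            # operands crossed so the middle field holds the sum

    # x*y = a1*b2 + ((a1*b1 + a2*b2) << s) + ((a2*b1) << 2*s)
    wide = x * y
    return (wide >> s) & ((1 << s) - 1)


print(sum_of_two_products(3, 5, 7, 9))
```

With `n = 4` each packed operand is 13 bits wide, so it fits an 18-bit multiplier input. Wider operands would make the fields overlap, which is the situation the abstract's correction operations address.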

3.

Granted invention patent
Logic circuits with simultaneous dual function capability [valid]

Publication (announcement) No.: US10790829B2

Publication (announcement) date: 2020-09-29

Application No.: US16144558

Application date: 2018-09-27

Applicant: Intel Corporation

Inventor: Martin Langhammer, Sergey Gribok, Gregg William Baeckler

IPC: H03K19/17736, H03K19/173, G06F7/523, G06F7/501, H03K19/17728

Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic elements. A logic element may include four lookup tables coupled to an adder carry chain. At least some of the lookup tables are configured to output combinatorial outputs, whereas the adder carry chain is used to output sum outputs. Both the combinatorial outputs and the sum outputs may be used simultaneously to support a multiplication operation, three or more logic operations, or arithmetic and combinatorial operations in parallel.

4.

Invention application
LOGIC CIRCUITS WITH AUGMENTED ARITHMETIC DENSITIES [under examination, published]

Publication (announcement) No.: US20190288688A1

Publication (announcement) date: 2019-09-19

Application No.: US16434088

Application date: 2019-06-06

Applicant: Intel Corporation

Inventor: Sergey Gribok, Gregg Baeckler, Martin Langhammer

IPC: H03K19/0175, H03K19/177, G06F7/50, H03K19/20

Abstract: Integrated circuits with programmable logic regions are provided. The programmable logic regions may be organized into smaller logic units sometimes referred to as logic cells. A logic cell may include four 4-input lookup tables (LUTs) coupled to an adder carry chain. Each of the four 4-input LUTs may include two 3-input LUTs and a selector multiplexer. The carry chain may include three or more full adder circuits. The outputs of the 3-input LUTs may be directly connected to inputs of the full adder circuits in the carry chain. By providing at least as many full adder circuits as there are 4-input LUTs in the logic cell, the arithmetic density of the logic is enhanced.
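The composition of a 4-input LUT from two 3-input LUTs and a selector multiplexer can be modeled in a few lines. This is a behavioral sketch only: the function names and the truth-table bit ordering are assumptions, and the carry-chain connections described in the abstract are not modeled.

```python
def lut3(table, a, b, c):
    """3-input LUT: `table` is an 8-bit truth table, inputs are 0 or 1."""
    return (table >> (a | (b << 1) | (c << 2))) & 1


def lut4(table, a, b, c, d):
    """4-input LUT built from two 3-input LUTs and a 2:1 selector multiplexer.

    `table` is a 16-bit truth table; input d drives the multiplexer select.
    """
    low = lut3(table & 0xFF, a, b, c)           # half of the table used when d == 0
    high = lut3((table >> 8) & 0xFF, a, b, c)   # half of the table used when d == 1
    return high if d else low


# 0x6996 is the truth table of a 4-input XOR under this bit ordering.
print(lut4(0x6996, 1, 0, 1, 1))
```

Because both 3-input LUT outputs exist before the multiplexer, they are available as separate signals, which is what lets a design route them to other logic such as full adders.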

5.

Granted invention patent
RAM-based shift register with embedded addressing [valid]

Publication (announcement) No.: US10102892B1

Publication (announcement) date: 2018-10-16

Application No.: US15611070

Application date: 2017-06-01

Applicant: INTEL CORPORATION

Inventor: Sergey Gribok

IPC: G11C8/10, G11C8/04, G11C8/18, G11C7/10

Abstract: Unlike prior RAM-based shift register circuits, the presently-disclosed shift register circuit does not require control circuits to generate write and read address signals. Instead, the presently-disclosed shift register circuit utilizes a portion of RAM to store and provide the write and read address signals. The write and read addresses are output from the data output port of the RAM, and received by the write and read address ports of the RAM. Advantageously, the presently-disclosed shift register circuit requires less area to implement because the need for write and read control circuits is eliminated.
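The idea of letting the RAM supply its own addresses can be modeled behaviorally. The sketch below is a simplification under stated assumptions: the class name `RamShiftRegister` is hypothetical, a single next-address field stands in for the separate write and read addresses, and the address fields are assumed to be preloaded; the abstract does not say how they are initialized.

```python
class RamShiftRegister:
    """Behavioral model of a RAM-based delay line with addresses stored in the RAM.

    Each word holds a data field and a next-address field. No external counter
    exists: the next address always comes from the word that was just read.
    """

    def __init__(self, depth):
        assert depth > 0
        self.data = [0] * depth
        # Address portion of the RAM (assumed preloaded with a circular sequence).
        self.next_addr = [(i + 1) % depth for i in range(depth)]
        self.addr = 0

    def step(self, din):
        """One clock cycle: read the oldest value, store the new one, follow the link."""
        dout = self.data[self.addr]
        self.data[self.addr] = din
        self.addr = self.next_addr[self.addr]   # address fed back from the RAM output
        return dout


sr = RamShiftRegister(depth=3)
print([sr.step(v) for v in range(1, 7)])
```

Each input reappears at the output `depth` cycles later, so the structure behaves like a shift register without any address-generation logic outside the memory.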

6.

Granted invention patent
Method and apparatus for performing multiplier regularization [valid]

Publication (announcement) No.: US11436399B2

Publication (announcement) date: 2022-09-06

Application No.: US16218179

Application date: 2018-12-12

Applicant: Intel Corporation

Inventor: Martin Langhammer, Sergey Gribok, Gregg William Baeckler

IPC: G06F7/52, G06F30/331, H03K19/17704, H03K19/17736, H03K19/17756, H03K19/17728

Abstract: A method for implementing a multiplier on a programmable logic device (PLD) is disclosed. Partial product bits of the multiplier are identified, and how the partial product bits are to be summed to generate a final product from a multiplier and multiplicand is determined. Chains of PLD cells, and cells within those chains, are assigned for generating and summing the partial product bits. It is determined whether a bit in an assigned cell in an assigned chain of PLD cells is under-utilized. In response to determining that a bit is under-utilized, the assignment of the chains of PLD cells and of the cells for generating and summing the partial product bits is changed to improve the overall utilization of the chains of PLD cells and of the cells in those chains.
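The first step in this abstract, identifying the partial product bits and how they are summed, can be sketched as follows. The helper names are hypothetical, and the sketch stops short of the patent's actual contribution: it does not assign PLD cells or rebalance them. It only shows the uneven column heights that leave some adder positions under-used.

```python
def partial_product_columns(n, m):
    """Group partial product bits of an n-bit by m-bit unsigned multiplier by weight.

    Column k lists the (i, j) pairs whose bit a[i] & b[j] has weight 2**k.
    """
    columns = [[] for _ in range(n + m - 1)]
    for i in range(n):
        for j in range(m):
            columns[i + j].append((i, j))
    return columns


def multiply_from_columns(a, b, n, m):
    """Sum the partial product bits column by column to form the final product."""
    total = 0
    for k, column in enumerate(partial_product_columns(n, m)):
        ones = sum(((a >> i) & 1) & ((b >> j) & 1) for i, j in column)
        total += ones << k
    return total


heights = [len(col) for col in partial_product_columns(4, 4)]
print(heights)
print(multiply_from_columns(13, 11, 4, 4))
```

The column heights rise toward the middle and fall off at both ends, so a straightforward mapping onto fixed-length adder chains leaves the edge positions sparsely filled.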

7.

Granted invention patent
Machine learning training architecture for programmable devices [valid]

Publication (announcement) No.: US11210063B2

Publication (announcement) date: 2021-12-28

Application No.: US16585857

Application date: 2019-09-27

Applicant: Intel Corporation

Inventor: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu

IPC: G06F7/487, G06F7/501, H03M7/24, G06F9/30, G06F17/16

Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry. The hybrid dot-product circuitry has a hard data path that uses digital signal processing (DSP) blocks operating in floating-point mode and a hard/soft data path that uses DSP blocks operating in fixed-point mode operated in conjunction with general purpose soft logic. The hard/soft data path includes 2-element dot-product circuits that feed an adder tree. Results from the hard data path are combined with the adder tree using format conversion and normalization circuitry. Inputs to the hybrid dot-product circuitry may be in the BFLOAT16 format. The hard data path may be in the single precision format. The hard/soft data path uses a custom format that is similar to but different than BFLOAT16.
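For orientation, BFLOAT16 keeps the sign, the 8 exponent bits, and the top 7 fraction bits of an IEEE 754 single-precision value, that is, its upper 16 bits. The sketch below shows that relationship and a dot product organized as 2-element products feeding an adder tree. It is a software illustration under stated assumptions: the function names are hypothetical, conversion truncates rather than rounds, accumulation uses Python floats rather than single precision, and the custom hard/soft format mentioned in the abstract is not modeled.

```python
import struct


def to_bfloat16_bits(x):
    """Return the 16-bit BFLOAT16 pattern for x by truncating its float32 encoding."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16


def from_bfloat16_bits(bits):
    """Expand a 16-bit BFLOAT16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def dot_bfloat16(xs, ys):
    """Dot product of BFLOAT16-quantized inputs: 2-element dot products, then an adder tree."""
    assert len(xs) == len(ys) and len(xs) > 0 and len(xs) % 2 == 0
    q = lambda v: from_bfloat16_bits(to_bfloat16_bits(v))
    # Stage 1: 2-element dot products.
    terms = [q(xs[i]) * q(ys[i]) + q(xs[i + 1]) * q(ys[i + 1])
             for i in range(0, len(xs), 2)]
    # Stage 2: pairwise adder tree.
    while len(terms) > 1:
        if len(terms) % 2:
            terms.append(0.0)
        terms = [terms[i] + terms[i + 1] for i in range(0, len(terms), 2)]
    return terms[0]


print(hex(to_bfloat16_bits(1.0)))
print(dot_bfloat16([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
```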

8.

Granted invention patent
Method and apparatus for performing synthesis for field programmable gate array embedded feature placement [valid]

Publication (announcement) No.: US11080019B2

Publication (announcement) date: 2021-08-03

Application No.: US16022857

Application date: 2018-06-29

Applicant: Intel Corporation

Inventor: Martin Langhammer, Gregg William Baeckler, Sergey Gribok

IPC: G06F7/53, G06F30/34, G06F30/327, G06F30/392, G06F30/394, G06F7/544, G06N20/00, G06F111/04, G06F111/20, G06F119/12

Abstract: A method for designing and configuring a system on a field programmable gate array (FPGA) is disclosed. A portion of the system that is implemented more than a predetermined number of times is identified. A structural netlist is generated that describes how to implement the portion of the system a plurality of times on the FPGA and that leverages the repetitive nature of implementing the portion. The identifying and generating are performed prior to synthesizing and placing other portions of the system that are not implemented more than the predetermined number of times. Synthesizing, placing, and routing the other portions of the system on the FPGA are performed in accordance with the structural netlist. The FPGA is configured with a configuration file that includes a design for the system that reflects the synthesizing, placing, and routing, wherein the configuring physically transforms resources on the FPGA to implement the system.
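The identification step described here amounts to counting how often each portion of the design is instantiated and comparing the count with a threshold. A minimal sketch follows; the data model (a list of instance-name and module-name pairs), the function name, and the sample design are all assumptions for illustration, and nothing here generates a structural netlist.

```python
from collections import Counter


def find_repeated_portions(instances, threshold):
    """Return module names instantiated more than `threshold` times.

    `instances` is a list of (instance_name, module_name) pairs, a hypothetical
    stand-in for a real design hierarchy.
    """
    counts = Counter(module for _, module in instances)
    return sorted(name for name, count in counts.items() if count > threshold)


design = [
    ("pe_0", "dot_product"), ("pe_1", "dot_product"),
    ("pe_2", "dot_product"), ("pe_3", "dot_product"),
    ("ctrl", "controller"), ("io_0", "uart"),
]
print(find_repeated_portions(design, threshold=2))
```

Only the portions returned by such a check would be handled structurally first; everything else would go through ordinary synthesis, placement, and routing afterwards.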

9.

Invention application
High Performance Systems And Methods For Modular Multiplication [valid]

Publication (announcement) No.: US20230026331A1

Publication (announcement) date: 2023-01-26

Application No.: US17952085

Application date: 2022-09-23

Applicant: Intel Corporation

Inventor: Sergey Gribok, Bogdan Pasca, Martin Langhammer

IPC: G06F7/72, G06F7/544, G06F7/523, G06F7/50, G06F1/03

Abstract: A circuit system for performing modular reduction of a modular multiplication includes multiplier circuits that receive a first subset of coefficients that are generated by summing partial products of a multiplication operation that is part of the modular multiplication. The multiplier circuits multiply the coefficients in the first subset by constants that equal remainders of divisions to generate products. Adder circuits add a second subset of the coefficients and segments of bits of the products that are aligned with respective ones of the second subset of the coefficients to generate sums.
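The arithmetic identity behind this kind of reduction is that a coefficient with weight 2^k can be replaced by the same coefficient times the constant (2^k mod m) without changing the value modulo m. The sketch below shows that folding step on plain bit segments. It is an illustration under stated assumptions: the function name, segment width, and split point are hypothetical, and the coefficients here are simple slices of an integer rather than sums of partial products as in the application.

```python
def fold_reduce(x, m, seg_bits=8, low_segs=2):
    """Partially reduce x modulo m by folding high segments onto the low ones.

    Segments below `low_segs` are kept at their original weight. Each higher
    segment is multiplied by the constant 2**k mod m, where k is its bit offset.
    The result is congruent to x modulo m but is generally much smaller than x.
    """
    assert x >= 0 and m > 0
    mask = (1 << seg_bits) - 1
    coeffs = []
    while x:
        coeffs.append(x & mask)
        x >>= seg_bits

    total = 0
    for idx, coeff in enumerate(coeffs):
        offset = idx * seg_bits
        if idx < low_segs:
            total += coeff << offset            # second subset: added as-is
        else:
            total += coeff * pow(2, offset, m)  # first subset: times a remainder constant
    return total


x, m = 0x1234_5678_9ABC_DEF0, 65521
folded = fold_reduce(x, m)
print(folded % m == x % m)
```

The folded value still needs a final, much cheaper reduction to land in the range 0 to m - 1.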

10.

Invention application
MACHINE LEARNING TRAINING ARCHITECTURE FOR PROGRAMMABLE DEVICES [valid]

Publication (announcement) No.: US20220107783A1

Publication (announcement) date: 2022-04-07

Application No.: US17552436

Application date: 2021-12-16

Applicant: Intel Corporation

Inventor: Martin Langhammer, Bogdan Pasca, Sergey Gribok, Gregg William Baeckler, Andrei Hagiescu

IPC: G06F7/487, G06F7/501, H03M7/24, G06F17/16, G06F9/30

Abstract: A programmable device may be configured to support machine learning training operations using matrix multiplication circuitry. In some embodiments, the multiplication is implemented on a systolic array. The systolic array includes an array of processing elements, each of which includes hybrid floating-point dot-product circuitry.
