Patent search ap:("Advanced Micro Devices Page Inc." OR "ATI Technologies ULC") AND inv:"Bin He"

21.

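Granted invention patent
Processing unit with mixed precision operations [Active]

Publication (announcement) number: US11768664B2

Publication (announcement) date: 2023-09-26

Application number: US16591031

Application date: 2019-10-02

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Bin He , Michael Mantor , Jiasheng Chen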

IPC: G06F7/57 , G06F7/544 , G06F9/38 , G06F7/483

CPC classification number: G06F7/57 , G06F7/483 , G06F7/5443 , G06F9/3818 , G06F2207/3824

Abstract: A graphics processing unit (GPU) implements operations, with associated op codes, to perform mixed precision mathematical operations. The GPU includes an arithmetic logic unit (ALU) with different execution paths, wherein each execution path executes a different mixed precision operation. By implementing mixed precision operations at the ALU in response to designated op codes that delineate the operations, the GPU efficiently increases the precision of specified mathematical operations while reducing execution overhead.
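To illustrate the general idea of a mixed precision operation, here is a minimal sketch of a dot product that takes half-precision inputs and accumulates in single precision, compared with one that accumulates in half precision. The abstract does not say which precisions or op codes the patent uses, so the function names and the choice of FP16 inputs with FP32 accumulation are assumptions made only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_accumulate(a, b):
    """Hypothetical baseline: FP16 inputs, FP16 products, FP16 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(to_fp16(x) * to_fp16(y))
        acc = to_fp16(acc + product)
    return acc


def dot_mixed(a, b):
    """Hypothetical mixed precision variant: FP16 inputs, FP32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values is exact in a Python float (double).
        acc = to_fp32(acc + to_fp16(x) * to_fp16(y))
    return acc


ones = [1.0] * 4096
print(dot_fp16_accumulate(ones, ones))  # stalls at 2048.0
print(dot_mixed(ones, ones))            # 4096.0
```

The half-precision accumulator stops growing at 2048 because 2049 is not representable in FP16 and rounds back to 2048, while the single-precision accumulator reaches the exact sum.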

22.

Granted invention patent
Matrix multiplication unit with flexible precision operations [Active]

Publication (announcement) number: US11762658B2

Publication (announcement) date: 2023-09-19

Application number: US16581252

Application date: 2019-09-24

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Bin He , Michael Mantor , Jiasheng Chen , Jian Huang

IPC: G06F9/30 , G06F17/16 , G06F9/38 , G06F9/54

CPC classification number: G06F9/30036 , G06F9/30101 , G06F9/3877 , G06F9/544 , G06F17/16

Abstract: A processing unit such as a graphics processing unit (GPU) includes a plurality of vector signal processors (VSPs) that include multiply/accumulate elements. The processing unit also includes a plurality of registers associated with the plurality of VSPs. First portions of first and second matrices are fetched into the plurality of registers prior to a first round that includes a plurality of iterations. The multiply/accumulate elements perform matrix multiplication and accumulation on different combinations of subsets of the first portions of the first and second matrices in the plurality of iterations prior to fetching second portions of the first and second matrices into the plurality of registers for a second round. The accumulated results of multiplying the first portions of the first and second matrices are written into an output buffer in response to completing the plurality of iterations.
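The round-and-iteration scheme in this abstract can be loosely sketched as a tiled matrix multiplication: each round fetches a portion of both matrices into local storage, several iterations then reuse those fetched values in different row and column combinations, and the accumulated tile is written out only after the iterations finish. The tiling shape, the `tile` parameter, and the function name are assumptions for illustration and are not taken from the patent.

```python
def matmul_in_rounds(a, b, tile=2):
    """Multiply matrices a (n x k) and b (k x m) one output tile per round."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]  # stands in for the output buffer

    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # Start of a round: fetch a block of rows of `a` and a block of
            # columns of `b` into hypothetical registers.
            a_regs = a[i0:i0 + tile]
            b_regs = [[b[t][j] for t in range(k)]
                      for j in range(j0, min(j0 + tile, m))]

            # Iterations: every (row, column) combination of the fetched
            # portions is multiplied and accumulated without refetching.
            acc = [[0] * len(b_regs) for _ in a_regs]
            for r, row in enumerate(a_regs):
                for c, col in enumerate(b_regs):
                    for x, y in zip(row, col):
                        acc[r][c] += x * y

            # End of the round: write accumulated results to the output buffer.
            for r in range(len(a_regs)):
                for c in range(len(b_regs)):
                    out[i0 + r][j0 + c] = acc[r][c]
    return out


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0], [0, 1], [2, 2]]
print(matmul_in_rounds(a, b))  # [[7, 8], [16, 17], [25, 26]]
```

Each fetched row is reused against every fetched column within a round, which is the kind of register reuse the abstract describes.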

23.

Granted invention patent
Dedicated vector sub-processor system [Active]

Publication (announcement) number: US11630667B2

Publication (announcement) date: 2023-04-18

Application number: US16697660

Application date: 2019-11-27

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Jiasheng Chen , Bin He , Jian Huang , Michael Mantor

IPC: G06F9/30 , G06F9/48

Abstract: A processor includes a plurality of vector sub-processors (VSPs) and a plurality of memory banks dedicated to respective VSPs. A first memory bank corresponding to a first VSP includes a first plurality of high vector general purpose register (VGPR) banks and a first plurality of low VGPR banks corresponding to the first plurality of high VGPR banks. The first memory bank further includes a plurality of operand gathering components that store operands from respective high VGPR banks and low VGPR banks. The operand gathering components are assigned to individual threads while the threads are executed by the first VSP.

24.

Granted invention patent
Pipeline including separate hardware data paths for different instruction types [Active]

Publication (announcement) number: US11494192B2

Publication (announcement) date: 2022-11-08

Application number: US16860842

Application date: 2020-04-28

Applicant: ADVANCED MICRO DEVICES, INC. , ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.

Inventor: Jiasheng Chen , YunXiao Zou , Bin He , Angel E. Socarras , QingCheng Wang , Wei Yuan , Michael Mantor

IPC: G06F9/38 , G06F9/30 , G06F15/80 , G06F15/76

Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.

25.

Granted invention patent
Stream processor with decoupled crossbar for cross lane operations [Active]

Publication (announcement) number: US10970081B2

Publication (announcement) date: 2021-04-06

Application number: US15637629

Application date: 2017-06-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Bin He , Mohammad Reza Hakami , Timothy Lottes , Justin David Smith , Michael J. Mantor , Derek Carson

IPC: G06F9/30 , G06F9/38 , G06F9/52

Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.
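The two operand paths described in this abstract can be modeled in a few lines: operands go through a permuting crossbar only when the instruction needs a permutation, and otherwise reach the lanes unchanged. This is a behavioral sketch only; the function name and the convention that `permutation[lane]` names the source lane are assumptions for illustration.

```python
def route_operands(operands, permutation=None):
    """Deliver per-lane operands to the execution lanes.

    permutation[lane] is the index of the source lane whose operand the
    given lane should receive. None means no permutation is required.
    """
    if permutation is None:
        # Second path: bypass the crossbar, each lane keeps its own operand.
        return list(operands)

    # First path: the crossbar reorders operands across lanes.
    if sorted(permutation) != list(range(len(operands))):
        raise ValueError("permutation must use every lane index exactly once")
    return [operands[src] for src in permutation]


lanes = [10, 20, 30, 40]
print(route_operands(lanes))                # [10, 20, 30, 40]
print(route_operands(lanes, [1, 0, 3, 2]))  # [20, 10, 40, 30]
```

In hardware the benefit of the bypass path would be avoiding crossbar latency and power for the common case; the model above only captures the routing decision.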

26.

Granted invention patent
Processor support for bypassing vector source operands [Active]

Publication (announcement) number: US10817302B2

Publication (announcement) date: 2020-10-27

Application number: US15644045

Application date: 2017-07-07

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Bin He , Mark M. Leather , Michael J. Mantor , Yunxiao Zou

IPC: G06F9/38 , G06F9/30 , G06F12/0891 , G06F12/0855 , G06F12/0804 , G06F12/121 , G06F12/0875

Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit and a high bandwidth, low power, vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAM's output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.

27.

Granted invention patent
Pipeline including separate hardware data paths for different instruction types [Active]

Publication (announcement) number: US10656951B2

Publication (announcement) date: 2020-05-19

Application number: US15789318

Application date: 2017-10-20

Applicant: Advanced Micro Devices, Inc. , Advanced Micro Devices (Shanghai) Co., Ltd.

Inventor: Jiasheng Chen , YunXiao Zou , Bin He , Angel E. Socarras , QingCheng Wang , Wei Yuan , Michael Mantor

IPC: G06F9/30 , G06F9/38 , G06F15/80 , G06F15/76

Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.

28.

Invention application
STREAM PROCESSOR WITH DECOUPLED CROSSBAR FOR CROSS LANE OPERATIONS [Under examination, published]

Publication (announcement) number: US20190004814A1

Publication (announcement) date: 2019-01-03

Application number: US15637629

Application date: 2017-06-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Jiasheng Chen , Bin He , Mohammad Reza Hakami , Timothy Lottes , Justin David Smith , Michael J. Mantor , Derek Carson

IPC: G06F9/38 , G06F9/52 , G06F9/30

Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.

29.

Invention application
METHOD AND SYSTEM FOR PERFORMING LOW POWER AND LOW LATENCY MULTI-PRECISION COMPUTATION [Under examination, published]

Publication (announcement) number: US20180113709A1

Publication (announcement) date: 2018-04-26

Application number: US15342809

Application date: 2016-11-03

Applicant: Advanced Micro Devices, Inc.

Inventor: Bin He , YunXiao Zou , Jiasheng Chen , Michael Mantor

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/3887 , G06F9/30014 , G06F9/30036 , G06F9/3893

Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.

30.

Invention application
FLOATING POINT BIAS SWITCHING [Active]

Publication (announcement) number: US20250130774A1

Publication (announcement) date: 2025-04-24

Application number: US18395190

Application date: 2023-12-22

Applicant: Advanced Micro Devices, Inc.

Inventor: Shubh Shah , Ashutosh Garg , Bin He , Michael Mantor , Shubra Marwaha , Subramaniam Maiyuran

IPC: G06F7/556 , G06F7/483

Abstract: The disclosed circuit can interpret a bit sequence as a value based on one of multiple floating point number formats in a bias mode indicated by a bias mode indicator. The circuit can perform an operation using the value in the bias mode. Various other methods, systems, and computer-readable media are also disclosed.
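As a rough illustration of how the same bit sequence can yield different values under different exponent biases, the sketch below decodes an 8-bit pattern with 1 sign bit, 4 exponent bits, and 3 mantissa bits. The abstract does not give the formats or bias values, so the field widths, the `BIAS_BY_MODE` table, and the lack of NaN or infinity encodings are all assumptions.

```python
# Hypothetical mapping from a bias mode indicator to an exponent bias.
BIAS_BY_MODE = {0: 7, 1: 8}


def decode_fp8(bits: int, bias_mode: int, exp_bits: int = 4, man_bits: int = 3) -> float:
    """Interpret `bits` as sign | exponent | mantissa using the selected bias."""
    if not 0 <= bits < 1 << (1 + exp_bits + man_bits):
        raise ValueError("bit pattern does not fit the format")
    bias = BIAS_BY_MODE[bias_mode]

    sign = -1.0 if bits >> (exp_bits + man_bits) else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)

    if exponent == 0:
        # Subnormal: no implicit leading one, fixed minimum exponent.
        return sign * mantissa * 2.0 ** (1 - bias - man_bits)
    # Normal: implicit leading one. Special values are not modeled here.
    return sign * (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)


pattern = 0b0_0111_000
print(decode_fp8(pattern, bias_mode=0))  # 1.0
print(decode_fp8(pattern, bias_mode=1))  # 0.5
```

Raising the bias by one halves every decoded value, which shifts the representable range toward smaller magnitudes without changing the bit layout.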
