-
Publication No.: US12229570B2
Publication Date: 2025-02-18
Application No.: US17952270
Application Date: 2022-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Michael John Mantor, Brian Emberling, Liang Huang, Chao Liu
Abstract: Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory module.
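As a rough software analogy (not the patented hardware path), a transposing block load simply stores the block with row and column indices swapped:

```python
def load_block_transposed(block):
    """Store a 2-D block in transposed order, mimicking a block load
    that transposes the data on its way to the memory module."""
    rows, cols = len(block), len(block[0])
    # Element (r, c) of the source lands at (c, r) in the stored copy.
    return [[block[r][c] for r in range(rows)] for c in range(cols)]
```

In hardware this happens during the load itself, so no separate transpose pass over memory is needed.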
-
Publication No.: US20240329998A1
Publication Date: 2024-10-03
Application No.: US18619392
Application Date: 2024-03-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Michael J. Mantor, Brian D. Emberling
CPC classification number: G06F9/3802, G06F9/3001, G06F9/30098, G06F9/3867
Abstract: An apparatus and method for efficiently processing multiply-accumulate operations for matrices in applications. In various implementations, a computing system includes a parallel data processing circuit and a memory. The memory stores the instructions (or translated commands) of a parallel data application. The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file and multiple instantiations of a vector processing circuit capable of performing multiple matrix multiplication operations corresponding to multiple different types of instructions. The multiplier circuit and the adder circuit of the vector processing circuit perform both the fused multiply add (FMA) operation and the dot product (inner product) operation without independent, dedicated execution pipelines, rather than providing one execution pipeline for the FMA operation and a separate pipeline for the dot product operation.
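The pipeline-sharing idea can be sketched in software (an illustrative analogy, not the circuit itself): a dot product is just a chain of the same multiply-add primitive that serves the FMA instruction.

```python
def fma(a, b, c):
    """Fused multiply-add: a * b + c in a single step."""
    return a * b + c

def dot(xs, ys):
    """Inner product built by repeatedly reusing the FMA primitive,
    illustrating one multiply/add datapath serving both instruction types."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = fma(x, y, acc)
    return acc
```

Because each dot-product step is itself a multiply followed by an accumulate, a single multiplier/adder pair suffices for both instruction families.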
-
Publication No.: US11720328B2
Publication Date: 2023-08-08
Application No.: US17029836
Application Date: 2020-09-23
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Shubh Shah, Michael Mantor
Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
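A minimal sketch of the stage-narrowing idea, assuming a hypothetical 16-bit inter-stage datapath (the actual widths and staging are implementation details not given in the abstract):

```python
MASK16 = (1 << 16) - 1  # assumed narrow inter-stage width

def staged_mac(a, b, c):
    """Two-stage multiply-accumulate: the multiply runs at the precision
    the instruction requests, then the intermediate result is narrowed
    before the next ALU stage consumes it, modeling a smaller datapath."""
    product = a * b                 # stage 1: full-precision multiply
    narrowed = product & MASK16     # reduce the result size between stages
    return (narrowed + c) & MASK16  # stage 2: add at the narrow width
```

Narrowing between stages is what lets the later stages be physically smaller, at the cost of discarding bits the instruction's output precision does not need.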
-
Publication No.: US11675568B2
Publication Date: 2023-06-13
Application No.: US17121354
Application Date: 2020-12-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Brian Emberling, Mark Leather, Michael Mantor
CPC classification number: G06F7/57, G06F9/3867, G06F17/16, G06T1/20, G06F15/8015
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
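A toy model of the bandwidth argument (purely illustrative; register names and the one-read-per-miss policy are assumptions, not the patented design): operand requests that hit a small cache cost no VGPR bank reads, so more requests can be served per cycle at fixed bank bandwidth.

```python
def issue_with_cache(requests, cache):
    """Serve one cycle of operand requests: hits come from the operand
    cache for free, misses each consume one VGPR bank read."""
    vgpr_reads = 0
    for reg in requests:
        if reg not in cache:
            cache.add(reg)   # one VGPR bank read fills the cache entry
            vgpr_reads += 1
    return vgpr_reads
```

With the cache warm, two ALU pipelines can be fed in the same cycle while the VGPR banks supply only the operands not already cached.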
-
Publication No.: US11237827B2
Publication Date: 2022-02-01
Application No.: US16696108
Application Date: 2019-11-26
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, Jiasheng Chen, Jian Huang
Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between processing threads. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
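The register-sequencing step can be illustrated with plain bit manipulation (a sketch, not the hardware transfer path): two 32-bit single-precision register slots, delivered on successive cycles, assemble one 64-bit double-precision operand.

```python
def combine_halves(hi32, lo32):
    """Assemble a 64-bit double-precision operand from two 32-bit
    single-precision register slots transferred on successive cycles."""
    return ((hi32 & 0xFFFFFFFF) << 32) | (lo32 & 0xFFFFFFFF)
```

Sequencing the halves into one designated wide register is what lets the narrower operand registers be time-shared rather than duplicated per thread.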
-
Publication No.: US10360177B2
Publication Date: 2019-07-23
Application No.: US15189054
Application Date: 2016-06-22
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Syed Zohaib M. Gilani, Jiasheng Chen, QingCheng Wang, YunXiao Zou, Michael Mantor, Bin He, Timour T. Paltashev
IPC: G06F15/80, G06F1/3234, G06T15/00
Abstract: Described is a method and processing apparatus to improve power efficiency by gating redundant thread processing. In particular, the method for gating redundant threads in a graphics processor includes determining if data for a thread and data for at least another thread are within a predetermined similarity threshold, gating execution of the at least another thread if the data for the thread and the data for the at least another thread are within the predetermined similarity threshold, and using output data from the thread as the output data for the at least another thread.
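A minimal software sketch of the gating idea (the similarity metric here is a simple absolute difference, an assumption for illustration): threads whose input is within the threshold of an already-computed input reuse that result instead of executing.

```python
def gate_redundant(thread_data, threshold, compute):
    """Run `compute` only for thread inputs that differ from every
    already-computed input by more than `threshold`; gated threads
    reuse the earlier thread's output."""
    results, computed = [], []  # computed: (input, output) pairs
    for x in thread_data:
        reuse = next((out for inp, out in computed
                      if abs(x - inp) <= threshold), None)
        if reuse is None:       # no sufficiently similar prior thread
            reuse = compute(x)
            computed.append((x, reuse))
        results.append(reuse)
    return results
```

The power saving comes from the gated threads skipping execution entirely; the trade-off is a bounded error introduced by the similarity threshold.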
-
Publication No.: US20190129718A1
Publication Date: 2019-05-02
Application No.: US15799560
Application Date: 2017-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
Abstract: Systems, apparatuses, and methods for routing traffic between clients and system memory are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
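With M = 32 and N = 16 as example widths (the abstract leaves M and N abstract), the per-operand selection bit reduces to picking one 16-bit half of a 32-bit register value:

```python
def select_half(operand32, take_high):
    """Pick the 16-bit half of a 32-bit source operand, as a packed-math
    instruction's per-operand selection bit would."""
    if take_high:
        return (operand32 >> 16) & 0xFFFF  # high portion
    return operand32 & 0xFFFF              # low portion
```

Encoding the high/low choice as one bit per operand lets a half-precision instruction address twice as many logical operands without widening the register identifiers.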
-
Publication No.: US20180121386A1
Publication Date: 2018-05-03
Application No.: US15354560
Application Date: 2016-11-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Angel E. Socarras, Michael Mantor, YunXiao Zou, Bin He
IPC: G06F15/80, G06F9/30, G06F12/0875, G06F12/0891
CPC classification number: G06F15/8007, G06F9/3001, G06F9/30105, G06F9/3012, G06F9/30123, G06F9/3828, G06F9/3851, G06F9/3887, G06F9/3891, G06F12/0875, G06F12/0891, G06F2212/604
Abstract: A super single instruction, multiple data (SIMD) computing structure and a method of executing instructions in the super-SIMD is disclosed. The super-SIMD structure is capable of executing more than one instruction from a single thread or multiple threads and includes a plurality of vector general purpose registers (VGPRs), a first arithmetic logic unit (ALU) coupled to the plurality of VGPRs, a second ALU coupled to the plurality of VGPRs, and a destination cache (Do$) that is coupled via bypass and forwarding logic to the first ALU and the second ALU and receives the output of the first ALU and the second ALU. The Do$ holds multiple instruction results to extend an operand bypass network, saving read and write transaction power. A compute unit (CU) and a small CU including a plurality of super-SIMDs are also disclosed.
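A toy model of the destination cache (capacity, eviction policy, and register naming are all assumptions for illustration): recent ALU results are forwarded from the Do$ on a hit, and only misses fall back to a VGPR read.

```python
class DestCache:
    """Tiny destination cache (Do$) sketch: recent ALU results are kept
    here so a following instruction can forward them instead of paying
    a VGPR file read."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}  # reg -> value; dicts preserve insertion order

    def write(self, reg, value):
        """An ALU writes its result into the Do$, evicting the oldest
        entry when the cache is full."""
        self.entries[reg] = value
        if len(self.entries) > self.capacity:
            self.entries.pop(next(iter(self.entries)))

    def read(self, reg, vgpr):
        """Forward from the Do$ on a hit; fall back to the VGPR file."""
        return self.entries.get(reg, vgpr.get(reg))
```

Each forwarded operand avoids both the VGPR write of the producer and the VGPR read of the consumer, which is where the claimed transaction-power saving comes from.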
-
Publication No.: US20230097279A1
Publication Date: 2023-03-30
Application No.: US17489734
Application Date: 2021-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Brian Emberling, Michael Mantor, Michael Y. Chow, Bin He
Abstract: Methods and systems are disclosed for executing operations on single-instruction-multiple-data (SIMD) units. Techniques disclosed perform a dot product operation on input data during one compute cycle, including convolving the input data, generating intermediate data, and applying one or more transitional operations to the intermediate data to generate output data. Aspects are described wherein the input data is an input to a layer of a convolutional neural network and the generated output data is the output of the layer.
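The link between convolution and dot products can be shown with a minimal 1-D sketch (illustrative only; the patent targets SIMD hardware, and like most CNN "convolutions" this computes the unflipped cross-correlation): each output element is a dot product of the kernel with one window of the input.

```python
def conv1d_valid(signal, kernel):
    """One CNN-layer step: each output element is the dot product of
    the kernel with a sliding window of the input (no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

Mapping each window's dot product onto a single-cycle SIMD dot-product operation is what makes this formulation attractive for convolutional layers.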
-
Publication No.: US11409536B2
Publication Date: 2022-08-09
Application No.: US15342809
Application Date: 2016-11-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor
Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.
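The flavor of multi-precision computation over paired narrow channels can be sketched with a 64-bit add built from two 32-bit lane adds (an analogy for the staggering idea, not the claimed circuit): the low lane's carry feeds the high lane one level later.

```python
def paired_add64(a, b):
    """Add two 64-bit values using a pair of 32-bit lane adds with
    one-level staggering: the low lane's carry-out feeds the high lane."""
    mask = 0xFFFFFFFF
    lo = (a & mask) + (b & mask)          # first channel: low 32 bits
    carry = lo >> 32                      # staggered hand-off
    hi = ((a >> 32) & mask) + ((b >> 32) & mask) + carry  # second channel
    return ((hi & mask) << 32) | (lo & mask)
```

Staggering the paired channels by one level is what lets the dependent high half start as soon as the low half's carry is known, rather than waiting a full pass.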
-