Patent search ap:("Advanced Micro Devices Inc." OR "ATI Technologies ULC") AND inv:"Bin He"

11.

Granted patent
Pairing SIMD lanes to perform double precision operations (Active)

Publication number: US11409536B2

Publication date: 2022-08-09

Application number: US15342809

Application date: 2016-11-03

Applicant: Advanced Micro Devices, Inc.

Inventors: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor

IPC: G06F9/38, G06F9/30

Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.
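The abstract does not say how "one-level staggering" is realized in hardware. One way to picture it, offered only as an analogy, is a 64-bit add split across two 32-bit lanes, where the upper lane works one step after the lower lane because it needs the lower lane's carry. This sketch uses unsigned integer addition as a stand-in for the double-precision datapath; `paired_lane_add64` and `MASK32` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF  # width of one lane (assumption: 32-bit lanes)


def paired_lane_add64(a: int, b: int) -> int:
    """Add two unsigned 64-bit integers using a pair of 32-bit 'lanes'.

    Analogy only, not the patented circuit: the low lane runs first and
    the high lane runs one step later so it can consume the low lane's carry.
    """
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 0: low lane adds the low words and produces a carry-out.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32
    lo = lo_sum & MASK32

    # Step 1 (staggered by one level): high lane adds the high words plus the carry.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

The result wraps modulo 2**64, the same as a 64-bit unsigned adder.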

12.

Patent application
STREAM PROCESSOR WITH OVERLAPPING EXECUTION (Pending, published)

Publication number: US20190004807A1

Publication date: 2019-01-03

Application number: US15657478

Application date: 2017-07-24

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Qingcheng Wang, Yunxiao Zou, Bin He, Jian Yang, Michael J. Mantor, Brian D. Emberling

IPC: G06F9/38

Abstract: Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single-pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and used to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during those subsequent clock cycles to initiate execution of one or more second vector instructions on a second execution pipeline.
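A toy cycle table can make the overlap concrete. It rests on simplifications not stated in the abstract: one register-file read and one issue per cycle, a multi-pass instruction that processes one element per cycle on pipeline A from temporary storage, and single-pass instructions that execute on pipeline B in the cycle after their operands are read. `Instr` and `schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    passes: int  # 1 = single-pass, >1 = multi-pass


def schedule(multi: Instr, singles: list[Instr]) -> list[tuple[int, str, str, str]]:
    """Return rows of (cycle, register-file read, pipeline A, pipeline B).

    Assumptions: one register-file read per cycle; only the window in which
    the multi-pass instruction occupies pipeline A is modeled.
    """
    pending = list(singles)
    # Cycle 0: read all operands of the multi-pass instruction into temporary storage.
    rows = [(0, multi.name, "-", "-")]
    b_ready = None  # single-pass instruction whose operands were read last cycle
    for cycle in range(1, multi.passes + 1):
        pipe_a = f"{multi.name}[{cycle - 1}]"  # one element per cycle, fed from temp storage
        pipe_b = b_ready.name if b_ready else "-"
        # The register-file port is free, so fetch operands for the next single-pass instruction.
        b_ready = pending.pop(0) if pending else None
        rf_read = b_ready.name if b_ready else "-"
        rows.append((cycle, rf_read, pipe_a, pipe_b))
    return rows
```

For a four-pass instruction `M` and three single-pass instructions, the register file reads operands for pipeline B in cycles 1 to 3 while pipeline A drains `M` from temporary storage, so both pipelines stay busy without a second register-file read per cycle.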

13.

Patent application
MULTI-FORMAT OPERAND CIRCUIT (Active)

Publication number: US20250130794A1

Publication date: 2025-04-24

Application number: US18399659

Application date: 2023-12-28

Applicant: Advanced Micro Devices, Inc.

Inventors: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran

IPC: G06F9/30

Abstract: The disclosed processing circuit can perform an operation with a first operand having a first number format and a second operand having a second number format by directly using the first operand in the first number format and the second operand in the second number format to produce an output result. Various other methods, systems, and computer-readable media are also disclosed.
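To illustrate what "directly using" two formats can mean, this sketch multiplies an IEEE binary16 operand by a bfloat16 operand by decoding each bit pattern with its own field widths, rather than first converting one operand into the other's format. It is a software model under stated assumptions (finite inputs only, result returned as a Python float), not the disclosed circuit; `decode`, `mixed_multiply`, `FP16`, and `BF16` are hypothetical names.

```python
import math

FP16 = (5, 10)  # IEEE binary16: 5 exponent bits, 10 fraction bits
BF16 = (8, 7)   # bfloat16: 8 exponent bits, 7 fraction bits


def decode(bits: int, exp_bits: int, man_bits: int) -> tuple[int, int, int]:
    """Return (sign, exponent, integer significand) so value = sig * 2**exponent.

    Handles normal and subnormal values; Inf/NaN are out of scope (assumption).
    """
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp_field = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp_field == 0:  # subnormal or zero: no implicit leading 1
        return sign, 1 - bias - man_bits, man
    return sign, exp_field - bias - man_bits, man | (1 << man_bits)


def mixed_multiply(a_bits: int, a_fmt: tuple[int, int], b_bits: int, b_fmt: tuple[int, int]) -> float:
    """Multiply two operands, each decoded in its own native format."""
    sa, ea, ma = decode(a_bits, *a_fmt)
    sb, eb, mb = decode(b_bits, *b_fmt)
    sign = -1.0 if sa ^ sb else 1.0
    # Significands multiply, exponents add; the product is exact in binary64.
    return sign * math.ldexp(ma * mb, ea + eb)
```

For example, fp16 `0x3E00` (1.5) times bf16 `0x4000` (2.0) gives 3.0.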

14.

Patent application
STOCHASTIC ROUNDING CIRCUIT (Active)

Publication number: US20250130769A1

Publication date: 2025-04-24

Application number: US18395039

Application date: 2023-12-22

Applicant: Advanced Micro Devices, Inc.

Inventors: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran

IPC: G06F7/499, G06F7/02

Abstract: The disclosed circuit is configured to round a value in a first number format using a random value. The circuit can then convert the rounded value to a second number format that has a lower precision than the first number format. Various other methods, systems, and computer-readable media are also disclosed.
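A common software model of stochastic rounding, which may or may not match this circuit, rounds an fp32 value to bfloat16 by adding a uniformly random 16-bit integer to the 16 bits that bfloat16 drops and then truncating. The value rounds up with probability equal to the dropped fraction, so the expected result equals the input. The sketch assumes finite inputs; the helper names are hypothetical.

```python
import random
import struct


def f32_to_bits(x: float) -> int:
    """Return the IEEE binary32 bit pattern of x (x is first rounded to fp32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_to_float(b: int) -> float:
    """Widen bfloat16 bits to a float by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def stochastic_round_to_bf16(x: float, rng: random.Random) -> int:
    """Round fp32 x to bfloat16 bits using a random value (finite x assumed)."""
    bits = f32_to_bits(x)
    r = rng.getrandbits(16)  # uniform in [0, 2**16)
    # The sum carries into bit 16 with probability (low 16 bits) / 2**16.
    # Because the encoding is sign-magnitude, this rounds the magnitude,
    # which is also correct for negative values.
    return (bits + r) >> 16
```

Values already representable in bfloat16 (such as 1.0) are returned unchanged, while 1.0 + 2**-8, which sits halfway between two bfloat16 values, rounds up about half the time.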

15.

Granted patent
Processing unit with small footprint arithmetic logic unit (Active)

Publication number: US12217021B2

Publication date: 2025-02-04

Application number: US18219268

Application date: 2023-07-07

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Bin He, Shubh Shah, Michael Mantor

IPC: G06F7/57, G06F17/16, G06N3/08

Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
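As a rough illustration of "execute at one precision, then reduce before the next stage", this two-stage dot product computes each product in Python's binary64 and narrows it to fp32 before a second stage accumulates it. The choice of a dot product, of binary64 and fp32 as the two widths, and the names `to_f32` and `two_stage_dot` are all assumptions for illustration.

```python
import struct


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value.

    Raises OverflowError if x is outside the fp32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_stage_dot(a: list[float], b: list[float]) -> float:
    """Hypothetical two-stage ALU model for a dot product."""
    # Stage 1: multiply at the wider precision, then narrow each product to fp32
    # so that less data is handed to the next stage.
    products = [to_f32(x * y) for x, y in zip(a, b)]
    # Stage 2: accumulate the narrowed products with an fp32 running sum.
    acc = 0.0
    for p in products:
        acc = to_f32(acc + p)
    return acc
```

The narrowing step is what would let a later stage be built for the smaller width; the price is an extra rounding of each intermediate result.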

16.

Granted patent
Packed 16 bits instruction pipeline (Active)

Publication number: US11880683B2

Publication date: 2024-01-23

Application number: US15799560

Application date: 2017-10-31

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling

IPC: G06F9/30, G06F7/483, G06F7/57

CPC classification number: G06F9/30014, G06F7/483, G06F7/57, G06F9/30036, G06F9/30112, G06F2207/3812, G06F2207/3828

Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
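The operand-selection step (M = 32, N = 16) can be sketched as follows: each 32-bit source register holds two fp16 values, and one select bit per source chooses its high or low half before a half-precision add. The `sel` argument stands in for the per-operand bits the abstract describes; its layout and all function names are hypothetical, and real instruction encodings differ.

```python
import struct


def select_half(reg32: int, use_high: bool) -> int:
    """Return the high or low 16 bits of a 32-bit register value."""
    return (reg32 >> 16) & 0xFFFF if use_high else reg32 & 0xFFFF


def f16_to_float(h: int) -> float:
    return struct.unpack("<e", struct.pack("<H", h))[0]


def float_to_f16(x: float) -> int:
    return struct.unpack("<H", struct.pack("<e", x))[0]


def add_f16_with_sel(src0: int, src1: int, sel: int) -> int:
    """Add one fp16 half from each 32-bit source.

    Bit 0 of sel picks src0's half, bit 1 picks src1's half (1 = high, 0 = low).
    """
    a = f16_to_float(select_half(src0, bool(sel & 0b01)))
    b = f16_to_float(select_half(src1, bool(sel & 0b10)))
    return float_to_f16(a + b)  # round the sum back to fp16 bits
```

With `src0` holding (2.0, 1.0) and `src1` holding (3.0, 0.5) as (high, low), the four select values produce 1.5, 2.5, 4.0, and 5.0.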

17.

Patent application
DUAL VECTOR ARITHMETIC LOGIC UNIT (Active)

Publication number: US20220188076A1

Publication date: 2022-06-16

Application number: US17121354

Application date: 2020-12-14

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor

IPC: G06F7/57, G06F17/16, G06F9/38, G06T1/20

Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
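A rough way to see why an operand cache lets two ALU pipelines issue in the same cycle is to count how many operands each cycle must fetch from the VGPR banks. This model pairs two instructions per cycle and uses a small LRU cache as a stand-in for the operand cache; the cache size, the LRU policy, the register names, and `bank_reads_per_cycle` are assumptions, not details from the abstract.

```python
from collections import OrderedDict


def bank_reads_per_cycle(pairs: list[tuple[list[str], list[str]]], cache_slots: int) -> list[int]:
    """Count VGPR bank reads per cycle when two instructions issue together.

    cache_slots == 0 models no operand cache: every source operand is a bank read.
    Otherwise operands already in the LRU cache are served without a bank read.
    """
    cache: OrderedDict[str, bool] = OrderedDict()
    out = []
    for pair in pairs:
        reads = 0
        for srcs in pair:  # the two instructions issued this cycle
            for reg in srcs:
                if cache_slots and reg in cache:
                    cache.move_to_end(reg)  # cache hit: no bank read needed
                    continue
                reads += 1  # fetch from a VGPR bank
                if cache_slots:
                    cache[reg] = True
                    if len(cache) > cache_slots:
                        cache.popitem(last=False)  # evict least recently used
        out.append(reads)
    return out
```

With operand reuse across the two instructions, the cached model needs 3 and then 1 bank reads where the uncached model needs 4 and 4, which is the kind of headroom the abstract credits for dual issue without extra VGPR bandwidth.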

18.

Patent application
STREAM PROCESSOR WITH HIGH BANDWIDTH AND LOW POWER VECTOR REGISTER FILE (Pending, published)

Publication number: US20180357064A1

Publication date: 2018-12-13

Application number: US15644045

Application date: 2017-07-07

Applicant: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Bin He, Mark M. Leather, Michael J. Mantor, Yunxiao Zou

IPC: G06F9/38, G06F9/30, G06F12/0875, G06F12/0891

CPC classification number: G06F9/3867, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/3012, G06F9/30141, G06F9/3802, G06F9/3826, G06F9/383, G06F9/3832, G06F9/3857, G06F12/0804, G06F12/0855, G06F12/0875, G06F12/0891, G06F12/121, G06F2212/1008, G06F2212/1024, G06F2212/452

Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit (VALU) and a high bandwidth, low power vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAMs' output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional read/write bandwidth for the vector register file.
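The benefit of a multi-bank register file can be sketched by assuming that register r lives in bank r mod B and that each bank returns one register per cycle: operands in different banks are fetched in parallel, operands in the same bank serialize, and a register referenced twice is fetched once. The modulo mapping, the single read port per bank, and the name `operand_read_cycles` are assumptions for illustration.

```python
from collections import Counter


def operand_read_cycles(src_regs: list[int], num_banks: int = 4) -> int:
    """Cycles needed to fetch one instruction's source operands.

    Assumes register r lives in bank r % num_banks and each bank returns one
    register per cycle. Duplicate register references are fetched once.
    """
    per_bank = Counter(r % num_banks for r in set(src_regs))
    return max(per_bank.values(), default=0)
```

Sources v0, v1, v2 spread across three banks and take one cycle, while v0, v4, v8 all map to bank 0 and take three.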

19.

Granted patent
Dual vector arithmetic logic unit (Active)

Publication number: US12299413B2

Publication date: 2025-05-13

Application number: US18414164

Application date: 2024-01-16

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor

IPC: G06F7/57, G06F9/38, G06F17/16, G06T1/20, G06F15/80

Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.

20.

Patent application
FLOATING-POINT CONVERSION CIRCUIT (Active)

Publication number: US20250130767A1

Publication date: 2025-04-24

Application number: US18921401

Application date: 2024-10-21

Applicant: Advanced Micro Devices, Inc.

Inventors: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran

IPC: G06F7/483

Abstract: The disclosed circuit can select micro-operations specifically for converting a value in a first number format to a second number format. The circuit can include micro-operations for various conversions between different number formats, including number formats of different floating-point precisions. Various other methods, systems, and computer-readable media are also disclosed.
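The abstract does not list the micro-operations, so the following is only a sketch of the general idea: a selector picks a micro-op sequence from the source and destination field widths, and a small interpreter runs it on a bit pattern. It assumes normal, finite inputs, round-to-nearest-even, and flush-to-zero for results too small for the destination; the format table, micro-op names, and functions are hypothetical.

```python
import struct

# (exponent bits, fraction bits) per format (hypothetical table)
FORMATS = {"fp32": (8, 23), "fp16": (5, 10), "bf16": (8, 7)}


def select_micro_ops(src: str, dst: str) -> list[str]:
    """Choose a micro-op sequence from the two formats' field widths."""
    se, sm = FORMATS[src]
    de, dm = FORMATS[dst]
    ops = ["unpack"]
    if se != de:
        ops.append("rebias_exponent")
    if dm < sm:
        ops.append("round_fraction_rne")
    elif dm > sm:
        ops.append("widen_fraction")
    if de < se:
        ops.append("check_range")
    ops.append("pack")
    return ops


def convert(bits: int, src: str, dst: str) -> int:
    """Run the selected micro-ops on a normal, finite value's bit pattern."""
    se, sm = FORMATS[src]
    de, dm = FORMATS[dst]
    st = {}
    for op in select_micro_ops(src, dst):
        if op == "unpack":
            st["sign"] = bits >> (se + sm)
            st["exp"] = (bits >> sm) & ((1 << se) - 1)
            st["frac"] = bits & ((1 << sm) - 1)
        elif op == "rebias_exponent":
            st["exp"] += ((1 << (de - 1)) - 1) - ((1 << (se - 1)) - 1)
        elif op == "round_fraction_rne":
            shift = sm - dm
            keep = st["frac"] >> shift
            rem = st["frac"] & ((1 << shift) - 1)
            half = 1 << (shift - 1)
            if rem > half or (rem == half and keep & 1):  # ties go to even
                keep += 1
                if keep == 1 << dm:  # fraction overflowed: bump the exponent
                    keep = 0
                    st["exp"] += 1
            st["frac"] = keep
        elif op == "widen_fraction":
            st["frac"] <<= dm - sm
        elif op == "check_range":
            if st["exp"] >= (1 << de) - 1:  # too large: infinity
                st["exp"], st["frac"] = (1 << de) - 1, 0
            elif st["exp"] <= 0:  # too small: flush to zero (no subnormals here)
                st["exp"], st["frac"] = 0, 0
        elif op == "pack":
            return (st["sign"] << (de + dm)) | (st["exp"] << dm) | st["frac"]
    raise AssertionError("micro-op list must end with pack")
```

For instance, fp32 to bf16 needs only unpack, round, and pack because the exponent widths match, while fp32 to fp16 also rebiases the exponent and checks the range, so 65520.0 becomes fp16 infinity (`0x7C00`).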
