-
Publication No.: US11409536B2
Publication date: 2022-08-09
Application No.: US15342809
Filing date: 2016-11-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Bin He, YunXiao Zou, Jiasheng Chen, Michael Mantor
Abstract: A method and apparatus for performing a multi-precision computation in a plurality of arithmetic logic units (ALUs) includes pairing a first Single Instruction/Multiple Data (SIMD) block channel device with a second SIMD block channel device to create a first block pair having one-level staggering between the first and second channel devices. A third SIMD block channel device is paired with a fourth SIMD block channel device to create a second block pair having one-level staggering between the third and fourth channel devices. A plurality of source inputs are received at the first block pair and the second block pair. The first block pair computes a first result, and the second block pair computes a second result.
-
Publication No.: US20190004807A1
Publication date: 2019-01-03
Application No.: US15657478
Filing date: 2017-07-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Qingcheng Wang, Yunxiao Zou, Bin He, Jian Yang, Michael J. Mantor, Brian D. Emberling
IPC: G06F9/38
Abstract: Systems, apparatuses, and methods for implementing a stream processor with overlapping execution are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of execution pipelines. The processing throughput of the parallel processing unit is increased by overlapping execution of multi-pass instructions with single pass instructions without increasing the instruction issue rate. A first plurality of operands of a first vector instruction are read from a shared vector register file in a single clock cycle and stored in temporary storage. The first plurality of operands are accessed and utilized to initiate multiple instructions on individual vector elements on a first execution pipeline in subsequent clock cycles. A second plurality of operands are read from the shared vector register file during the subsequent clock cycles to initiate execution of one or more second vector instructions on the second execution pipeline.
-
Publication No.: US20250130794A1
Publication date: 2025-04-24
Application No.: US18399659
Filing date: 2023-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran
IPC: G06F9/30
Abstract: The disclosed processing circuit can perform an operation with a first operand having a first number format and a second operand having a second number format by directly using the first operand in the first number format and the second operand in the second number format to produce an output result. Various other methods, systems, and computer-readable media are also disclosed.
-
Publication No.: US20250130769A1
Publication date: 2025-04-24
Application No.: US18395039
Filing date: 2023-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran
Abstract: The disclosed circuit is configured to round a value in a first number format using a random value. Using the rounded value, the circuit can convert the rounded value to a second number format that has a lower precision than a precision of the first number format. Various other methods, systems, and computer-readable media are also disclosed.
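The rounding-then-conversion idea in this abstract can be illustrated in software. The following is a minimal sketch, not the patented circuit: it assumes the first format is IEEE 754 binary32 and the second is bfloat16 (the abstract names neither), and the function names are hypothetical. A random value is added to the bits that will be discarded, so the result rounds away from zero with probability equal to the discarded fraction.

```python
import math
import random
import struct


def f32_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Interpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def stochastic_round_to_bf16(x: float, rng: random.Random) -> float:
    """Round a binary32 value to bfloat16 precision using a random value.

    Assumption: bfloat16 is treated as the upper 16 bits of binary32.
    The result is returned as a Python float holding the bfloat16 value.
    """
    if not math.isfinite(x):
        return x  # leave NaN and infinities untouched
    bits = f32_to_bits(x)
    noise = rng.getrandbits(16)  # random value covering the 16 dropped bits
    # Adding the noise carries into the kept bits with probability
    # (dropped bits) / 2**16; the mask then discards the low half.
    return bits_to_f32((bits + noise) & 0xFFFF0000)


if __name__ == "__main__":
    rng = random.Random(0)
    x = 1.0 + 2.0 ** -8  # halfway between two neighbouring bfloat16 values
    samples = [stochastic_round_to_bf16(x, rng) for _ in range(10_000)]
    print(sorted(set(samples)), sum(samples) / len(samples))
```

Values already representable in the lower-precision format pass through unchanged, and the average of many rounded samples approaches the original value, which is the usual motivation for this kind of rounding.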
-
Publication No.: US12217021B2
Publication date: 2025-02-04
Application No.: US18219268
Filing date: 2023-07-07
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin He, Shubh Shah, Michael Mantor
Abstract: A parallel processing unit employs an arithmetic logic unit (ALU) having a relatively small footprint, thereby reducing the overall power consumption and circuit area of the processing unit. To support the smaller footprint, the ALU includes multiple stages to execute operations corresponding to a received instruction. The ALU executes at least one operation at a precision indicated by the received instruction, and then reduces the resulting data of the at least one operation to a smaller size before providing the results to another stage of the ALU to continue execution of the instruction.
-
Publication No.: US11880683B2
Publication date: 2024-01-23
Application No.: US15799560
Filing date: 2017-10-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
CPC classification number: G06F9/30014, G06F7/483, G06F7/57, G06F9/30036, G06F9/30112, G06F2207/3812, G06F2207/3828
Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.
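The operand selection described in this abstract can be sketched in software. This is an illustrative model under stated assumptions, not the claimed hardware: M is taken as 32 and N as 16, the half-precision values are IEEE 754 binary16, and the names `select_half`, `packed_add_f16`, and `op_sel` are hypothetical. Each bit of `op_sel` picks the high or low half of the corresponding 32-bit source register.

```python
import struct


def select_half(reg32: int, use_high: bool) -> int:
    """Return the high or low 16 bits of a 32-bit register value."""
    return (reg32 >> 16) & 0xFFFF if use_high else reg32 & 0xFFFF


def half_bits_to_float(h: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", h))[0]


def packed_add_f16(src0: int, src1: int, op_sel: int) -> float:
    """Add one half-precision operand from each 32-bit source.

    op_sel is a hypothetical selector field: bit 0 applies to src0 and
    bit 1 to src1 (1 = high half, 0 = low half). The sum is returned as
    a Python float and is not re-rounded to half precision.
    """
    a = half_bits_to_float(select_half(src0, bool(op_sel & 0b01)))
    b = half_bits_to_float(select_half(src1, bool(op_sel & 0b10)))
    return a + b


if __name__ == "__main__":
    src0 = (0x4000 << 16) | 0x3E00  # high half = 2.0, low half = 1.5
    src1 = (0x3C00 << 16) | 0x4200  # high half = 1.0, low half = 3.0
    for sel in range(4):
        print(f"op_sel={sel:02b} -> {packed_add_f16(src0, src1, sel)}")
```

Because the selector travels with the instruction, either half of any 32-bit register can feed a half-precision operation without a separate unpack step.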
-
Publication No.: US20220188076A1
Publication date: 2022-06-16
Application No.: US17121354
Filing date: 2020-12-14
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin He, Brian Emberling, Mark Leather, Michael Mantor
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
-
Publication No.: US20180357064A1
Publication date: 2018-12-13
Application No.: US15644045
Filing date: 2017-07-07
Applicant: Advanced Micro Devices, Inc.
Inventor: Jiasheng Chen, Bin He, Mark M. Leather, Michael J. Mantor, Yunxiao Zou
IPC: G06F9/38, G06F9/30, G06F12/0875, G06F12/0891
CPC classification number: G06F9/3867, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/3012, G06F9/30141, G06F9/3802, G06F9/3826, G06F9/383, G06F9/3832, G06F9/3857, G06F12/0804, G06F12/0855, G06F12/0875, G06F12/0891, G06F12/121, G06F2212/1008, G06F2212/1024, G06F2212/452
Abstract: Systems, apparatuses, and methods for implementing a high bandwidth, low power vector register file for use by a parallel processor are disclosed. In one embodiment, a system includes at least a parallel processing unit with a plurality of processing pipelines. The parallel processing unit includes a vector arithmetic logic unit and a high bandwidth, low power, vector register file. The vector register file includes multi-bank high density random-access memories (RAMs) to satisfy register bandwidth requirements. The parallel processing unit also includes an instruction request queue and an instruction operand buffer to provide enough local bandwidth for VALU instructions and vector I/O instructions. Also, the parallel processing unit is configured to leverage the RAM's output flops as a last level cache to reduce duplicate operand requests between multiple instructions. The parallel processing unit includes a vector destination cache to provide additional R/W bandwidth for the vector register file.
-
Publication No.: US12299413B2
Publication date: 2025-05-13
Application No.: US18414164
Filing date: 2024-01-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Bin He, Brian Emberling, Mark Leather, Michael Mantor
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
-
Publication No.: US20250130767A1
Publication date: 2025-04-24
Application No.: US18921401
Filing date: 2024-10-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Shubh Shah, Ashutosh Garg, Bin He, Michael Mantor, Shubra Marwaha, Subramaniam Maiyuran
IPC: G06F7/483
Abstract: The disclosed circuit can select micro-operations specifically for converting a value in a first number format to a second number format. The circuit can include micro-operations for various conversions between different number formats, including number formats of different floating-point precisions. Various other methods, systems, and computer-readable media are also disclosed.