INDIRECT CHAINING OF COMMAND BUFFERS

    Publication No.: US20220058767A1

    Publication Date: 2022-02-24

    Application No.: US17519992

    Filing Date: 2021-11-05

    Abstract: Systems, apparatuses, and methods for enabling indirect chaining of command buffers are disclosed. A system includes at least first and second processors and a memory. The first processor generates a plurality of command buffers and stores the plurality of command buffers in the memory. The first processor also generates and stores, in the memory, a table with entries specifying addresses of the plurality of command buffers and an order in which to process the command buffers. The first processor conveys an indirect buffer packet to the second processor, where the indirect buffer packet specifies a location and a size of the table in the memory. The second processor retrieves an initial entry from the table, processes a first command buffer at the address specified in the initial entry, and then returns to the table for the next entry upon completing processing of the first command buffer.
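
    Below is a minimal C sketch of the chaining mechanism the abstract describes. The structures ib_table_entry_t and ib_packet_t, and the walk loop, are hypothetical illustrations of the table and the indirect buffer packet, not the patent's actual data layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout of one table entry: the address of a command
 * buffer and its size, stored in processing order. */
typedef struct {
    uint8_t *cmd_buf_addr;   /* address of the command buffer in memory */
    size_t   cmd_buf_size;   /* size of that command buffer in bytes    */
} ib_table_entry_t;

/* Hypothetical indirect buffer packet: tells the second processor
 * where the table lives and how many entries it holds. */
typedef struct {
    ib_table_entry_t *table; /* location of the table in memory */
    size_t            count; /* number of entries in the table  */
} ib_packet_t;

/* Second-processor side: fetch an entry, process the command buffer it
 * points to, then return to the table for the next entry. */
static void process_indirect_chain(const ib_packet_t *pkt)
{
    for (size_t i = 0; i < pkt->count; i++) {
        const ib_table_entry_t *e = &pkt->table[i];
        printf("processing command buffer %zu at %p (%zu bytes)\n",
               i, (void *)e->cmd_buf_addr, e->cmd_buf_size);
        /* ... decode and execute the commands in this buffer ... */
    }
}

int main(void)
{
    uint8_t buf_a[64], buf_b[128];       /* two command buffers        */
    ib_table_entry_t table[] = {         /* processing order: a, b     */
        { buf_a, sizeof buf_a },
        { buf_b, sizeof buf_b },
    };
    ib_packet_t pkt = { table, 2 };      /* the indirect buffer packet */
    process_indirect_chain(&pkt);
    return 0;
}
```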

    Using loop exit prediction to accelerate or suppress loop mode of a processor

    Publication No.: US11256505B2

    Publication Date: 2022-02-22

    Application No.: US17169053

    Filing Date: 2021-02-05

    Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions is executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of loop instructions is executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered-up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
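
    The mode decision reduces to comparing a predicted trip count against two thresholds, as in the C sketch below. The threshold values and the handling of predictions that fall between the two thresholds are assumptions for illustration, not taken from the patent.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical thresholds; real values are microarchitecture-specific. */
#define LOOP_ENTER_THRESHOLD 32   /* first loop iteration threshold  */
#define LOOP_AVOID_THRESHOLD 8    /* second loop iteration threshold */

/* Decide whether a captured loop should run from the loop buffer with
 * front-end components powered down, or from the normal fetch path. */
static bool use_loop_mode(unsigned predicted_iterations)
{
    if (predicted_iterations > LOOP_ENTER_THRESHOLD)
        return true;   /* long loop: power down fetch/decode, replay from loop buffer */
    if (predicted_iterations <= LOOP_AVOID_THRESHOLD)
        return false;  /* short loop: mode-switch overhead outweighs the savings */
    return false;      /* in-between region: stay conservative in this sketch */
}

int main(void)
{
    unsigned samples[] = { 4, 16, 100 };
    for (int i = 0; i < 3; i++)
        printf("predicted %u iterations -> %s\n", samples[i],
               use_loop_mode(samples[i]) ? "loop mode" : "non-loop mode");
    return 0;
}
```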

    Adaptive world switching
    Invention Grant

    Publication No.: US11243799B2

    Publication Date: 2022-02-08

    Application No.: US16556521

    Filing Date: 2019-08-30

    Abstract: An apparatus includes a plurality of virtual machines, a hypervisor coupled to the plurality of virtual machines, and a graphics processing unit (GPU) coupled to the hypervisor. The plurality of virtual machines are allocated a plurality of time slices. The hypervisor initiates a world switch to a first virtual machine of the plurality of virtual machines. The GPU determines whether to adjust the time slice associated with the first virtual machine based on an assessment of time-slice adjustment parameters related to an execution time of at least one of the plurality of virtual machines.
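
    A rough C illustration of the kind of adjustment the abstract describes. The vm_t record, the utilization cutoffs, and the 25% step are all hypothetical; the patent states only that the GPU assesses time-slice adjustment parameters related to execution time.

```c
#include <stdio.h>

/* Hypothetical per-VM record: the time slice currently allocated and
 * the execution time observed in the previous scheduling round. */
typedef struct {
    int      id;
    unsigned slice_us;   /* allocated time slice, microseconds   */
    unsigned used_us;    /* GPU execution time actually consumed */
} vm_t;

/* Sketch of the decision: grow the slice of a VM that keeps exhausting
 * it, shrink the slice of a VM that leaves most of it idle. The 90%
 * and 50% cutoffs are illustrative, not from the patent. */
static void adjust_time_slice(vm_t *vm)
{
    if (vm->used_us * 10 >= vm->slice_us * 9)
        vm->slice_us += vm->slice_us / 4;   /* saturated: +25%   */
    else if (vm->used_us * 2 < vm->slice_us)
        vm->slice_us -= vm->slice_us / 4;   /* mostly idle: -25% */
}

int main(void)
{
    vm_t vms[] = { { 0, 1000, 980 }, { 1, 1000, 300 } };
    for (int i = 0; i < 2; i++) {
        adjust_time_slice(&vms[i]);
        printf("VM %d: new slice %u us\n", vms[i].id, vms[i].slice_us);
    }
    return 0;
}
```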

    Multi-version shaders
    Invention Grant

    Publication No.: US11243752B2

    Publication Date: 2022-02-08

    Application No.: US16509165

    Filing Date: 2019-07-11

    Abstract: Described herein are techniques for generating a stitched shader program. The techniques include identifying a set of shader programs to include in the stitched shader program, wherein the set includes at least one multiversion shader program that includes a first version of instructions and a second version of instructions, wherein the first version of instructions uses a first number of resources that is different from a second number of resources used by the second version of instructions. The techniques also include combining the set of shader programs to form the stitched shader program. The techniques further include determining a number of resources for the stitched shader program. The techniques also include, based on the determined number of resources, modifying the instructions corresponding to the multiversion shader program to, when executed, execute either the first version of instructions or the second version of instructions.
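
    At stitch time, the selection can be pictured as picking whichever instruction version fits the register budget chosen for the whole stitched program. The C sketch below is hypothetical: mv_shader_t and select_version are invented names, and a real implementation would patch the shader to branch to the chosen version at execution rather than return an index.

```c
#include <stdio.h>

/* Hypothetical multiversion shader: two compiled versions of the same
 * code, each using a different number of registers. */
typedef struct {
    const char *name;
    unsigned    regs_v1;   /* registers used by the first version  */
    unsigned    regs_v2;   /* registers used by the second version */
} mv_shader_t;

/* Given the register budget determined for the stitched program,
 * prefer the version that uses more registers when it fits (more
 * registers typically means fewer spills); otherwise fall back to
 * the leaner version, which is assumed to fit. */
static int select_version(const mv_shader_t *s, unsigned budget)
{
    unsigned hi = s->regs_v1 >= s->regs_v2 ? s->regs_v1 : s->regs_v2;
    if (hi <= budget)
        return s->regs_v1 >= s->regs_v2 ? 1 : 2;
    return s->regs_v1 >= s->regs_v2 ? 2 : 1;
}

int main(void)
{
    mv_shader_t s = { "blend_stage", 64, 32 };
    printf("budget 128 -> version %d\n", select_version(&s, 128)); /* 1 */
    printf("budget 48  -> version %d\n", select_version(&s, 48));  /* 2 */
    return 0;
}
```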

    TERMINATION CALIBRATION SCHEME USING A CURRENT MIRROR

    Publication No.: US20220038102A1

    Publication Date: 2022-02-03

    Application No.: US17502741

    Filing Date: 2021-10-15

    Abstract: Systems, apparatuses, and methods for conveying and receiving information as electrical signals in a computing system are disclosed. A computing system includes multiple transmitters sending single-ended data signals to multiple receivers. A termination voltage is generated and sent to the multiple receivers. The termination voltage is coupled to each of signal termination circuitry and signal sampling circuitry within each of the multiple receivers. Any change in the termination voltage affects the termination circuitry and the comparisons performed by the sampling circuitry. The receivers reconstruct the incoming signals using the signal termination circuitry and the signal sampling circuitry.

    DATA COMMUNICATIONS WITH ENHANCED SPEED MODE

    Publication No.: US20220035765A1

    Publication Date: 2022-02-03

    Application No.: US17503959

    Filing Date: 2021-10-18

    Abstract: An interconnect controller for a data processing platform includes a data link layer controller for selectively receiving data packets from and sending data packets to a higher protocol layer, and a physical layer controller coupled to the data link layer controller and adapted to be coupled to a communication link. The physical layer controller operates according to a predetermined protocol selectively at one of a plurality of enhanced speeds that are not specified by any published standard and are separated from each other by the same predetermined amount. In response to performing a link initialization, the interconnect controller performs at least one setup operation to select a speed, and subsequently operates the communication link using a selected speed.
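
    One way to picture the speed selection: a table of enhanced speeds at a fixed step above the highest standard rate, with link initialization picking the fastest entry both link partners support. The C sketch below uses illustrative numbers; the patent does not disclose specific speeds or step sizes.

```c
#include <stdio.h>

/* Hypothetical enhanced-speed table: speeds above the highest
 * published standard rate, separated by the same fixed step.
 * The values are illustrative, not from any specification. */
#define BASE_GTS   32.0   /* highest published standard speed, GT/s    */
#define STEP_GTS    2.0   /* fixed separation between enhanced speeds  */
#define NUM_SPEEDS  4

static double enhanced_speed(int index)
{
    return BASE_GTS + (index + 1) * STEP_GTS;  /* 34, 36, 38, 40 GT/s */
}

/* Setup operation during link initialization: pick the fastest
 * enhanced speed both link partners advertise support for. */
static double negotiate_speed(int local_max_idx, int remote_max_idx)
{
    int idx = local_max_idx < remote_max_idx ? local_max_idx : remote_max_idx;
    return enhanced_speed(idx);
}

int main(void)
{
    printf("negotiated: %.1f GT/s\n", negotiate_speed(3, 1)); /* 36.0 */
    return 0;
}
```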

    Arithmetic logic unit register sequencing

    Publication No.: US11237827B2

    Publication Date: 2022-02-01

    Application No.: US16696108

    Filing Date: 2019-11-26

    Abstract: A graphics processing unit (GPU) sequences provision of operands to a set of operand registers, thereby allowing the GPU to share at least one of the operand registers between executing threads. The GPU includes a plurality of arithmetic logic units (ALUs) with at least one of the ALUs configured to perform double precision operations. The GPU further includes a set of operand registers configured to store single precision operands. For a plurality of executing threads that request double precision operations, the GPU stores the corresponding operands at the operand registers. Over a plurality of execution cycles, the GPU sequences transfer of operands from the set of operand registers to a designated double precision operand register. During each execution cycle, the double-precision ALU executes a double precision operation using the operand stored at the double precision operand register.
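
    A C sketch of the sequencing idea: each thread's double-precision operand is held as two 32-bit halves in single-precision operand registers, and on each execution cycle one thread's halves are assembled into the shared double-precision operand register for the DP ALU to consume. The register model and names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_THREADS 4

/* Hypothetical register model: each thread's double-precision operand
 * sits as two 32-bit halves in single-precision operand registers. */
static uint32_t sp_regs[NUM_THREADS][2];   /* lo/hi halves per thread    */
static uint64_t dp_operand_reg;            /* shared DP operand register */

/* One execution cycle: move the next thread's halves into the shared
 * double-precision operand register, then run the DP operation. */
static void sequence_and_execute(int cycle)
{
    int t = cycle % NUM_THREADS;
    dp_operand_reg = ((uint64_t)sp_regs[t][1] << 32) | sp_regs[t][0];
    /* ... DP ALU consumes dp_operand_reg here ... */
    printf("cycle %d: thread %d operand 0x%016llx\n",
           cycle, t, (unsigned long long)dp_operand_reg);
}

int main(void)
{
    for (int t = 0; t < NUM_THREADS; t++) {
        sp_regs[t][0] = 0x1000 + t;   /* low 32 bits  */
        sp_regs[t][1] = 0x2000 + t;   /* high 32 bits */
    }
    for (int c = 0; c < NUM_THREADS; c++)
        sequence_and_execute(c);
    return 0;
}
```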

    ASSIGNING VARIABLE LENGTH ADDRESS IDENTIFIERS TO PACKETS IN A PROCESSING SYSTEM

    Publication No.: US20220029954A1

    Publication Date: 2022-01-27

    Application No.: US17496256

    Filing Date: 2021-10-07

    Inventor: David A. Roberts

    Abstract: A controller assigns variable length addresses to addressable elements that are connected to a network. The variable length addresses are determined based on probabilities that packets are addressed to the corresponding addressable element. The controller transmits, to the addressable elements via the network, a routing table indicating the variable length addresses assigned to the addressable elements. Routers or addressable elements receive the routing table and route one or more packets over the network to an addressable element using variable length addresses included in a header of the one or more packets.
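
    The assignment is Huffman-like in spirit: the more likely an element is to be a packet's destination, the shorter the address it receives. A minimal C sketch, assuming a fixed prefix-free code rather than the controller's actual assignment algorithm:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_ELEMS 4

/* Hypothetical routing-table entry: an addressable element and the
 * variable-length address (a prefix-free bit string) assigned to it. */
typedef struct {
    const char *name;
    double      prob;      /* probability a packet targets this element */
    char        addr[16];  /* assigned variable-length address bits     */
} elem_t;

static int by_prob_desc(const void *a, const void *b)
{
    double d = ((const elem_t *)b)->prob - ((const elem_t *)a)->prob;
    return (d > 0) - (d < 0);
}

int main(void)
{
    elem_t elems[NUM_ELEMS] = {
        { "dram0", 0.55, "" }, { "gpu", 0.25, "" },
        { "nic",   0.15, "" }, { "dbg", 0.05, "" },
    };

    /* Sort hottest-first, then hand out a simple prefix-free code:
     * "0", "10", "110", "111". Shorter addresses go to the elements
     * most likely to be packet destinations; a real controller would
     * build an optimal code from the measured probabilities. */
    qsort(elems, NUM_ELEMS, sizeof *elems, by_prob_desc);
    const char *codes[NUM_ELEMS] = { "0", "10", "110", "111" };
    for (int i = 0; i < NUM_ELEMS; i++) {
        strcpy(elems[i].addr, codes[i]);
        printf("%-5s p=%.2f addr=%s\n",
               elems[i].name, elems[i].prob, elems[i].addr);
    }
    return 0;
}
```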

    ALLREDUCE ENHANCED DIRECT MEMORY ACCESS FUNCTIONALITY

    Publication No.: US20210406209A1

    Publication Date: 2021-12-30

    Application No.: US17032195

    Filing Date: 2020-09-25

    Abstract: Systems, apparatuses, and methods for performing an allreduce operation on an enhanced direct memory access (DMA) engine are disclosed. A system implements a machine learning application which includes a first kernel and a second kernel. The first kernel corresponds to a first portion of a machine learning model while the second kernel corresponds to a second portion of the machine learning model. The first kernel is invoked on a plurality of compute units and the second kernel is converted into commands executable by an enhanced DMA engine to perform a collective communication operation. The first kernel is executed on the plurality of compute units in parallel with the enhanced DMA engine executing the commands for performing the collective communication operation. As a result, the allreduce operation may be executed on the enhanced DMA engine in parallel with the compute units' execution of the first kernel.
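
    Functionally, the collective communication operation amounts to an elementwise reduction that leaves every participant with the same result. The C sketch below simulates a two-node sum allreduce serially; on hardware, the enhanced DMA engine would execute the equivalent commands concurrently with the compute units running the first kernel. All names are illustrative.

```c
#include <stdio.h>

#define NUM_NODES 2
#define CHUNK     4

/* Sum allreduce over each node's gradient chunk: after the call,
 * every node holds the elementwise sum of all nodes' gradients.
 * Serial here for clarity; on hardware this runs on the DMA engine
 * while the compute units execute the first kernel in parallel. */
static void dma_allreduce(float grads[NUM_NODES][CHUNK])
{
    for (int i = 0; i < CHUNK; i++) {
        float sum = 0.0f;
        for (int n = 0; n < NUM_NODES; n++)
            sum += grads[n][i];
        for (int n = 0; n < NUM_NODES; n++)
            grads[n][i] = sum;          /* every node gets the result */
    }
}

int main(void)
{
    float grads[NUM_NODES][CHUNK] = {
        { 1, 2, 3, 4 },                 /* node 0 local gradients */
        { 10, 20, 30, 40 },             /* node 1 local gradients */
    };
    /* compute_kernel(...) would run here concurrently with: */
    dma_allreduce(grads);
    for (int i = 0; i < CHUNK; i++)
        printf("%g ", grads[0][i]);     /* 11 22 33 44 */
    printf("\n");
    return 0;
}
```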
