FPGA-based programmable data analysis and compression front end for GPU

    Publication number: US12099789B2

    Publication date: 2024-09-24

    Application number: US17118442

    Filing date: 2020-12-10

    CPC classification number: G06F30/331 G06F9/3877 G06F30/34

    Abstract: Methods, devices, and systems for information communication. Information transmitted from a host to a graphics processing unit (GPU) is received by information analysis circuitry of a field-programmable gate array (FPGA). A pattern in the information is determined by the information analysis circuitry. A predicted information pattern is determined, by the information analysis circuitry, based on the information. An indication of the predicted information pattern is transmitted to the host. Responsive to a signal from the host based on the predicted information pattern, the FPGA is reprogrammed to implement decompression circuitry based on the predicted information pattern. In some implementations, the information includes a plurality of packets. In some implementations, the predicted information pattern includes a pattern in a plurality of packets. In some implementations, the predicted information pattern includes a zero data pattern.
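
    As an illustration only, the following C++ sketch models the host-side analysis step in software: it scans a window of packets for dominantly zero-valued payload bytes and reports a "zero data pattern" prediction of the kind the host could act on by having the FPGA reprogrammed with matching decompression circuitry. The Packet type, the 75% threshold, and the windowing are assumptions, not details from the patent.

        // Hypothetical software model of the pattern-analysis step: detect a
        // zero data pattern across a window of host-to-GPU packets.
        #include <cstddef>
        #include <cstdint>
        #include <vector>

        struct Packet {
            std::vector<uint8_t> payload;
        };

        // Fraction of zero-valued bytes across a window of packets.
        double zeroFraction(const std::vector<Packet>& window) {
            size_t zeros = 0, total = 0;
            for (const Packet& p : window) {
                for (uint8_t b : p.payload) {
                    zeros += (b == 0);
                    ++total;
                }
            }
            return total ? static_cast<double>(zeros) / total : 0.0;
        }

        // Returns true when the window is dominated by zeros, i.e. a predicted
        // zero data pattern that could justify reprogramming the front end with
        // zero-run decompression circuitry.
        bool predictZeroDataPattern(const std::vector<Packet>& window,
                                    double threshold = 0.75) {
            return zeroFraction(window) >= threshold;
        }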

    Hardware accelerated dynamic work creation on a graphics processing unit

    Publication number: US11550627B2

    Publication date: 2023-01-10

    Application number: US17215171

    Filing date: 2021-03-29

    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the processor core in response to receiving a request from the parent task while the parent task is executing on the processor core. In some cases, the parent task registers the child task in a task pool, and the child task is a future task configured to monitor a completion object and to enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution if it fails to detect the completion object.
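
    A minimal software model of the future-task behavior described above, assuming ordinary C++ queues in place of the hardware task pool: the future task checks its completion object, enqueues the task it guards once completion is detected, and otherwise re-enqueues itself as a continuation future task. All type and queue names are illustrative.

        // Hypothetical model of a self-enqueuing future task.
        #include <atomic>
        #include <deque>
        #include <functional>

        struct CompletionObject {
            std::atomic<bool> done{false};
        };

        struct FutureTask {
            CompletionObject* completion;      // object the future task monitors
            std::function<void()> guardedTask; // task to enqueue once completion is seen
        };

        std::deque<std::function<void()>> readyQueue;  // tasks ready to run
        std::deque<FutureTask> continuationQueue;      // deferred future tasks

        void runFutureTask(FutureTask ft) {
            if (ft.completion->done.load(std::memory_order_acquire)) {
                // Completion detected: enqueue the task associated with the future.
                readyQueue.push_back(std::move(ft.guardedTask));
            } else {
                // Not yet complete: self-enqueue as a continuation future task.
                continuationQueue.push_back(std::move(ft));
            }
        }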

    Method and apparatus for inter-lane thread migration

    Publication number: US10409610B2

    Publication date: 2019-09-10

    Application number: US15010093

    Filing date: 2016-01-29

    Abstract: Briefly, methods and apparatus are described for migrating a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit, where both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that, when executed, provide access to a dynamic thread migration swizzle buffer through which register context information is migrated along with the software threads. The register context information may be located in one or more locations of a register file prior to being stored into the dynamic thread migration swizzle buffer. The methods and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
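
    The sketch below models the swizzle-buffer flow with ordinary C++ arrays rather than hardware: register context is staged out of one lane of a register file and written back into a different lane. The wavefront width, registers per lane, and function names are assumptions for illustration.

        // Hypothetical model of migrating a thread's register context between lanes.
        #include <array>
        #include <cstdint>

        constexpr int kLanes = 64;        // assumed wavefront width
        constexpr int kRegsPerLane = 16;  // assumed registers per lane

        using RegisterFile = std::array<std::array<uint32_t, kRegsPerLane>, kLanes>;

        struct SwizzleBuffer {
            std::array<uint32_t, kRegsPerLane> staged{};

            // "Store into the swizzle buffer" step: copy context out of the source lane.
            void stageFromLane(const RegisterFile& rf, int srcLane) {
                staged = rf[srcLane];
            }
            // Return the context to a different location in the register file.
            void writeToLane(RegisterFile& rf, int dstLane) const {
                rf[dstLane] = staged;
            }
        };

        // Migrate one software thread's register context from srcLane to dstLane.
        void migrateThread(RegisterFile& rf, int srcLane, int dstLane) {
            SwizzleBuffer buf;
            buf.stageFromLane(rf, srcLane);
            buf.writeToLane(rf, dstLane);
        }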

    Multi-kernel wavefront scheduler

    Invention grant

    Publication number: US12099867B2

    Publication date: 2024-09-24

    Application number: US15993061

    Filing date: 2018-05-30

    CPC classification number: G06F9/4881

    Abstract: Systems, apparatuses, and methods for implementing a multi-kernel wavefront scheduler are disclosed. A system includes at least a parallel processor coupled to one or more memories, wherein the parallel processor includes a command processor and a plurality of compute units. The command processor launches multiple kernels for execution on the compute units. Each compute unit includes a multi-level scheduler for scheduling wavefronts from multiple kernels for execution on its execution units. A first level scheduler creates scheduling groups by grouping together wavefronts based on the priority of their kernels. Accordingly, wavefronts from kernels with the same priority are grouped together in the same scheduling group by the first level scheduler. Next, the first level scheduler selects, from a plurality of scheduling groups, the highest priority scheduling group for execution. Then, a second level scheduler schedules wavefronts for execution from the scheduling group selected by the first level scheduler.
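
    A minimal sketch of the two-level policy described above: the first level groups wavefronts by their kernel's priority, and the second level picks a wavefront from the highest-priority group. The round-robin choice within a group and all identifiers are assumptions.

        // Hypothetical two-level wavefront scheduling model.
        #include <cstddef>
        #include <map>
        #include <vector>

        struct Wavefront { int id; int kernelPriority; };

        // priority -> wavefronts launched by kernels of that priority
        using SchedulingGroups = std::map<int, std::vector<Wavefront>>;

        // First-level scheduler: build scheduling groups keyed by kernel priority.
        SchedulingGroups buildGroups(const std::vector<Wavefront>& wavefronts) {
            SchedulingGroups groups;
            for (const Wavefront& w : wavefronts)
                groups[w.kernelPriority].push_back(w);
            return groups;
        }

        // Second-level scheduler: pick the next wavefront from the highest-priority
        // group, rotating through that group round-robin.
        const Wavefront* selectNext(const SchedulingGroups& groups, size_t& rrCursor) {
            if (groups.empty()) return nullptr;
            const std::vector<Wavefront>& best = groups.rbegin()->second;
            if (best.empty()) return nullptr;
            const Wavefront* w = &best[rrCursor % best.size()];
            ++rrCursor;
            return w;
        }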

    Memory request priority assignment techniques for parallel processors

    Publication number: US11507522B2

    Publication date: 2022-11-22

    Application number: US16706421

    Filing date: 2019-12-06

    Abstract: Systems, apparatuses, and methods for implementing memory request priority assignment techniques for parallel processors are disclosed. A system includes at least a parallel processor coupled to a memory subsystem, where the parallel processor includes at least a plurality of compute units for executing wavefronts in lock-step. The parallel processor assigns priorities to memory requests of wavefronts on a per-work-item basis by indexing into a first priority vector, with the index generated based on lane-specific information. If a given event is detected, a second priority vector is generated by applying a given priority promotion vector to the first priority vector. Then, for subsequent wavefronts, memory requests are assigned priorities by indexing into the second priority vector with lane-specific information. The use of priority vectors to assign priorities to memory requests helps to reduce the memory divergence problem experienced by different work-items of a wavefront.
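
    The sketch below illustrates the priority-vector idea in software: a memory request's priority comes from indexing a priority vector with lane-specific information, and applying a promotion vector to that vector yields the second priority vector used for subsequent wavefronts. The vector length, the lane-to-index mapping, and the saturating-add promotion are assumptions.

        // Hypothetical per-work-item memory request priority assignment.
        #include <array>
        #include <cstddef>
        #include <cstdint>

        constexpr size_t kVecLen = 8;
        using PriorityVector = std::array<uint8_t, kVecLen>;

        // Assumed mapping of lane-specific information to a vector index.
        size_t laneIndex(uint32_t laneId) { return laneId % kVecLen; }

        // Priority assigned to the memory request issued by a given lane.
        uint8_t requestPriority(const PriorityVector& v, uint32_t laneId) {
            return v[laneIndex(laneId)];
        }

        // Apply a priority promotion vector (saturating add is an assumption) to
        // produce the second priority vector used after the triggering event.
        PriorityVector promote(const PriorityVector& base, const PriorityVector& promo) {
            PriorityVector out{};
            for (size_t i = 0; i < kVecLen; ++i) {
                unsigned p = base[i] + promo[i];
                out[i] = p > 255u ? uint8_t{255} : static_cast<uint8_t>(p);
            }
            return out;
        }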

    System performance management using prioritized compute units

    Publication number: US11204871B2

    Publication date: 2021-12-21

    Application number: US14755401

    Filing date: 2015-06-30

    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
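
    As a rough illustration, the sketch below marks an effective number of compute units as priority units and preferentially dispatches workgroups to them, falling back to non-priority units. The choice of which units become priority units (the lowest-indexed ones here) is an assumption.

        // Hypothetical workgroup dispatch preferring priority compute units.
        #include <cstddef>
        #include <optional>
        #include <vector>

        struct ComputeUnit {
            bool priority = false;  // priority units may access the shared cache
            bool busy = false;
        };

        void designatePriority(std::vector<ComputeUnit>& cus, size_t effectiveCount) {
            for (size_t i = 0; i < cus.size(); ++i)
                cus[i].priority = (i < effectiveCount);
        }

        // Dispatch a workgroup to an idle priority compute unit if one exists,
        // otherwise to any idle compute unit; returns the chosen unit's index.
        std::optional<size_t> dispatchWorkgroup(std::vector<ComputeUnit>& cus) {
            for (int pass = 0; pass < 2; ++pass) {
                for (size_t i = 0; i < cus.size(); ++i) {
                    if (cus[i].busy) continue;
                    if (pass == 0 && !cus[i].priority) continue;
                    cus[i].busy = true;
                    return i;
                }
            }
            return std::nullopt;  // no idle compute unit available
        }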

    Hardware accelerated dynamic work creation on a graphics processing unit

    Publication number: US10963299B2

    Publication date: 2021-03-30

    Application number: US16134695

    Filing date: 2018-09-18

    Abstract: A processor core is configured to execute a parent task that is described by a data structure stored in a memory. A coprocessor is configured to dispatch a child task to the processor core in response to receiving a request from the parent task while the parent task is executing on the processor core. In some cases, the parent task registers the child task in a task pool, and the child task is a future task configured to monitor a completion object and to enqueue another task associated with the future task in response to detecting the completion object. The future task is configured to self-enqueue by adding a continuation future task to a continuation queue for subsequent execution if it fails to detect the completion object.

    Scoped persistence barriers for non-volatile memories

    Publication number: US20190286362A1

    Publication date: 2019-09-19

    Application number: US16432391

    Filing date: 2019-06-05

    Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope, and to process a scoped persistence barrier residing in a program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the hierarchical processing scope, and the second data results from execution of each of the second set of instructions processed according to the hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.
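
    A simplified software model of the barrier's ordering guarantee, assuming an in-memory stand-in for NVRAM: stores issued before the barrier become durable before anything issued after it. A real implementation would use cache-line writeback and fence instructions rather than the log used here, and would qualify the barrier with a hierarchical scope.

        // Hypothetical model of ordering persists around a persistence barrier.
        #include <cstdint>
        #include <utility>
        #include <vector>

        struct NvramModel {
            std::vector<std::pair<uint64_t, uint64_t>> pending;     // visible, not yet durable
            std::vector<std::pair<uint64_t, uint64_t>> persistent;  // durable, in persist order

            void store(uint64_t addr, uint64_t value) {
                pending.emplace_back(addr, value);  // write is visible but not yet durable
            }

            // Persistence barrier for this single-scope model: drain all pending
            // writes so they persist before any write issued after the barrier.
            void persistenceBarrier() {
                for (const auto& w : pending) persistent.push_back(w);
                pending.clear();
            }
        };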
