Patent search ap:("Advanced Micro Devices Page Inc.") AND inv:"Khaled Hamidouche"

11.

Granted invention patent
Optimized and scalable sparse triangular linear systems on networks of accelerators (In force)

Publication (announcement) number: US10936697B2

Publication (announcement) date: 2021-03-02

Application number: US16044145

Application date: 2018-07-24

Applicant: Advanced Micro Devices, Inc.

Inventor: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse

IPC: G06F17/16, G06F9/38, G06F9/30, G06F17/12

Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.
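
The partial-sum idea in this abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes a dense list-of-lists lower-triangular matrix, only two processing units, and a plain Python list standing in for the remote memory. The function name `solve_lower_partitioned` and the `split` parameter are hypothetical.

```python
def solve_lower_partitioned(L, b, split):
    """Forward substitution for L x = b with rows split across two units.

    Assumption: unit 0 owns rows [0, split) and unit 1 owns rows [split, n).
    """
    n = len(b)
    x = [0.0] * n

    # Unit 0 solves its own row block locally.
    for i in range(split):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]

    # For each remote row, unit 0 computes the partial sum over the row
    # segment that depends on its own x values (columns [0, split)) and
    # "writes" it to a buffer standing in for unit 1's memory.
    remote_partial = [
        sum(L[i][j] * x[j] for j in range(split)) for i in range(split, n)
    ]

    # Unit 1 finishes its rows starting from the received partial sums.
    for i in range(split, n):
        s = remote_partial[i - split]
        s += sum(L[i][j] * x[j] for j in range(split, i))
        x[i] = (b[i] - s) / L[i][i]
    return x
```

Here the remote unit never reads the first unit's x values directly; it only consumes the partial sums that were written to it, which is the property the abstract emphasizes.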

12.

Granted invention patent
Network packet templating for GPU-initiated communication (In force)

Publication (announcement) number: US10740163B2

Publication (announcement) date: 2020-08-11

Application number: US16022498

Application date: 2018-06-28

Applicant: Advanced Micro Devices, Inc.

Inventor: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton

IPC: G06F9/54, G06F12/02

Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.
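
The division of work described in this abstract can be sketched as three small functions. This is a toy model only: the packet is a Python dict, and the field names (`src`, `dest`, `opcode`, `addr`, `length`, `ready`) are hypothetical and not taken from the patent.

```python
def cpu_prepare_template(src, dest):
    """CPU side: build the packet from a template and fill the static fields."""
    return {
        "src": src, "dest": dest, "opcode": "put",  # static fields
        "addr": None, "length": None,               # runtime fields, filled later
        "ready": False,                             # notification flag
    }

def gpu_fill_and_notify(packet, addr, length):
    """GPU side: fill the runtime fields mid-kernel, then signal readiness."""
    packet["addr"] = addr
    packet["length"] = length
    packet["ready"] = True

def nic_process(packet):
    """NIC side: act only once notified, using both subsets of fields."""
    if not packet["ready"]:
        return None
    return (packet["opcode"], packet["src"], packet["dest"],
            packet["addr"], packet["length"])
```

The point of the split is that the work done inside the kernel shrinks to a couple of field writes and a flag update.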

13.

Invention application
OPTIMIZED AND SCALABLE SPARSE TRIANGULAR LINEAR SYSTEMS ON NETWORKS OF ACCELERATORS (Under examination, published)

Publication (announcement) number: US20200034405A1

Publication (announcement) date: 2020-01-30

Application number: US16044145

Application date: 2018-07-24

Applicant: Advanced Micro Devices, Inc.

Inventor: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse

IPC: G06F17/16, G06F9/38, G06F17/12, G06F9/30

Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.

14.

Invention application
NETWORK PACKET TEMPLATING FOR GPU-INITIATED COMMUNICATION (Under examination, published)

Publication (announcement) number: US20200004610A1

Publication (announcement) date: 2020-01-02

Application number: US16022498

Application date: 2018-06-28

Applicant: Advanced Micro Devices, Inc.

Inventor: Khaled Hamidouche, Michael Wayne LeBeane, Walter B. Benton

IPC: G06F9/54, G06F12/02

Abstract: Systems, apparatuses, and methods for performing network packet templating for graphics processing unit (GPU)-initiated communication are disclosed. A central processing unit (CPU) creates a network packet according to a template and populates a first subset of fields of the network packet with static data. Next, the CPU stores the network packet in a memory. A GPU initiates execution of a kernel and detects a network communication request within the kernel and prior to the kernel completing execution. Responsive to this determination, the GPU populates a second subset of fields of the network packet with runtime data. Then, the GPU generates a notification that the network packet is ready to be processed. A network interface controller (NIC) processes the network packet using data retrieved from the first subset of fields and from the second subset of fields responsive to detecting the notification.

15.

Invention publication
Multi-Tree Reduction with Execution Skew (Under examination, published)

Publication (announcement) number: US20240311182A1

Publication (announcement) date: 2024-09-19

Application number: US18185641

Application date: 2023-03-17

Applicant: Advanced Micro Devices, Inc.

Inventor: Kishore Punniyamurthy, Sagnik Basu, Khaled Hamidouche, Brandon Keith Potter

IPC: G06F9/48

CPC classification number: G06F9/4881

Abstract: A device includes a communication scheduler to generate schedule trees for scheduling data communication among a plurality of nodes configured to perform a collective operation using data contributed from the plurality of nodes. The device includes data reduction logic to: identify one or more skewed nodes among the plurality of nodes, perform, according to a first set of schedule trees, a first operation to generate partial results based on data contributed from non-skewed nodes, and perform, according to a second set of schedule trees, a second operation to generate final results based on the partial results and data contributed from the one or more skewed nodes.
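
The two-phase reduction in this abstract can be sketched roughly as follows. The sketch assumes a sum reduction, a single pairwise tree per phase, and a set of skewed nodes that is already known; the patent's schedule-tree construction and skew detection are not modeled. The names `tree_reduce` and `skew_aware_reduce` are hypothetical.

```python
def tree_reduce(values):
    """Pairwise (binary-tree) sum reduction over a list of values."""
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element carries over to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0

def skew_aware_reduce(contrib, skewed):
    """contrib maps node id -> value; skewed is the set of late node ids."""
    on_time = [v for node, v in contrib.items() if node not in skewed]
    late = [v for node, v in contrib.items() if node in skewed]
    partial = tree_reduce(on_time)          # first operation: non-skewed nodes
    return tree_reduce([partial] + late)    # second operation: fold in late data
```

For example, with contributions `{0: 1, 1: 2, 2: 3, 3: 4}` and node 2 skewed, the first phase reduces 1, 2, and 4 to a partial result of 7, and the second phase combines it with 3.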

16.

Invention publication
DISTRIBUTED CACHING POLICY FOR LARGE-SCALE DEEP LEARNING TRAINING DATA PRE-PROCESSING (Under examination, published)

Publication (announcement) number: US20240211399A1

Publication (announcement) date: 2024-06-27

Application number: US18089480

Application date: 2022-12-27

Applicant: Advanced Micro Devices, Inc.

Inventor: Kishore Punniyamurthy, Khaled Hamidouche, Brandon Keith Potter

IPC: G06F12/0813, G06N20/00

CPC classification number: G06F12/0813, G06N20/00

Abstract: A distributed cache network used for machine learning is provided which comprises a network fabric having file systems which store data and a plurality of processing devices, each comprising cache memory and a processor configured to execute a training of a machine learning model and selectively cache portions of the data based on a frequency with which the data is accessed by the processor. Each processing device stores metadata identifying portions of data which are cached in the cache memory and other portions of the data which are cached in other processing devices of the network. When requested data is not cached in another processing device, the portion of requested data is accessed from a network file system via a client to server channel and is accessed from another processing device via a client to client channel when the requested data is cached in the other processing device.

17.

Granted invention patent
Network command coalescing on GPUs (In force)

Publication (announcement) number: US11922207B2

Publication (announcement) date: 2024-03-05

Application number: US16993150

Application date: 2020-08-13

Applicant: Advanced Micro Devices, Inc.

Inventor: Michael W. LeBeane, Khaled Hamidouche, Brandon K. Potter

IPC: G06F9/38, G06F9/48, G06F9/54, H04L67/10, G06T1/20

CPC classification number: G06F9/48, G06F9/3836, G06F9/3887, G06F9/54, H04L67/10, G06T1/20

Abstract: An approach is provided for coalescing network commands in a GPU that implements a SIMT architecture. Compatible next network operations from different threads are coalesced into a single network command packet. This reduces the number of network command packets generated and issued by threads, thereby increasing efficiency, and improving throughput. The approach is applicable to any number of threads and any thread organization methodology, such as wavefronts, warps, etc.
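
A minimal sketch of coalescing is shown below. The compatibility rule used here (same operation kind, same destination, byte-contiguous offsets) is an assumption made for illustration; the abstract does not specify what makes operations compatible. `NetOp` and `coalesce` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetOp:
    kind: str    # e.g. "put" or "get"
    dest: int    # destination node id
    offset: int  # byte offset at the destination
    size: int    # payload size in bytes

def coalesce(ops):
    """Merge per-thread operations into fewer network command packets."""
    packets = []
    # Sorting brings candidates for merging next to each other.
    for op in sorted(ops, key=lambda o: (o.kind, o.dest, o.offset)):
        last = packets[-1] if packets else None
        if (last is not None and last.kind == op.kind
                and last.dest == op.dest
                and last.offset + last.size == op.offset):
            # Contiguous with the previous packet: extend it.
            packets[-1] = NetOp(last.kind, last.dest, last.offset,
                                last.size + op.size)
        else:
            packets.append(op)
    return packets
```

Two 8-byte puts to adjacent offsets on the same node become one 16-byte put, so fewer command packets are issued.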

18.

Invention publication
Communication of Data for a Model Between Nodes in an Electronic Device (Under examination, published)

Publication (announcement) number: US20240005126A1

Publication (announcement) date: 2024-01-04

Application number: US17853670

Application date: 2022-06-29

Applicant: Advanced Micro Devices, Inc.

Inventor: Kishore Punniyamurthy, Khaled Hamidouche, Brandon K. Potter, Rohit Shahaji Zambre

IPC: G06N3/04, G06N3/08

CPC classification number: G06N3/04, G06N3/08

Abstract: An electronic device includes one or more data producing nodes and a data consuming node. Each data producing node separately generates two or more portions of a respective block of data. Upon completing generating each portion of the two or more portions of the respective block of data, each data producing node communicates that portion of the respective block of data to the data consuming node. Upon receiving corresponding portions of the respective blocks of data from each of the one or more data producing nodes, the data consuming node performs operations for a model using the corresponding portions of the respective blocks of data.
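
The portion-by-portion handoff in this abstract can be sketched as a sequential simulation. Real nodes would run concurrently; here the producers are plain callables, and `run_pipeline`, `producers`, and `consume` are hypothetical names.

```python
from collections import defaultdict

def run_pipeline(producers, num_portions, consume):
    """producers maps producer id -> function(portion index) -> data."""
    inbox = defaultdict(dict)  # portion index -> {producer id: data}
    results = []
    for k in range(num_portions):
        for pid, produce in producers.items():
            # Each producer sends portion k as soon as it is generated,
            # rather than waiting for its whole block to be complete.
            inbox[k][pid] = produce(k)
        if len(inbox[k]) == len(producers):
            # The consumer starts on portion k once every producer's
            # corresponding portion has arrived.
            portions = [inbox[k][pid] for pid in sorted(inbox[k])]
            results.append(consume(portions))
    return results
```

The consumer works on portion k without waiting for later portions, which is where the overlap between generation and consumption comes from.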

19.

Invention application
GPU NETWORKING USING AN INTEGRATED COMMAND PROCESSOR (In force)

Publication (announcement) number: US20230120934A1

Publication (announcement) date: 2023-04-20

Application number: US18068836

Application date: 2022-12-20

Applicant: Advanced Micro Devices, Inc.

Inventor: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton

IPC: G06F9/54, G06F9/30, H04L61/10

Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.

20.

Granted invention patent
GPU networking using an integrated command processor (In force)

Publication (announcement) number: US11544121B2

Publication (announcement) date: 2023-01-03

Application number: US15815043

Application date: 2017-11-16

Applicant: Advanced Micro Devices, Inc.

Inventor: Michael Wayne LeBeane, Khaled Hamidouche, Walter B. Benton

IPC: G06F9/54, H04L61/10, G06F9/30, G06F15/76

Abstract: Systems, apparatuses, and methods for generating network messages on a parallel processor are disclosed. A system includes at least a parallel processor, a general purpose processor, and a network interface unit. The parallel processor includes at least a plurality of compute units, a command processor, and a cache. A thread within a kernel executing on a compute unit of the parallel processor generates a network message and stores the network message and a corresponding indication in the cache. In response to detecting the indication of the network message in the cache, the command processor processes and conveys the network message to the network interface unit without involving the general purpose processor.
