Non-Blocking Parallel Bulk Memory Operations

    Publication Number: US20250110647A1

    Publication Date: 2025-04-03

    Application Number: US18477885

    Application Date: 2023-09-29

    Abstract: Non-blocking processing systems are described. In accordance with the described techniques, a pending range store receives, at the start of a bulk memory operation, the pending memory range of the bulk memory operation. A logic unit includes at least one of check conflict logic or check address logic. The logic unit detects a conflicting memory access based on a target address of the pending memory range conflicting with a memory access request separate from the bulk memory operation, and performs at least a portion of the bulk memory operation associated with the target address before the memory access request is allowed to proceed.
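
    The conflict check described above can be pictured as a range lookup that gates ordinary memory accesses. The following Python sketch is a minimal illustration, not the patented design: the names (PendingRangeStore, check_conflict, do_partial_bulk) and the flat list of ranges are assumptions.

        from dataclasses import dataclass

        @dataclass
        class PendingRange:
            """Address range still covered by an in-flight bulk memory operation."""
            start: int  # first address covered by the bulk operation
            end: int    # one past the last covered address

        class PendingRangeStore:
            """Receives a pending memory range at the start of each bulk operation."""

            def __init__(self):
                self.ranges = []

            def begin_bulk_op(self, start, length):
                pending = PendingRange(start, start + length)
                self.ranges.append(pending)
                return pending

            def check_conflict(self, target):
                """Check-address logic: return the pending range covering target, if any."""
                for pending in self.ranges:
                    if pending.start <= target < pending.end:
                        return pending
                return None

        def service_access(store, target, do_partial_bulk):
            """Gate an ordinary memory access against in-flight bulk operations."""
            conflict = store.check_conflict(target)
            if conflict is not None:
                # Complete only the slice of the bulk operation covering the
                # target address; the rest of the bulk operation stays pending,
                # so it never blocks unrelated accesses.
                do_partial_bulk(conflict, target)
            # The ordinary memory access is now allowed to proceed.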

    IN-SWITCH EMBEDDING BAG POOLING
    Type: Invention Application

    Publication Number: US20250110899A1

    Publication Date: 2025-04-03

    Application Number: US18478659

    Application Date: 2023-09-29

    Abstract: An apparatus and method are described for reducing the memory bandwidth consumed when executing machine learning models. A computing system includes two or more processing nodes, each including at least one or more processors and a corresponding local memory. Switch circuitry communicates with at least the local memories and a system memory of the computing system. The switch includes multiple direct memory access (DMA) interfaces. Each of the one or more processing nodes stores multiple embedding rows of embedding tables. A processor of the processing node identifies two or more embedding rows as source operands of a reduction operation. The switch executes memory access requests to retrieve data of the two or more embedding rows from the corresponding local memory, generates a result by performing the reduction operation, and sends the result to the local memory.
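
    The bandwidth saving comes from reducing embedding rows where they are stored instead of shipping every row to a processor. A minimal Python model of that switch-side reduction follows; the dictionary standing in for local memory and DMA transfers, and the function name, are assumptions for illustration.

        import numpy as np

        def switch_embedding_bag_pool(local_memory, row_ids, result_addr):
            """Reduce the requested embedding rows inside the 'switch'."""
            # Retrieve each embedding row (stands in for one DMA read per row).
            rows = [local_memory[row_id] for row_id in row_ids]
            # Perform the reduction (sum pooling here) without moving the rows
            # to a processing node.
            pooled = np.sum(rows, axis=0)
            # Write back only the reduced result (one DMA write), saving the
            # bandwidth of transferring every source row.
            local_memory[result_addr] = pooled

        # Example: pool rows 3 and 7 of a toy embedding table into address 100.
        memory = {i: np.full(4, float(i)) for i in range(10)}
        switch_embedding_bag_pool(memory, [3, 7], result_addr=100)
        print(memory[100])  # [10. 10. 10. 10.]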

    NETWORK-RELATED PERFORMANCE FOR GPUS
    Type: Invention Application

    Publication Number: US20200034195A1

    Publication Date: 2020-01-30

    Application Number: US16049216

    Application Date: 2018-07-30

    Abstract: Techniques are disclosed for improved networking performance in systems where a graphics processing unit or other highly parallel device that is not a central processing unit (referred to herein as an accelerated processing device or "APD") has the ability to directly issue commands to a networking device such as a network interface controller ("NIC"). In a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, so that the metadata does not have to be fetched at a later time. A second technique reduces latency by prioritizing work on an APD when it is known that certain network traffic will soon arrive over the network via a NIC.
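
    The first technique amounts to warming a bounded set of hardware slots before commands arrive, so the doorbell path skips a metadata fetch. The Python sketch below is illustrative only; the slot layout and the names (QueueMetadata, prefetch, ring_doorbell) are assumptions rather than the NIC's actual interface.

        from dataclasses import dataclass

        @dataclass
        class QueueMetadata:
            """State the NIC needs in order to consume a network command queue."""
            queue_id: int
            base_addr: int  # location of the command queue in memory
            head: int = 0
            tail: int = 0

        class NIC:
            def __init__(self, num_hw_slots):
                # Fixed number of hardware slots for command-queue metadata.
                self.num_hw_slots = num_hw_slots
                self.slots = {}

            def prefetch(self, meta):
                """Load queue metadata into a hardware slot ahead of time."""
                if len(self.slots) >= self.num_hw_slots:
                    return False  # all slots busy; caller may evict or retry
                self.slots[meta.queue_id] = meta
                return True

            def ring_doorbell(self, queue_id, new_tail, fetch_from_memory):
                """Process new commands; a prefetched queue skips the fetch."""
                meta = self.slots.get(queue_id)
                if meta is None:
                    # Slow path: fetch the metadata now, paying exactly the
                    # latency that prefetching was meant to hide.
                    meta = fetch_from_memory(queue_id)
                    self.slots[queue_id] = meta
                meta.tail = new_tail  # commands in [head, tail) are now visible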

    Processing Element-Centric All-to-All Communication

    Publication Number: US20240220336A1

    Publication Date: 2024-07-04

    Application Number: US18147081

    Application Date: 2022-12-28

    CPC classification number: G06F9/54 G06F9/5044 G06F15/17356

    Abstract: In accordance with the described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements (PEs), such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements, each of which generates data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between the respective processing elements of each cluster; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each cluster.
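
    The three stages can be modeled as a nested transpose over clusters. The Python sketch below is a toy model under the assumption of one processing element per destination cluster and cluster-granular payloads; the data layout and function name are illustrative, not from the patent.

        def pe_centric_all_to_all(data):
            """data[c][p][d]: payload PE p of cluster c holds for cluster d.

            Assumes the number of PEs per cluster equals the number of
            clusters, so PE p handles the payload bound for cluster p.
            """
            n = len(data)

            # Stage 1: intra-cluster parallel exchange. PE p of each cluster
            # gathers from its peers every payload destined for cluster p.
            gathered = [[[data[c][q][p] for q in range(n)]
                         for p in range(n)] for c in range(n)]

            # Stage 2: inter-cluster exchange. Each gathered bundle travels to
            # its destination cluster in a single bulk transfer.
            exchanged = [[gathered[src][dst] for src in range(n)]
                         for dst in range(n)]

            # Stage 3: intra-cluster distribution. The destination cluster
            # scatters the arrived bundles back to its local PEs, keyed by the
            # source-PE index.
            return [[[exchanged[dst][src][q] for src in range(n)]
                     for q in range(n)] for dst in range(n)]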
