-
1.
Publication No.: US20190258924A1
Publication Date: 2019-08-22
Application No.: US15898433
Filing Date: 2018-02-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Khaled Hamidouche , Michael W LeBeane , Walter B Benton , Michael L Chu
IPC: G06N3/08 , G06F15/173
Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.
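The overlap described in this abstract (sending some updated parameters before the backward pass has finished) can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in: the layer names, the plain gradient-descent update, and the `send` callback are assumptions for illustration, not the patented implementation.

```python
from typing import Callable, Dict, List


def backward_with_early_send(
    params: Dict[str, float],
    grads: Dict[str, float],
    layer_order: List[str],
    lr: float,
    send: Callable[[str, float], None],
) -> List[str]:
    """Walk the layers from last to first; update and send each one immediately.

    Returns an event log so the interleaving of updates and sends is visible.
    """
    events: List[str] = []
    for name in reversed(layer_order):
        # Hypothetical update rule (plain gradient descent), assumed for the sketch.
        params[name] -= lr * grads[name]
        events.append(f"update:{name}")
        # Transmit this layer's parameters while earlier layers are still pending.
        send(name, params[name])
        events.append(f"send:{name}")
    events.append("backward_done")
    return events
```

In this sketch the parameters of the last layer are sent before the first layer's gradients are even applied, which is the communication/computation overlap the abstract refers to.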
-
Publication No.: US12086447B2
Publication Date: 2024-09-10
Application No.: US16719076
Filing Date: 2019-12-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Khaled Hamidouche , Michael W. Lebeane , Hari S. Thangirala
IPC: G06F3/06 , G06F12/0882
CPC classification number: G06F3/0647 , G06F3/0611 , G06F3/0659 , G06F3/0688 , G06F12/0882 , G06F2212/7201
Abstract: A processing system includes a first processor couplable to a first memory and a second memory. In response to a page migration trigger for a page in the first memory, the first processor is configured to, responsive to the page being a read-only page storing code for execution, initiate migration of the page to a code cache portion of a second memory associated with a second processor and shared by multiple processes executing at the second processor, and to configure each process of a set of processes executing at the second processor to access and execute the code from the code cache portion.
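The migration decision in this abstract can be sketched roughly as follows. The `Page` and `SecondMemory` types, the `mappings` table, and the fallback path for pages that are not read-only code are all assumptions made for illustration; the abstract only describes the code-cache branch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    page_id: int
    read_only: bool
    holds_code: bool


@dataclass
class SecondMemory:
    code_cache: Dict[int, Page] = field(default_factory=dict)      # shared by all processes
    private: Dict[str, Dict[int, Page]] = field(default_factory=dict)  # per-process pages


def migrate(page: Page, requester: str, processes: List[str],
            mem: SecondMemory, mappings: Dict[str, Dict[int, str]]) -> str:
    """Handle a migration trigger for `page` raised on behalf of `requester`."""
    if page.read_only and page.holds_code:
        # One shared copy in the code cache, mapped into every process.
        mem.code_cache[page.page_id] = page
        for proc in processes:
            mappings.setdefault(proc, {})[page.page_id] = "code_cache"
        return "code_cache"
    # Assumed fallback: an ordinary per-process migration.
    mem.private.setdefault(requester, {})[page.page_id] = page
    mappings.setdefault(requester, {})[page.page_id] = "private"
    return "private"
```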
-
Publication No.: US20220100391A1
Publication Date: 2022-03-31
Application No.: US17033170
Filing Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane , Khaled Hamidouche , Hari S. Thangirala , Brandon Keith Potter
IPC: G06F3/06 , G06F12/02 , G06F12/0802
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
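As a rough illustration of the idea in this abstract, the toy model below lets accesses below cluster scope hit a local cache of remote data, while cluster-scope accesses issue network operations. The `Node` class, its flush-on-store and refetch-on-load behavior, and the `network_ops` counter are assumptions chosen for the sketch, not the semantics defined in the patent.

```python
from typing import Dict, Set


class Node:
    """Toy node that caches data living in another node's memory."""

    def __init__(self, remote_memory: Dict[int, int]) -> None:
        self.remote = remote_memory      # stands in for memory across the network
        self.cache: Dict[int, int] = {}
        self.dirty: Set[int] = set()
        self.network_ops = 0

    def load(self, addr: int, cluster_scope: bool = False) -> int:
        # Below cluster scope, a cached (possibly stale) copy is acceptable.
        if addr in self.dirty or (addr in self.cache and not cluster_scope):
            return self.cache[addr]
        self.network_ops += 1            # fetch over the network
        self.cache[addr] = self.remote[addr]
        return self.cache[addr]

    def store(self, addr: int, value: int, cluster_scope: bool = False) -> None:
        self.cache[addr] = value
        self.dirty.add(addr)
        if cluster_scope:
            # Cluster-scope store: push every dirty line out to remote memory.
            for a in sorted(self.dirty):
                self.remote[a] = self.cache[a]
                self.network_ops += 1
            self.dirty.clear()
```

In this model, repeated narrower-scope loads of the same address cost one network operation in total, and remote memory only sees local writes after a cluster-scope store.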
-
Publication No.: US20200034195A1
Publication Date: 2020-01-30
Application No.: US16049216
Filing Date: 2018-07-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane , Khaled Hamidouche , Bradford M. Beckmann
Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
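The first technique in this abstract (pre-fetching command queue metadata into NIC hardware slots) can be pictured with a small cost model. The `NicModel` class, the LRU eviction policy, and the cost numbers are hypothetical choices for this sketch and do not come from the patent.

```python
from collections import OrderedDict


class NicModel:
    """Toy NIC with a fixed number of hardware slots for queue metadata."""

    def __init__(self, num_slots: int, fetch_cost: int = 10, issue_cost: int = 1) -> None:
        self.num_slots = num_slots
        self.fetch_cost = fetch_cost
        self.issue_cost = issue_cost
        self.slots: "OrderedDict[int, str]" = OrderedDict()

    def _load(self, queue_id: int) -> int:
        """Ensure metadata for queue_id is resident; return the fetch cost paid."""
        if queue_id in self.slots:
            self.slots.move_to_end(queue_id)
            return 0
        if len(self.slots) >= self.num_slots:
            self.slots.popitem(last=False)   # assumed policy: evict least recently used
        self.slots[queue_id] = f"metadata-for-queue-{queue_id}"
        return self.fetch_cost

    def prefetch(self, queue_id: int) -> None:
        # Pay the fetch cost ahead of time, off the critical path.
        self._load(queue_id)

    def submit(self, queue_id: int) -> int:
        """Return the latency seen by a network command on this queue."""
        return self._load(queue_id) + self.issue_cost
```

A command on a queue that was prefetched sees only the issue cost, while a command on a cold queue also pays for the metadata fetch.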
-
Publication No.: US20250077409A1
Publication Date: 2025-03-06
Application No.: US18240640
Filing Date: 2023-08-31
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Kishore Punniyamurthy , Richard David Sodke , Furkan Eris , Sergey Blagodurov , Bradford Michael Beckmann , Brandon Keith Potter , Khaled Hamidouche
Abstract: A device includes a plurality of processing elements (PEs). A symmetric memory is allocated in each of the plurality of PEs. The device includes a switch connected to the plurality of PEs. The switch is to: receive, from a first processing element (PE) of the plurality of PEs, a message that includes a buffer offset, compute, based on the buffer offset, a first memory address of a first buffer in a first symmetric memory of the first PE and a second memory address of a second buffer in a second symmetric memory of a second PE of the plurality of PEs, and initiate, based on the first memory address and the second memory address, a direct memory access operation to access the first buffer and the second buffer.
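The address computation in this abstract can be sketched under one simple assumption: each PE's symmetric memory starts at a known base address, and the same buffer offset is valid in every PE. The `heap_base` table, the flat `bytearray` standing in for device memory, and the `dma_copy` helper are hypothetical.

```python
from typing import Dict, Tuple


def resolve_symmetric_addresses(heap_base: Dict[int, int], first_pe: int,
                                second_pe: int, buffer_offset: int) -> Tuple[int, int]:
    """Turn one buffer offset into an address in each PE's symmetric memory."""
    return heap_base[first_pe] + buffer_offset, heap_base[second_pe] + buffer_offset


def dma_copy(memory: bytearray, src: int, dst: int, length: int) -> None:
    """Stand-in for a direct memory access operation between two buffers."""
    memory[dst:dst + length] = memory[src:src + length]
```

Because the memory is symmetric, the message from the first PE only needs to carry the offset; the switch in this sketch derives both addresses from it.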
-
Publication No.: US12086422B2
Publication Date: 2024-09-10
Application No.: US18320819
Filing Date: 2023-05-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane , Khaled Hamidouche , Hari S. Thangirala , Brandon Keith Potter
IPC: G06F3/06 , G06F12/02 , G06F12/0802
CPC classification number: G06F3/0619 , G06F3/0656 , G06F3/067 , G06F12/0223 , G06F12/0802 , G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US20240220336A1
Publication Date: 2024-07-04
Application No.: US18147081
Filing Date: 2022-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy , Khaled Hamidouche , Brandon K Potter , Rohit Shahaji Zambre
IPC: G06F9/54 , G06F9/50 , G06F15/173
CPC classification number: G06F9/54 , G06F9/5044 , G06F15/17356
Abstract: In accordance with described techniques for PE-centric all-to-all communication, a distributed computing system includes processing elements, such as graphics processing units, distributed in clusters. An all-to-all communication procedure is performed by the processing elements that are each configured to generate data packets in parallel for all-to-all data communication between the clusters. The all-to-all communication procedure includes a first stage of intra-cluster parallel data communication between respective processing elements of each of the clusters; a second stage of inter-cluster data exchange for all-to-all data communication between the clusters; and a third stage of intra-cluster data distribution to the respective processing elements of each of the clusters.
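The three stages named in this abstract can be simulated with a short Python sketch. The routing rule used here (within each cluster, the PE whose local index equals the destination cluster modulo the cluster size acts as the aggregator) is an assumption made so the example is concrete; the abstract does not specify it.

```python
from typing import Dict, List, Tuple

PE = Tuple[int, int]         # (cluster index, local index within the cluster)
Msg = Tuple[PE, PE, str]     # (source PE, destination PE, payload)


def three_stage_all_to_all(num_clusters: int, pes_per_cluster: int) -> Dict[PE, List[Msg]]:
    pes = [(c, p) for c in range(num_clusters) for p in range(pes_per_cluster)]
    # Every PE starts with one message for every PE, itself included.
    outbox = {src: [(src, dst, f"{src}->{dst}") for dst in pes] for src in pes}

    # Stage 1 (intra-cluster): hand each message to the local aggregator
    # responsible for the message's destination cluster.
    staged: Dict[PE, List[Msg]] = {pe: [] for pe in pes}
    for src, msgs in outbox.items():
        for msg in msgs:
            dst_cluster = msg[1][0]
            staged[(src[0], dst_cluster % pes_per_cluster)].append(msg)

    # Stage 2 (inter-cluster): each aggregator ships its messages to the PE
    # with the same local index in the destination cluster.
    landed: Dict[PE, List[Msg]] = {pe: [] for pe in pes}
    for (_, local), msgs in staged.items():
        for msg in msgs:
            landed[(msg[1][0], local)].append(msg)

    # Stage 3 (intra-cluster): receivers distribute messages to their final PEs.
    inbox: Dict[PE, List[Msg]] = {pe: [] for pe in pes}
    for msgs in landed.values():
        for msg in msgs:
            inbox[msg[1]].append(msg)
    return inbox
```

After the third stage each PE holds exactly one message from every PE, the same result a direct all-to-all would produce, but only the aggregators exchange data across clusters.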
-
Publication No.: US20230289070A1
Publication Date: 2023-09-14
Application No.: US18320819
Filing Date: 2023-05-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane , Khaled Hamidouche , Hari S. Thangirala , Brandon Keith Potter
IPC: G06F3/06 , G06F12/02 , G06F12/0802
CPC classification number: G06F3/0619 , G06F12/0223 , G06F3/0656 , G06F3/067 , G06F12/0802 , G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
Publication No.: US11714559B2
Publication Date: 2023-08-01
Application No.: US17033170
Filing Date: 2020-09-25
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane , Khaled Hamidouche , Hari S. Thangirala , Brandon Keith Potter
IPC: G06F3/06 , G06F12/02 , G06F12/0802
CPC classification number: G06F3/0619 , G06F3/067 , G06F3/0656 , G06F12/0223 , G06F12/0802 , G06F2212/152
Abstract: A framework disclosed herein extends a relaxed, scoped memory model to a system that includes nodes across a commodity network and maintains coherency across the system. A new scope, cluster scope, is defined, that allows for memory accesses at scopes less than cluster scope to operate on locally cached versions of remote data from across the commodity network without having to issue expensive network operations. Cluster scope operations generate network commands that are used to synchronize memory across the commodity network.
-
10.
Publication No.: US11630994B2
Publication Date: 2023-04-18
Application No.: US15898433
Filing Date: 2018-02-17
Applicant: Advanced Micro Devices, Inc.
Inventor: Khaled Hamidouche , Michael W LeBeane , Walter B Benton , Michael L Chu
IPC: G06N3/08 , G06F15/173 , G06N3/084 , G06N3/063 , G06N3/045
Abstract: A method of training a neural network includes, at a local computing node, receiving remote parameters from a set of one or more remote computing nodes, initiating execution of a forward pass in a local neural network in the local computing node to determine a final output based on the remote parameters, initiating execution of a backward pass in the local neural network to determine updated parameters for the local neural network, and prior to completion of the backward pass, transmitting a subset of the updated parameters to the set of remote computing nodes.