-
Publication No.: US11914517B2
Publication date: 2024-02-27
Application No.: US17094989
Application date: 2020-11-11
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Marko Scrbak, Brandon K. Potter
IPC: G06F12/08, G06F12/0877, G06F12/0815
CPC classification number: G06F12/0877, G06F12/0815, G06F2212/621
Abstract: Methods and apparatus provide monitoring of memory access traffic in a data processing system by tracking, such as by data fabric hardware control logic, a number of cache line accesses to a page of memory associated with one or more memory devices, and producing spike indication data that indicates a spike in cache line accesses to a given page of memory. Pages are moved from a slower memory to a faster memory based on the spike indication data. In some implementations, the tracking is done by updating a cache directory with data representing the tracked number of cache line accesses.
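Illustrative sketch of the tracking described in this abstract: the Python model below counts cache line accesses per page over fixed intervals and flags a page when its count in the current interval is several times its count in the previous one. The class name `PageAccessTracker`, the interval model, the 4096-byte page size, and the thresholds `spike_factor` and `min_accesses` are all assumptions made for illustration. The abstract does not define how a spike is computed, and it describes hardware control logic rather than software.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes


class PageAccessTracker:
    """Software model of per-page cache line access counting (hypothetical)."""

    def __init__(self, spike_factor=4, min_accesses=8):
        self.spike_factor = spike_factor  # assumed: growth ratio that counts as a spike
        self.min_accesses = min_accesses  # assumed: floor so cold pages do not trigger
        self.current = defaultdict(int)   # page number -> accesses this interval
        self.previous = defaultdict(int)  # page number -> accesses last interval

    def record_access(self, address):
        """Count one cache line access against the page containing `address`."""
        self.current[address // PAGE_SIZE] += 1

    def end_interval(self):
        """Close the interval and return the sorted pages whose accesses spiked."""
        spiked = []
        for page, count in self.current.items():
            baseline = max(self.previous.get(page, 0), 1)
            if count >= self.min_accesses and count >= self.spike_factor * baseline:
                spiked.append(page)
        self.previous = self.current
        self.current = defaultdict(int)
        return sorted(spiked)


def migrate_spiked_pages(spiked_pages, slow_memory, fast_memory):
    """Move each spiked page from the slow tier to the fast tier (sets of page numbers)."""
    for page in spiked_pages:
        if page in slow_memory:
            slow_memory.remove(page)
            fast_memory.add(page)
```

In this model the spike list returned by `end_interval` plays the role of the spike indication data, and `migrate_spiked_pages` stands in for the page move from slower to faster memory.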
-
Publication No.: US11922207B2
Publication date: 2024-03-05
Application No.: US16993150
Application date: 2020-08-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Michael W. LeBeane, Khaled Hamidouche, Brandon K. Potter
CPC classification number: G06F9/48, G06F9/3836, G06F9/3887, G06F9/54, H04L67/10, G06T1/20
Abstract: An approach is provided for coalescing network commands in a GPU that implements a SIMT architecture. Compatible next network operations from different threads are coalesced into a single network command packet. This reduces the number of network command packets generated and issued by threads, thereby increasing efficiency, and improving throughput. The approach is applicable to any number of threads and any thread organization methodology, such as wavefronts, warps, etc.
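Illustrative sketch of the coalescing idea in this abstract: each thread contributes its next network operation, and compatible operations are merged into one command packet. The `NetOp` record, the packet layout, and the compatibility rule (same opcode and same destination node) are assumptions chosen for illustration. The abstract does not state what makes two operations compatible.

```python
from collections import namedtuple

# Hypothetical model of one thread's next network operation.
NetOp = namedtuple("NetOp", ["thread_id", "opcode", "dest_node", "payload"])


def coalesce(ops):
    """Group compatible operations into command packets.

    Assumption: two operations are compatible when they share the same
    opcode and destination node. Packets are returned in first-seen order.
    """
    packets = {}
    for op in ops:
        key = (op.opcode, op.dest_node)
        packet = packets.setdefault(
            key, {"opcode": op.opcode, "dest_node": op.dest_node, "entries": []}
        )
        # Each entry keeps the issuing thread so results can be routed back.
        packet["entries"].append((op.thread_id, op.payload))
    return list(packets.values())
```

With four threads where three issue a put to the same node and one issues a get to another node, this model produces two packets instead of four.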
-
Publication No.: US20240005126A1
Publication date: 2024-01-04
Application No.: US17853670
Application date: 2022-06-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy, Khaled Hamidouche, Brandon K. Potter, Rohit Shahaji Zambre
Abstract: An electronic device includes one or more data producing nodes and a data consuming node. Each data producing node separately generates two or more portions of a respective block of data. Upon completing generating each portion of the two or more portions of the respective block of data, each data producing node communicates that portion of the respective block of data to the data consuming node. Upon receiving corresponding portions of the respective blocks of data from each of the one or more data producing nodes, the data consuming node performs operations for a model using the corresponding portions of the respective blocks of data.
-
Publication No.: US20220100668A1
Publication date: 2022-03-31
Application No.: US17094989
Application date: 2020-11-11
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Marko Scrbak, Brandon K. Potter
IPC: G06F12/0877, G06F12/0815
Abstract: Methods and apparatus provide monitoring of memory access traffic in a data processing system by tracking, such as by data fabric hardware control logic, a number of cache line accesses to a page of memory associated with one or more memory devices, and producing spike indication data that indicates a spike in cache line accesses to a given page of memory. Pages are moved from a slower memory to a faster memory based on the spike indication data. In some implementations, the tracking is done by updating a cache directory with data representing the tracked number of cache line accesses.
-
Publication No.: US12271318B2
Publication date: 2025-04-08
Application No.: US17135657
Application date: 2020-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Brandon K. Potter, Marko Scrbak, Sergey Blagodurov, Kishore Punniyamurthy, Nathaniel Morris
IPC: G06F12/12, G06F12/0817
Abstract: Method and apparatus monitor eviction conflicts among cache directory entries in a cache directory and produce cache directory victim entry information for a memory manager. In some examples, the memory manager reduces future cache directory conflicts by changing a page level physical address assignment for a page of memory based on the produced cache directory victim entry information. In some examples, a scalable data fabric includes hardware control logic that performs the monitoring of the eviction conflicts among cache directory entries in the cache directory and produces the cache directory victim entry information.
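Illustrative sketch of the monitoring described in this abstract: a toy set-associative directory logs each evicted (victim) entry, and a helper uses that log to suggest a page number that maps to a less contended set. The `CacheDirectory` class, the page-granular entries, the modulo set indexing, the LRU replacement policy, and the `suggest_remap` helper are assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter, OrderedDict


class CacheDirectory:
    """Toy set-associative directory that logs evicted (victim) entries."""

    def __init__(self, num_sets=4, ways=2, page_size=4096):
        self.num_sets = num_sets
        self.ways = ways
        self.page_size = page_size
        self.sets = [OrderedDict() for _ in range(num_sets)]  # LRU order per set
        self.victim_log = []  # (set index, victim page) for each eviction

    def set_index(self, page):
        return page % self.num_sets  # assumed indexing function

    def access(self, physical_address):
        page = physical_address // self.page_size
        entries = self.sets[self.set_index(page)]
        if page in entries:
            entries.move_to_end(page)  # refresh LRU position on a hit
            return
        if len(entries) >= self.ways:
            victim, _ = entries.popitem(last=False)  # evict least recently used
            self.victim_log.append((self.set_index(page), victim))
        entries[page] = True


def most_conflicted_set(directory):
    """Return the set index with the most evictions, or None if none occurred."""
    counts = Counter(index for index, _ in directory.victim_log)
    return counts.most_common(1)[0][0] if counts else None


def suggest_remap(directory, page):
    """Suggest a page number that maps to the set with the fewest evictions."""
    counts = Counter(index for index, _ in directory.victim_log)
    target = min(range(directory.num_sets), key=lambda i: counts[i])
    return page - directory.set_index(page) + target
```

Here `victim_log` stands in for the cache directory victim entry information, and `suggest_remap` models a memory manager choosing a different page-level physical address assignment.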
-
Publication No.: US20250110899A1
Publication date: 2025-04-03
Application No.: US18478659
Application date: 2023-09-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Kishore Punniyamurthy, Khaled Hamidouche, Brandon K. Potter
Abstract: An apparatus and method for reducing the memory bandwidth of executing machine learning models. A computing system includes two or more processing nodes, each including at least one or more processors and a corresponding local memory. Switch circuitry communicates with at least the local memories and a system memory of the computing system. The switch includes multiple direct memory access (DMA) interfaces. Each of one or more processing nodes stores multiple embedding rows of embedding tables. A processor of the processing node identifies two or more embedding rows as source operands of a reduction operation. The switch executes memory access requests to retrieve data of the two or more embedding rows from the corresponding local memory, and generates a result by performing the reduction operation. The switch sends the result to the local memory.
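Illustrative sketch of the data flow in this abstract: a processor names two or more embedding rows, the switch reads them from local memory, reduces them element-wise, and writes the result back. Local memory is modelled as a plain dictionary, and the function names `switch_reduce` and `offload_reduction`, the `result_key` parameter, and the default summation are assumptions made for illustration. The abstract does not specify the reduction operator or any software interface.

```python
def switch_reduce(local_memory, row_ids, reduce_op=sum):
    """Model the switch fetching embedding rows and reducing them element-wise."""
    rows = [local_memory[row_id] for row_id in row_ids]  # modelled DMA reads
    return [reduce_op(column) for column in zip(*rows)]


def offload_reduction(local_memory, row_ids, result_key):
    """Model a processor handing a reduction to the switch.

    The result is stored back into local memory under `result_key`
    (a hypothetical destination) and also returned.
    """
    if len(row_ids) < 2:
        raise ValueError("a reduction needs two or more embedding rows")
    local_memory[result_key] = switch_reduce(local_memory, row_ids)
    return local_memory[result_key]
```

In this model only the reduced row travels back to local memory, rather than every source row being moved to the processor first.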
-
Publication No.: US11656796B2
Publication date: 2023-05-23
Application No.: US17219505
Application date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Brandon K. Potter, Johnathan Alsop
CPC classification number: G06F3/0659, G06F3/067, G06F3/0658, G06F9/30087, G06F9/3838, G06F3/0604
Abstract: A data processor includes a fabric-attached memory (FAM) interface for coupling to a data fabric and fulfilling memory access instructions. A requestor-side adaptive consistency controller coupled to the FAM interface requests notifications from a fabric manager for the fabric-attached memory regarding changes in requestors authorized to access a FAM region which the data processor is authorized to access. If a notification indicates that more than one requestor is authorized to access the FAM region, fences are activated for selected memory access instructions in a local application.
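Illustrative sketch of the adaptive behaviour in this abstract: fences stay off while a single requestor is authorized for a FAM region and are switched on once a notification reports more than one. The class name, the notification format, the string-based instruction stream, and the choice of stores as the "selected" instructions are all assumptions made for illustration and are not details from the patent.

```python
class AdaptiveConsistencyController:
    """Toy requestor-side controller that enables fences only when a region is shared."""

    def __init__(self, region):
        self.region = region
        self.fences_active = False

    def on_notification(self, region, authorized_requestors):
        """Handle a (hypothetical) fabric manager notification for a region."""
        if region == self.region:
            self.fences_active = len(set(authorized_requestors)) > 1

    def issue(self, instructions):
        """Return the instruction stream, adding a fence after each store when active.

        Assumption: the "selected" instructions are stores to the FAM region.
        """
        issued = []
        for instruction in instructions:
            issued.append(instruction)
            if self.fences_active and instruction.startswith("store"):
                issued.append("fence")
        return issued
```

The point of the model is that a region with a single authorized requestor pays no fence cost, and the cost appears only once sharing is reported.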
-
Publication No.: US20220317927A1
Publication date: 2022-10-06
Application No.: US17219505
Application date: 2021-03-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Sergey Blagodurov, Brandon K. Potter, Johnathan Alsop
Abstract: A data processor includes a fabric-attached memory (FAM) interface for coupling to a data fabric and fulfilling memory access instructions. A requestor-side adaptive consistency controller coupled to the FAM interface requests notifications from a fabric manager for the fabric-attached memory regarding changes in requestors authorized to access a FAM region which the data processor is authorized to access. If a notification indicates that more than one requestor is authorized to access the FAM region, fences are activated for selected memory access instructions in a local application.
-
Publication No.: US20220206946A1
Publication date: 2022-06-30
Application No.: US17135657
Application date: 2020-12-28
Applicant: Advanced Micro Devices, Inc.
Inventor: Brandon K. Potter, Marko Scrbak, Sergey Blagodurov, Kishore Punniyamurthy, Nathaniel Morris
IPC: G06F12/0817
Abstract: Method and apparatus monitor eviction conflicts among cache directory entries in a cache directory and produce cache directory victim entry information for a memory manager. In some examples, the memory manager reduces future cache directory conflicts by changing a page level physical address assignment for a page of memory based on the produced cache directory victim entry information. In some examples, a scalable data fabric includes hardware control logic that performs the monitoring of the eviction conflicts among cache directory entries in the cache directory and produces the cache directory victim entry information.
-
Publication No.: US11030117B2
Publication date: 2021-06-08
Application No.: US15650252
Application date: 2017-07-14
Applicant: Advanced Micro Devices, Inc.
Inventor: Nuwan Jayasena, Brandon K. Potter, Andrew G. Kegel
Abstract: A host processor receives an address translation request from an accelerator, which may be trusted or un-trusted. The address translation request includes a virtual address in a virtual address space that is shared by the host processor and the accelerator. The host processor encrypts a physical address in a host memory indicated by the virtual address in response to the accelerator being permitted to access the physical address. The host processor then provides the encrypted physical address to the accelerator. The accelerator provides memory access requests including the encrypted physical address to the host processor, which decrypts the physical address and selectively accesses a location in the host memory indicated by the decrypted physical address depending upon whether the accelerator is permitted to access the location indicated by the decrypted physical address.