-
Publication number: US20200073801A1
Publication date: 2020-03-05
Application number: US16119438
Filing date: 2018-08-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
IPC: G06F12/0817
Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
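The bookkeeping described in this abstract can be illustrated with a minimal software sketch. It is not the patented hardware design: the class names, the line and region sizes, and the `on_line_cached` / `on_line_evicted` hooks are all assumptions made for illustration. It shows only the two ideas the abstract states, a per-region reference count in the node-based directory and a memory-based directory that holds a region while any node-based directory does.

```python
from collections import defaultdict

LINE_SIZE = 64          # bytes per cache line (assumed value)
LINES_PER_REGION = 64   # cache lines per region (assumed value)
REGION_SIZE = LINE_SIZE * LINES_PER_REGION


def region_of(addr):
    """Map a byte address to its region number."""
    return addr // REGION_SIZE


class MemoryDirectory:
    """Hypothetical memory-based directory: region -> nodes tracking it."""

    def __init__(self):
        self.nodes_by_region = defaultdict(set)

    def add(self, region, node_id):
        self.nodes_by_region[region].add(node_id)

    def remove(self, region, node_id):
        self.nodes_by_region[region].discard(node_id)
        if not self.nodes_by_region[region]:
            del self.nodes_by_region[region]


class NodeDirectory:
    """Hypothetical node-based directory with one reference count per region."""

    def __init__(self, node_id, memory_directory):
        self.node_id = node_id
        self.memory_directory = memory_directory
        self.ref_count = {}  # region -> number of lines cached in this node

    def on_line_cached(self, addr):
        region = region_of(addr)
        if region not in self.ref_count:
            # First cached line of this region: allocate an entry here
            # and register the region in the memory-based directory.
            self.ref_count[region] = 0
            self.memory_directory.add(region, self.node_id)
        self.ref_count[region] += 1

    def on_line_evicted(self, addr):
        region = region_of(addr)
        self.ref_count[region] -= 1
        if self.ref_count[region] == 0:
            # No lines of the region remain in this node: free both entries.
            del self.ref_count[region]
            self.memory_directory.remove(region, self.node_id)
```

With these assumed sizes, one directory entry covers 64 cache lines, which is where the reduction in entry count comes from.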
-
Publication number: US10545875B2
Publication date: 2020-01-28
Application number: US15855838
Filing date: 2017-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Ganesh Balakrishnan, Ravindra N. Bhargava
IPC: G06F12/0897, G06F12/121
Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first look up the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
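The filtering effect of a TAC can be sketched as a small cache of tag blocks placed in front of the data cache. The sketch below is an illustration under assumptions, not the patented design: the block size of eight tags, the LRU replacement policy, the `fetch_block` callback, and all names are hypothetical. The abstract states only that tag blocks are cached and looked up first.

```python
from collections import OrderedDict


class TagAcceleratorCache:
    """Hypothetical TAC model: caches whole tag blocks, evicts LRU (assumed)."""

    def __init__(self, capacity, tags_per_block=8):
        self.capacity = capacity              # number of tag blocks held
        self.tags_per_block = tags_per_block  # tags returned per data-cache access
        self.blocks = OrderedDict()           # block id -> list of tags
        self.data_cache_tag_reads = 0         # tag accesses that reached the data cache

    def lookup(self, set_index, fetch_block):
        """Return the tag for set_index, reading the data cache only on a TAC miss."""
        block_id = set_index // self.tags_per_block
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # refresh LRU position
        else:
            self.data_cache_tag_reads += 1
            self.blocks[block_id] = fetch_block(block_id)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # drop least recently used block
        return self.blocks[block_id][set_index % self.tags_per_block]
```

Because one fetch brings in a block of neighboring tags, sequential lookups within the same block are served from the TAC, which is the spatial-locality benefit the abstract describes.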
-
Publication number: US11954033B1
Publication date: 2024-04-09
Application number: US17957823
Filing date: 2022-10-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Ganesh Balakrishnan, Amit Apte, Ann Ling, Vydhyanathan Kalyanasundharam
IPC: G06F12/0815
CPC classification number: G06F12/0815
Abstract: A method includes, in a cache directory, storing an entry associating a memory region with an exclusive coherency state, and in response to a memory access directed to the memory region, transmitting a demote superprobe to convert at least one cache line of the memory region from an exclusive coherency state to a shared coherency state.
-
Publication number: US11874783B2
Publication date: 2024-01-16
Application number: US17557639
Filing date: 2021-12-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam, Amit P. Apte, Eric Christopher Morton, Ganesh Balakrishnan, Ann M. Ling
CPC classification number: G06F13/1673, G06F3/061, G06F3/0656, G06F3/0658, G06F3/0679, G06F2213/0038
Abstract: A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
-
Publication number: US11803470B2
Publication date: 2023-10-31
Application number: US17130905
Filing date: 2020-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Amit Apte, Ganesh Balakrishnan, Ann Ling, Vydhyanathan Kalyanasundharam
IPC: G06F12/0817
CPC classification number: G06F12/0828, G06F2212/621
Abstract: Disclosed are examples of a system and method to communicate cache line eviction data from a CPU subsystem to a home node over a prioritized channel and to release the cache subsystem early to process other transactions.
-
Publication number: US20230195662A1
Publication date: 2023-06-22
Application number: US17557639
Filing date: 2021-12-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam, Amit P. Apte, Eric Christopher Morton, Ganesh Balakrishnan, Ann M. Ling
CPC classification number: G06F13/1673, G06F3/0656, G06F3/0658, G06F3/061, G06F3/0679, G06F2213/0038
Abstract: A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
-
Publication number: US20230195632A1
Publication date: 2023-06-22
Application number: US17556649
Filing date: 2021-12-20
Applicant: Advanced Micro Devices, Inc.
IPC: G06F12/0817, G06F12/0811, G06F12/0891, G06F13/16
CPC classification number: G06F12/0817, G06F12/0811, G06F12/0891, G06F13/1668
Abstract: A data processing system includes a plurality of coherent masters, a plurality of coherent slaves, and a coherent data fabric. The coherent data fabric has upstream ports coupled to the plurality of coherent masters and downstream ports coupled to the plurality of coherent slaves for selectively routing accesses therebetween. The coherent data fabric includes a probe filter and a directory cleaner. The probe filter is associated with at least one of the downstream ports and has a plurality of entries that store information about each entry. The directory cleaner periodically scans the probe filter and selectively removes a first entry from the probe filter after the first entry is scanned.
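The scan-and-remove behavior of the directory cleaner can be sketched as below. The abstract does not say which entries are selected for removal, so the `should_remove` predicate is a placeholder supplied by the caller, and the dictionary-based probe filter and all names are assumptions for illustration only.

```python
class DirectoryCleaner:
    """Hypothetical cleaner: one scan() call models one periodic pass."""

    def __init__(self, probe_filter, should_remove):
        self.probe_filter = probe_filter    # assumed shape: address -> entry
        self.should_remove = should_remove  # caller-supplied selection policy

    def scan(self):
        """Visit every entry once and remove those the policy selects."""
        removed = []
        # Iterate over a snapshot of the keys so entries can be deleted safely.
        for addr in list(self.probe_filter):
            entry = self.probe_filter[addr]
            if self.should_remove(entry):
                del self.probe_filter[addr]
                removed.append(addr)
        return removed
```

In a real fabric, removing an entry would also involve coherence actions toward the caches holding the line; that step is outside what the abstract describes and is not modeled here.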
-
Publication number: US20220091991A1
Publication date: 2022-03-24
Application number: US17031834
Filing date: 2020-09-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Ravindra N. Bhargava, Ganesh Balakrishnan, Joe Sargunaraj, Chintan S. Patel, Girish Balaiah Aswathaiya, Vydhyanathan Kalyanasundharam
IPC: G06F12/0891, G06F12/0813, G06F12/084, G06F12/0831, G06F9/46
Abstract: A method includes, in response to each write request of a plurality of write requests received at a memory-side cache device coupled with a memory device, writing payload data specified by the write request to the memory-side cache device, and when a first bandwidth availability condition is satisfied, performing a cache write-through by writing the payload data to the memory device, and recording an indication that the payload data written to the memory-side cache device matches the payload data written to the memory device.
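The write path in this abstract can be sketched as follows. The sketch assumes the bandwidth availability condition is reduced to a boolean supplied by the caller, and it models the "indication" as a per-line clean flag; both choices, along with the class and field names, are illustrative assumptions rather than details from the patent.

```python
class MemorySideCache:
    """Hypothetical model of a memory-side cache with opportunistic write-through."""

    def __init__(self):
        self.lines = {}   # address -> (payload, clean flag)
        self.memory = {}  # backing memory device: address -> payload

    def write(self, addr, payload, bandwidth_available):
        """Always write the cache; also write memory when bandwidth allows."""
        if bandwidth_available:
            # Write-through: memory now matches the cache, so mark the line clean.
            self.memory[addr] = payload
            self.lines[addr] = (payload, True)
        else:
            # Memory is not updated; the line stays dirty until written back.
            self.lines[addr] = (payload, False)

    def is_clean(self, addr):
        return self.lines[addr][1]
```

A line marked clean can later be dropped from the cache without a write-back, which is one plausible use of the recorded indication.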
-
Publication number: US10776282B2
Publication date: 2020-09-15
Application number: US15844215
Filing date: 2017-12-15
Applicant: Advanced Micro Devices, Inc.
Inventor: Amit P. Apte, Ganesh Balakrishnan, Vydhyanathan Kalyanasundharam, Kevin M. Lepak
IPC: G06F12/128, G06F12/0817, G06F12/0831, G06F12/0891
Abstract: Systems, apparatuses, and methods for implementing a speculative probe mechanism are disclosed. A system includes at least multiple processing nodes, a probe filter, and a coherent slave. The coherent slave includes an early probe cache to cache recent lookups to the probe filter. The early probe cache includes entries for regions of memory, wherein a region includes a plurality of cache lines. The coherent slave performs parallel lookups to the probe filter and the early probe cache responsive to receiving a memory request. An early probe is sent to a first processing node responsive to determining that a lookup to the early probe cache hits on a first entry identifying the first processing node as an owner of a first region targeted by the memory request and responsive to determining that a confidence indicator of the first entry is greater than a threshold.
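The early-probe decision in this abstract reduces to a region lookup followed by a confidence test. The sketch below shows only that decision, assuming a dictionary keyed by region number and a 4 KiB region size. The names are hypothetical, and the parallel probe filter lookup and the way confidence is updated are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EarlyProbeEntry:
    owner: int       # processing node recorded as owner of the region
    confidence: int  # confidence indicator for that prediction


def early_probe_target(
    early_probe_cache: Dict[int, EarlyProbeEntry],
    addr: int,
    threshold: int,
    region_size: int = 4096,  # assumed region size in bytes
) -> Optional[int]:
    """Return the node to probe early, or None if no early probe is sent."""
    entry = early_probe_cache.get(addr // region_size)
    # Send an early probe only on a hit whose confidence exceeds the threshold.
    if entry is not None and entry.confidence > threshold:
        return entry.owner
    return None
```

The comparison is strict, matching the abstract's "greater than a threshold" wording.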