-
Publication No.: US11868254B2
Publication date: 2024-01-09
Application No.: US17491478
Filing date: 2021-09-30
Inventor: Nuwan Jayasena
IPC classification: G06F12/08, G06F12/0802
CPC classification: G06F12/0802, G06F2212/60
Abstract: An electronic device includes a cache, a memory, and a controller. The controller stores an epoch counter value in metadata for a location in the memory when a cache block evicted from the cache is stored in the location. The controller also controls how the cache block is retained in the cache based at least in part on the epoch counter value when the cache block is subsequently retrieved from the location and stored in the cache.
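
The epoch-stamping idea in this abstract can be illustrated with a small sketch. The abstract does not say how the stored epoch value influences retention, so the policy below is an assumption made for illustration: a block refilled soon after its eviction is inserted at the protected end of the cache, while a block refilled long after is inserted at the eviction end. The names `EpochCache` and `recent_window` are hypothetical.

```python
from collections import OrderedDict


class EpochCache:
    """Toy cache. The front of `blocks` is evicted first; the back is kept longest."""

    def __init__(self, capacity, recent_window=1):
        self.capacity = capacity
        self.recent_window = recent_window  # hypothetical "recently evicted" threshold
        self.epoch = 0                      # global epoch counter
        self.blocks = OrderedDict()         # addr -> data held in the cache
        self.memory = {}                    # addr -> data held in memory
        self.epoch_meta = {}                # addr -> epoch stamped at eviction

    def advance_epoch(self):
        self.epoch += 1

    def _evict(self):
        addr, data = self.blocks.popitem(last=False)
        self.memory[addr] = data
        self.epoch_meta[addr] = self.epoch  # stamp the memory location's metadata

    def access(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return "hit"
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[addr] = self.memory.get(addr, 0)  # inserted at the protected end
        stamp = self.epoch_meta.get(addr)
        if stamp is None:
            return "miss"                   # never evicted before: default insertion
        if self.epoch - stamp <= self.recent_window:
            return "miss-protected"         # evicted recently: keep at the protected end
        # Assumed policy: a block evicted long ago is the next eviction candidate.
        self.blocks.move_to_end(addr, last=False)
        return "miss-unprotected"
```

In this sketch the return value of `access` only reports which retention decision was taken; a real controller would act on replacement state rather than return strings.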
-
Publication No.: US11726918B2
Publication date: 2023-08-15
Application No.: US17361145
Filing date: 2021-06-28
Inventors: Johnathan Alsop, Alexandru Dutu, Shaizeen Aga, Nuwan Jayasena
IPC classification: G06F12/0871, G06F12/02, G06F12/084, G06F12/0846
CPC classification: G06F12/0871, G06F12/0238, G06F12/084, G06F12/0846
Abstract: Dynamically coalescing atomic memory operations for memory-local computing is disclosed. In an embodiment, it is determined whether a first atomic memory access and a second atomic memory access are candidates for coalescing. In response to a triggering event, the atomic memory accesses that are candidates for coalescing are coalesced in a cache prior to requesting memory-local processing by a memory-local compute unit. The atomic memory accesses may be coalesced in the same cache line or atomic memory accesses in different cache lines may be coalesced using a multicast memory-local processing command.
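
As a rough illustration of same-cache-line coalescing, the sketch below merges buffered atomic additions before they would be sent to a memory-local compute unit. Several details are assumptions not stated in the abstract: the only operation modelled is an atomic add, the line size is 64 bytes, and the function name `coalesce_atomic_adds` is hypothetical. The multicast case for different cache lines is not modelled.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size


def coalesce_atomic_adds(pending):
    """Merge buffered atomic adds, given as (address, delta) pairs.

    Returns {line_base: {offset: total_delta}}: one entry per cache line,
    standing in for one memory-local processing request per line.
    """
    lines = defaultdict(lambda: defaultdict(int))
    for addr, delta in pending:
        base = addr - addr % LINE_BYTES
        lines[base][addr - base] += delta  # adds to the same word fold into one
    return {base: dict(offsets) for base, offsets in lines.items()}
```

Calling the function stands in for the triggering event: four buffered adds touching two cache lines would leave two requests rather than four.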
-
Publication No.: US11640444B2
Publication date: 2023-05-02
Application No.: US17208526
Filing date: 2021-03-22
Abstract: A processing device is provided which comprises memory configured to store data and a plurality of processor cores in communication with each other via first and second hierarchical communication links. Processor cores of a first hierarchical processor core group are in communication with each other via the first hierarchical communication links and are configured to store, in the memory, a sub-portion of data of a first matrix and a sub-portion of data of a second matrix. The processor cores are also configured to determine a product of the sub-portion of data of the first matrix and the sub-portion of data of the second matrix, receive, from another processor core, another sub-portion of data of the second matrix and determine a product of the sub-portion of data of the first matrix and the other sub-portion of data of the second matrix.
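
The compute-then-exchange pattern described here resembles a ring-style blocked matrix multiplication. The sketch below is one way to picture it, under assumptions the abstract does not confirm: each simulated core keeps a row block of the first matrix and passes column blocks of the second matrix around a single ring. The two-level link hierarchy is not modelled, and `ring_matmul` is a hypothetical name.

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ring_matmul(a_rows, b_cols):
    """a_rows[i] is the row block of A kept by core i; b_cols[i] is the column
    block of B that core i starts with. Returns c with c[i][j] = a_rows[i] @ b_cols[j].
    """
    p = len(a_rows)
    held = list(b_cols)      # held[i]: block of B currently at core i
    owner = list(range(p))   # owner[i]: index of that block
    c = [[None] * p for _ in range(p)]
    for _ in range(p):
        for i in range(p):
            c[i][owner[i]] = matmul(a_rows[i], held[i])  # local partial product
        # Each core receives the next block of B from its ring neighbour.
        held = held[1:] + held[:1]
        owner = owner[1:] + owner[:1]
    return c
```

After as many steps as there are cores, every core has multiplied its block of the first matrix with every block of the second.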
-
Publication No.: US20220276795A1
Publication date: 2022-09-01
Application No.: US17745278
Filing date: 2022-05-16
Abstract: Approaches are provided for implementing hardware-software collaborative address mapping schemes that map data elements which are accessed together to the same row of one bank or to the same rows of different banks, achieving higher performance by reducing row conflicts. Using an intra-bank frame striping policy (IBFS), corresponding subsets of data elements are interleaved into a single row of a bank. Using an intra-channel frame striping policy (ICFS), corresponding subsets of data elements are interleaved into a single channel row of a channel. A memory controller utilizes ICFS and/or IBFS to efficiently store and access data elements in memory, such as processing-in-memory (PIM) enabled memory.
-
Publication No.: US11157174B2
Publication date: 2021-10-26
Application No.: US16659559
Filing date: 2019-10-21
IPC classification: G06F12/128, G06F3/06, G06F16/22
Abstract: A hybrid mechanism for operating on a data item in connection with an associative structure combines first-fit and K-choice. The hybrid mechanism leverages advantages of both approaches by choosing whether to insert, retrieve, delete, or modify a data item using either first-fit or K-choice. Based on the data item, a function of the data item, and/or other factors such as the load statistics of the associative structure, one of either first-fit or K-choice is used to improve operation on the associative structure across a variety of different load states of the associative structure.
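
A minimal sketch of such a hybrid, assuming a bucketed hash table as the associative structure and the load factor as the only selection criterion. Below a threshold the table uses first-fit (the first candidate bucket with a free slot); at or above it the table uses K-choice (the least-loaded of K candidate buckets). The class name, the threshold, and the salted SHA-256 hashing are illustrative choices, not details from the abstract.

```python
import hashlib


class HybridTable:
    def __init__(self, num_buckets=4, slots=32, k=2, threshold=0.125):
        self.buckets = [[] for _ in range(num_buckets)]
        self.slots = slots          # capacity of each bucket
        self.k = k                  # number of candidate buckets per key
        self.threshold = threshold  # load factor at which K-choice takes over
        self.count = 0

    def _candidates(self, key):
        """K candidate buckets derived from K salted hashes of the key."""
        out = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            out.append(int.from_bytes(digest[:4], "big") % len(self.buckets))
        return out

    def load(self):
        return self.count / (len(self.buckets) * self.slots)

    def insert(self, key, value):
        cands = self._candidates(key)
        if self.load() < self.threshold:
            policy = "first-fit"
            target = next((b for b in cands if len(self.buckets[b]) < self.slots), None)
        else:
            policy = "k-choice"
            target = min(cands, key=lambda b: len(self.buckets[b]))
            if len(self.buckets[target]) >= self.slots:
                target = None
        if target is None:
            raise OverflowError("all candidate buckets are full")
        self.buckets[target].append((key, value))
        self.count += 1
        return policy

    def get(self, key):
        # Either policy may have placed the item, so lookups probe every candidate.
        for b in self._candidates(key):
            for stored_key, value in self.buckets[b]:
                if stored_key == key:
                    return value
        raise KeyError(key)
```

Because both policies draw from the same candidate set, a lookup does not need to know which policy placed the item.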
-
Publication No.: US11099788B2
Publication date: 2021-08-24
Application No.: US16658733
Filing date: 2019-10-21
Inventors: Nuwan Jayasena, Shaizeen Aga
Abstract: An approach is provided for implementing near-memory data reduction during store operations to off-chip or off-die memory. A Near-Memory Reduction (NMR) unit provides near-memory data reduction during write operations to a specified address range. The NMR unit is configured with a range of addresses to be reduced and when a store operation specifies an address within the range of addresses, the NMR unit performs data reduction by adding the data value specified by the store operation to an accumulated reduction result. According to an embodiment, the NMR unit maintains a count of the number of updates to the accumulated reduction result, which is used to determine when data reduction has been completed.
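
The behaviour described in this abstract can be modelled in a few lines. This is a software sketch only: the class name, the `expected_updates` parameter, and the use of a dictionary as backing memory are assumptions made for illustration.

```python
class NearMemoryReductionUnit:
    def __init__(self, base, size, expected_updates):
        self.base = base                          # start of the reduced address range
        self.size = size                          # length of the range
        self.expected_updates = expected_updates  # assumed completion criterion
        self.accumulator = 0                      # accumulated reduction result
        self.updates = 0                          # count of updates to the result

    def store(self, addr, value, memory):
        """Handle a store; return True if it was absorbed by the reduction."""
        if self.base <= addr < self.base + self.size:
            self.accumulator += value
            self.updates += 1
            return True
        memory[addr] = value                      # ordinary store outside the range
        return False

    @property
    def done(self):
        return self.updates >= self.expected_updates
```

Stores inside the configured range never reach `memory` in this model; only the accumulated result and the update count change.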
-
Publication No.: US20210117100A1
Publication date: 2021-04-22
Application No.: US16659559
Filing date: 2019-10-21
Abstract: A hybrid mechanism for operating on a data item in connection with an associative structure combines first-fit and K-choice. The hybrid mechanism leverages advantages of both approaches by choosing whether to insert, retrieve, delete, or modify a data item using either first-fit or K-choice. Based on the data item, a function of the data item, and/or other factors such as the load statistics of the associative structure, one of either first-fit or K-choice is used to improve operation on the associative structure across a variety of different load states of the associative structure.
-
Publication No.: US10853904B2
Publication date: 2020-12-01
Application No.: US15079543
Filing date: 2016-03-24
Inventors: Yasuko Eckert, Nuwan Jayasena
Abstract: A processor employs a hierarchical register file for a graphics processing unit (GPU). A top level of the hierarchical register file is stored at a local memory of the GPU (e.g., a memory on the same integrated circuit die as the GPU). Lower levels of the hierarchical register file are stored at a different, larger memory, such as a remote memory located on a different die than the GPU. A register file control module monitors the status of in-flight wavefronts at the GPU, and in particular whether each in-flight wavefront is active, predicted to become active, or inactive. The register file control module places execution data for active and predicted-active wavefronts in the top level of the hierarchical register file and places execution data for inactive wavefronts at lower levels of the hierarchical register file.
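
The placement rule in this abstract reduces to a small decision function. The sketch below assumes a two-level hierarchy labelled "top" and "lower"; the names `Status` and `place_wavefronts` are hypothetical, and how a wavefront is predicted to become active is outside its scope.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PREDICTED_ACTIVE = "predicted-active"
    INACTIVE = "inactive"


def place_wavefronts(statuses):
    """Map each wavefront id to the register-file level that holds its data.

    Active and predicted-active wavefronts go to the top (local) level;
    inactive wavefronts go to a lower (larger, remote) level.
    """
    return {
        wf: ("lower" if status is Status.INACTIVE else "top")
        for wf, status in statuses.items()
    }
```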
-
Publication No.: US20190317832A1
Publication date: 2019-10-17
Application No.: US15952143
Filing date: 2018-04-12
IPC classification: G06F9/52
Abstract: A thread holding a lock notifies a sleeping thread that is waiting on the lock that the lock holding thread is “about” to release the lock. In response to the notification, the waiting thread is woken up. While the waiting thread is being woken up, the lock holding thread completes other operations prior to actually releasing the lock and then releases the lock. The notification to the waiting thread hides latency associated with waking up the waiting thread by allowing operations that wake up the waiting thread to occur while the lock holding thread is performing the other operations prior to releasing the lock.
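
The early-notification pattern can be sketched with standard threading primitives. This is an analogy rather than the patented mechanism: a `threading.Event` stands in for the "about to release" notification, and the function name `run_early_notify_demo` is hypothetical.

```python
import threading


def run_early_notify_demo():
    lock = threading.Lock()
    about_to_release = threading.Event()  # stand-in for the early notification
    log = []

    def holder():
        with lock:
            log.append("holder: critical section done")
            about_to_release.set()         # wake the waiter before releasing
            log.append("holder: remaining work before release")
        # The lock is released when the with-block exits.

    def waiter():
        about_to_release.wait()            # sleeps until the notification arrives
        with lock:                         # may block briefly until the release
            log.append("waiter: lock acquired")

    threads = [threading.Thread(target=waiter), threading.Thread(target=holder)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The waiter's wake-up overlaps with the holder's remaining work, so by the time the lock is actually released the waiter is already running or close to it.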
-
Publication No.: US20190317831A1
Publication date: 2019-10-17
Application No.: US15952149
Filing date: 2018-04-12
IPC classification: G06F9/52
Abstract: A memory fence or other similar operation is executed with reduced latency. An early fence operation is executed and acts as a hint to the processor executing the thread that executes the fence. This hint causes the processor to begin performing sub-operations for the fence earlier than if no such hint were executed. Examples of sub-operations for the fence include operations to make data written by writes prior to the fence operation available to other threads. A resolving fence, which occurs after the early fence, performs the remaining sub-operations for the fence. By triggering some or all of the sub-operations for a memory fence that will occur in the future, the early fence operation reduces the amount of latency associated with that memory fence operation.
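
A toy model can show why an early fence shortens the later fence. It assumes that the only fence sub-operation is draining a store buffer so that pending writes become visible, and that the early fence drains a bounded number of entries. The class `StoreBuffer` and its `budget` parameter are hypothetical.

```python
class StoreBuffer:
    def __init__(self):
        self.pending = []   # writes not yet visible to other threads
        self.visible = {}   # writes already made visible

    def write(self, addr, value):
        self.pending.append((addr, value))

    def _drain(self, limit=None):
        n = len(self.pending) if limit is None else min(limit, len(self.pending))
        for addr, value in self.pending[:n]:
            self.visible[addr] = value
        del self.pending[:n]
        return n

    def early_fence(self, budget):
        """Hint: start publishing up to `budget` pending writes ahead of the fence."""
        return self._drain(budget)

    def resolving_fence(self):
        """Publish whatever remains; the count is a proxy for the fence's latency."""
        return self._drain()
```

With four buffered writes, an early fence that drains three of them leaves the resolving fence with only the remainder plus any writes issued in between.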