Spatial and temporal merging of remote atomic operations

    Publication No.: US10572260B2

    Publication Date: 2020-02-25

    Application No.: US15858899

    Application Date: 2017-12-29

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data; and optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying the same destination cache line as the incoming RAO instruction. Responsive to no matching enqueued RAO instruction being detected, the optimization circuitry enqueues the incoming RAO instruction; responsive to a matching enqueued RAO instruction being detected, it determines whether the incoming and matching RAO instructions have the same opcode and target non-overlapping cache line elements, and, if so, spatially combines the incoming and matching RAO instructions by enqueuing both RAO instructions in the same group of cache line queue entries at different offsets.
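    The queue-scan and spatial-combine logic described in the abstract can be sketched as a small functional model. This is an illustrative sketch only; the `RAOInstruction` and `RAOQueue` names, the dictionary-based grouping, and the returned status strings are assumptions, not details from the patent.

```python
from dataclasses import dataclass

@dataclass
class RAOInstruction:
    opcode: str      # e.g. "ADD"
    cache_line: int  # destination cache line identifier
    offset: int      # element offset within the cache line
    source_data: int

class RAOQueue:
    """Toy model of the RAO instruction queue, with entries grouped
    by destination cache line."""

    def __init__(self):
        # cache_line -> {offset: RAOInstruction}
        self.groups = {}

    def enqueue(self, incoming):
        group = self.groups.get(incoming.cache_line)
        if group is None:
            # No matching enqueued instruction for this cache line:
            # enqueue the incoming instruction normally.
            self.groups[incoming.cache_line] = {incoming.offset: incoming}
            return "enqueued"
        match = next(iter(group.values()))
        if match.opcode == incoming.opcode and incoming.offset not in group:
            # Same opcode, non-overlapping elements: spatially combine by
            # enqueuing both in the same group at different offsets.
            group[incoming.offset] = incoming
            return "spatially combined"
        return "not combined"
```

    A matching queue entry with the same opcode but an overlapping offset, or a different opcode, is left uncombined in this sketch; the patent's temporal-merging path is not modeled here.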

    Remote atomic operations in multi-socket systems

    Publication No.: US10296459B1

    Publication Date: 2019-05-21

    Application No.: US15858894

    Application Date: 2017-12-29

    Abstract: Disclosed embodiments relate to remote atomic operations (RAO) in multi-socket systems. In one example, a method, performed by a cache control circuit of a requester socket, includes: receiving the RAO instruction from the requester CPU core, determining a home agent in a home socket for the addressed cache line, providing a request for ownership (RFO) of the addressed cache line to the home agent, waiting for the home agent to either invalidate and retrieve a latest copy of the addressed cache line from a cache, or to fetch the addressed cache line from memory, receiving an acknowledgement and the addressed cache line, executing the RAO instruction on the received cache line atomically, subsequently receiving multiple local RAO instructions to the addressed cache line from one or more requester CPU cores, and executing the multiple local RAO instructions on the received cache line independently of the home agent.
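    The key point of the flow above is that the requester socket pays the home-agent round trip once, then services later local RAO instructions to the same line from its owned copy. A minimal sketch, with hypothetical class names and a counter standing in for the coherence traffic:

```python
class HomeAgent:
    """Toy home agent: tracks how many RFO round trips it serviced."""
    def __init__(self, memory):
        self.memory = memory
        self.rfo_count = 0

    def rfo(self, line):
        # Invalidate/retrieve the latest copy, or fetch from memory,
        # and acknowledge with the cache line data.
        self.rfo_count += 1
        return self.memory.get(line, 0)

class RequesterCacheControl:
    """Toy model of the requester socket's cache control circuit."""
    def __init__(self, home_agent):
        self.home_agent = home_agent
        self.owned = {}  # cache lines this socket owns: line -> data

    def execute_rao(self, line, op):
        if line not in self.owned:
            # First access: request ownership (RFO) from the home socket.
            self.owned[line] = self.home_agent.rfo(line)
        # Subsequent local RAO instructions to the same line execute
        # atomically on the owned copy, independently of the home agent.
        self.owned[line] = op(self.owned[line])
        return self.owned[line]
```

    The assertion that `rfo_count` stays at 1 across repeated local RAOs captures the abstract's claim that later local operations proceed without home-agent involvement.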

    Method and apparatus for store durability and ordering in a persistent memory architecture
    Granted patent (in force)

    Publication No.: US09423959B2

    Publication Date: 2016-08-23

    Application No.: US13931875

    Application Date: 2013-06-29

    CPC classification number: G06F3/0604 G06F3/0659 G06F3/0671 G06F13/1668

    Abstract: An apparatus and method are described for store durability and ordering in a persistent memory architecture. For example, one embodiment of a method comprises: performing at least one store operation to one or more addresses identifying at least one persistent memory device, the store operations causing one or more memory controllers to store data in the at least one persistent memory device; sending a request message to the one or more memory controllers instructing the memory controllers to confirm that the store operations are successfully committed to the at least one persistent memory device; ensuring at the one or more memory controllers that at least all pending store operations received at the time of the request message will be committed to the persistent memory device; and sending a response message from the one or more memory controllers indicating that the store operations are successfully committed to the persistent memory device.

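    The request/response handshake in the abstract amounts to a durability barrier across memory controllers: each controller commits at least the stores pending when the request arrives, then responds. A minimal sketch, assuming hypothetical `MemoryController` and `durability_barrier` names:

```python
class MemoryController:
    """Toy memory controller with a queue of pending (not yet durable) stores."""
    def __init__(self):
        self.pending = []    # stores received but not yet committed
        self.persisted = []  # stores committed to the persistent device

    def store(self, addr, data):
        self.pending.append((addr, data))

    def confirm_durable(self):
        # On the request message: commit at least all stores pending at
        # this point to the persistent device, then send a response.
        self.persisted.extend(self.pending)
        self.pending.clear()
        return "committed"

def durability_barrier(controllers):
    """Send the request message to every controller and wait for all
    responses before reporting the stores durable."""
    return all(mc.confirm_durable() == "committed" for mc in controllers)
```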

    Inclusive and non-inclusive tracking of local cache lines to avoid near memory reads on cache line memory writes into a two level system memory
    Granted patent (in force)

    Publication No.: US09418009B2

    Publication Date: 2016-08-16

    Application No.: US14142045

    Application Date: 2013-12-27

    CPC classification number: G06F12/0811 G06F12/0888

    Abstract: A processor may include a memory controller to interface with a system memory having a near memory and a far memory. The processor may include logic circuitry to cause the memory controller to determine whether a write request is generated remotely or locally; when the write request is generated remotely, to instruct the memory controller to perform a read of near memory before performing the write; when the write request is generated locally and the cache line targeted by the write request is in the inclusive state, to instruct the memory controller to perform the write without performing a read of near memory; and when the write request is generated locally and the targeted cache line is in the non-inclusive state, to instruct the memory controller to read near memory before performing the write.

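    The three-way decision in the abstract reduces to a small predicate: only a local write to an inclusively tracked line may skip the near-memory read. A sketch under assumed names (the string encoding of the line state is illustrative, not from the patent):

```python
def needs_near_memory_read(remote, line_state):
    """Return True if the memory controller must read near memory
    before performing the cache-line write.

    remote:     True if the write request was generated remotely.
    line_state: "inclusive" or "non-inclusive" (hypothetical encoding).
    """
    if remote:
        # Remotely generated writes always read near memory first.
        return True
    # Locally generated writes skip the read only when the targeted
    # cache line is tracked in the inclusive state.
    return line_state != "inclusive"
```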

    Virtual Shared Cache Mechanism in a Processing Device
    Invention application (in force)

    Publication No.: US20160077970A1

    Publication Date: 2016-03-17

    Application No.: US14484642

    Application Date: 2014-09-12

    Abstract: In accordance with embodiments disclosed herein, systems and methods are provided for a virtual shared cache mechanism. A processing device includes a plurality of clusters allocated into a virtual private shared cache. Each of the clusters includes a plurality of cores and a plurality of cache slices co-located within the plurality of cores. The processing device also includes a virtual shared cache including the plurality of clusters, such that the cache data in the plurality of cache slices is shared among the plurality of clusters.

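    The structure in the abstract — per-cluster cache slices aggregated into one cache visible to every cluster — can be modeled in a few lines. This sketch is purely illustrative: the `Cluster` and `VirtualSharedCache` names and the dict-per-slice representation are assumptions, and nothing here models the actual hardware mechanism.

```python
class Cluster:
    """A cluster of cores with cache slices co-located among them."""
    def __init__(self, cores, cache_slices):
        self.cores = cores
        self.cache_slices = cache_slices  # list of dict-like slices

class VirtualSharedCache:
    """Aggregates every cluster's slices so cached data in any slice
    is visible to all clusters."""
    def __init__(self, clusters):
        self.clusters = clusters

    def lookup(self, key):
        for cluster in self.clusters:
            for cache_slice in cluster.cache_slices:
                if key in cache_slice:
                    return cache_slice[key]
        return None  # miss in every slice
```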
