INSTRUCTIONS FOR REMOTE ATOMIC OPERATIONS
    Invention publication

    Publication No.: US20240362021A1

    Publication Date: 2024-10-31

    Application No.: US18670427

    Filing Date: 2024-05-21

    Abstract: Disclosed embodiments relate to atomic memory operations. In one example, a method of executing an instruction atomically and with weak order includes: fetching, by fetch circuitry, the instruction from code storage, the instruction including an opcode, a source identifier, and a destination identifier; decoding, by decode circuitry, the fetched instruction; selecting, by a scheduling circuit, an execution circuit among multiple circuits in a system; scheduling, by the scheduling circuit, execution of the decoded instruction out of order with respect to other instructions, with an order selected to optimize at least one of latency, throughput, power, and performance; and executing the decoded instruction, by the execution circuit, to: atomically read a datum from a location identified by the destination identifier, perform an operation on the datum as specified by the opcode, the operation to use a source operand identified by the source identifier, and write a result back to the location.
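    The atomic read-modify-write step the abstract describes (read a datum from the destination location, apply the opcode's operation using the source operand, write the result back) can be sketched in software. This is a minimal Python illustration, not the patented circuitry: the class and method names are invented for the example, and a lock stands in for the hardware's atomicity guarantee.

    ```python
    import threading

    class Memory:
        """Toy memory that executes remote-atomic-style read-modify-writes."""

        def __init__(self):
            self._data = {}
            self._lock = threading.Lock()  # stands in for hardware atomicity

        def execute_rao(self, opcode, dest, src_operand):
            ops = {"ADD": lambda a, b: a + b,
                   "AND": lambda a, b: a & b,
                   "OR":  lambda a, b: a | b}
            with self._lock:                             # atomic w.r.t. other RAOs
                datum = self._data.get(dest, 0)          # read datum at destination
                result = ops[opcode](datum, src_operand) # operate per the opcode
                self._data[dest] = result                # write result back
            return result
    ```

    Because the instruction is weakly ordered, a real implementation is free to complete such operations out of order with respect to surrounding memory traffic; the lock here only models atomicity, not ordering.
    
    
    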

    Sharing aware snoop filter apparatus and method

    公开(公告)号:US09898408B2

    公开(公告)日:2018-02-20

    申请号:US15088921

    申请日:2016-04-01

    CPC classification number: G06F12/0831 G06F12/0811 G06F2212/283 G06F2212/621

    Abstract: An apparatus and method are described for a sharing aware snoop filter. For example, one embodiment of a processor comprises: a plurality of caches, each of the caches comprising a plurality of cache lines, at least some of which are to be shared by two or more of the caches; a snoop filter to monitor accesses to the plurality of cache lines shared by the two or more caches, the snoop filter comprising: a primary snoop filter comprising a first plurality of entries, each entry associated with one of the plurality of cache lines and comprising N unique identifiers to uniquely identify up to N of the plurality of caches currently storing the cache line; and an auxiliary snoop filter comprising a second plurality of entries, each entry associated with one of the plurality of cache lines, wherein once a particular cache line has been shared by more than N caches, an entry for that cache line is allocated in the auxiliary snoop filter to uniquely identify one or more additional caches storing the cache line.
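    The two-level structure in the abstract (a primary filter tracking up to N sharers per cache line, with additional sharers spilling into an auxiliary filter) can be sketched as a small Python model. All names here are invented for illustration; this is a behavioral sketch, not the hardware design.

    ```python
    class SharingAwareSnoopFilter:
        """Primary filter holds up to n sharer IDs per line; overflow
        sharers are recorded in the auxiliary filter."""

        def __init__(self, n):
            self.n = n
            self.primary = {}    # cache_line -> up to n cache IDs
            self.auxiliary = {}  # cache_line -> cache IDs beyond the first n

        def record_access(self, cache_line, cache_id):
            sharers = self.primary.setdefault(cache_line, [])
            if cache_id in sharers or cache_id in self.auxiliary.get(cache_line, []):
                return  # already tracked
            if len(sharers) < self.n:
                sharers.append(cache_id)                           # primary entry
            else:
                # line is shared by more than n caches: allocate in auxiliary
                self.auxiliary.setdefault(cache_line, []).append(cache_id)

        def sharers(self, cache_line):
            """All caches that must be snooped for this line."""
            return self.primary.get(cache_line, []) + self.auxiliary.get(cache_line, [])
    ```

    The design point is that widely shared lines are rare, so the common case stays in the compact primary structure while the auxiliary filter absorbs the long tail.
    
    
    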

    SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS

    Publication No.: US20190205139A1

    Publication Date: 2019-07-04

    Application No.: US15858899

    Filing Date: 2017-12-29

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data; optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction; the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode applied to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
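    The spatial-combining rule in the abstract (same destination cache line, same opcode, non-overlapping element offsets) can be sketched as a small queue model in Python. The class, return strings, and grouping layout are all invented for illustration; real hardware would dispatch or reject non-combinable instructions rather than return a label.

    ```python
    class RAOQueue:
        """RAO queue entries grouped by destination cache line, one slot
        per element offset within the line."""

        def __init__(self):
            self.groups = {}  # cache_line -> {offset: (opcode, source_data)}

        def enqueue(self, opcode, cache_line, offset, data):
            group = self.groups.get(cache_line)
            if group is None:
                # no matching enqueued RAO: allocate a new group
                self.groups[cache_line] = {offset: (opcode, data)}
                return "enqueued"
            same_opcode = all(op == opcode for op, _ in group.values())
            if same_opcode and offset not in group:
                # same opcode, non-overlapping elements: spatially combine
                # by placing both RAOs in the same group at different offsets
                group[offset] = (opcode, data)
                return "spatially combined"
            return "not combinable"
    ```

    Combining this way lets multiple pending atomics to the same cache line travel to the remote execution point as one request, amortizing the round trip.
    
    
    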

    HARDWARE APPARATUSES AND METHODS TO CONTROL CACHE LINE COHERENCY
    Invention application (in force)

    Publication No.: US20160092354A1

    Publication Date: 2016-03-31

    Application No.: US14498946

    Filing Date: 2014-09-26

    Abstract: Methods and apparatuses to control cache line coherency are described. A processor may include a first core having a cache to store a cache line, a second core to send a request for the cache line from the first core, moving logic to cause a move of the cache line between the first core and a memory and to update a tag directory to reflect the move, and cache line coherency logic to create a chain home in the tag directory from the request to cause the cache line to be sent from the tag directory to the second core. A method to control cache line coherency may include creating a chain home in a tag directory from a request for a cache line in a first processor core from a second processor core, to cause the cache line to be sent from the tag directory to the second processor core.
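    The "chain home" idea (the tag directory records a pending requester so the line is forwarded to it once available) can be given a loose software analogy in Python. This is only an interpretation of the abstract; the class, its fields, and the release step are invented for the example and do not reproduce the patented protocol.

    ```python
    class TagDirectory:
        """Toy directory: tracks the current holder of each line and a
        chain of waiting requesters ("chain home")."""

        def __init__(self):
            self.owner = {}  # cache_line -> core currently holding the line
            self.chain = {}  # cache_line -> requesters waiting for the line

        def request(self, cache_line, requester):
            holder = self.owner.get(cache_line)
            if holder is None:
                self.owner[cache_line] = requester      # fetched from memory
                return f"line sent from memory to core {requester}"
            # create a chain home: the directory will forward the line to
            # the requester once the current holder releases it
            self.chain.setdefault(cache_line, []).append(requester)
            return f"chained behind core {holder}"

        def release(self, cache_line):
            waiters = self.chain.get(cache_line, [])
            if waiters:
                self.owner[cache_line] = waiters.pop(0)  # forward to next in chain
    ```
    
    
    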

    Adaptive remote atomics
    Invention grant

    Publication No.: US12216579B2

    Publication Date: 2025-02-04

    Application No.: US17134254

    Filing Date: 2020-12-25

    Abstract: Disclosed embodiments relate to atomic memory operations. In one example, an apparatus includes multiple processor cores, a cache hierarchy, a local execution unit, a remote execution unit, and an adaptive remote atomic operation unit. The cache hierarchy includes a local cache at a first level and a shared cache at a second level. The local execution unit is to perform an atomic operation at the first level if the local cache is storing a cache line including data for the atomic operation. The remote execution unit is to perform the atomic operation at the second level. The adaptive remote atomic operation unit is to determine whether to perform the atomic operation at the first level or at the second level, and whether to copy the cache line from the shared cache to the local cache.
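    The adaptive decision (execute locally when the first-level cache already holds the line, otherwise execute remotely at the shared level, optionally copying the line local) can be sketched in Python. The reuse-count heuristic below is an assumption made for illustration; the abstract does not state what criterion the adaptive unit uses, and all names are invented.

    ```python
    class AdaptiveRAOUnit:
        """Toy adaptive unit: local execution when the line is resident
        at the first level, remote execution otherwise, with a copy to
        the local cache once the line looks hot (assumed heuristic)."""

        def __init__(self, local_cache, reuse_threshold=2):
            self.local_cache = set(local_cache)  # lines resident at the first level
            self.access_count = {}               # remote accesses per line
            self.reuse_threshold = reuse_threshold

        def execute(self, line):
            if line in self.local_cache:
                return "executed locally"        # local execution unit, first level
            count = self.access_count.get(line, 0) + 1
            self.access_count[line] = count
            if count >= self.reuse_threshold:
                self.local_cache.add(line)       # copy line from shared to local cache
            return "executed remotely"           # remote execution unit, second level
    ```

    The trade-off being modeled: executing at the shared level avoids bouncing a contended line between cores, while copying a frequently reused line local avoids paying the remote round trip on every operation.
    
    
    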
