PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS TO ATOMICALLY STORE TO MEMORY DATA WIDER THAN A NATIVELY SUPPORTED DATA WIDTH

    Publication No.: US20220405234A1

    Publication date: 2022-12-22

    Application No.: US17827882

    Filing date: 2022-05-30

    Abstract: A processor includes a widest set of data registers that corresponds to a given logical processor. Each of the data registers of the widest set has a first width in bits. A decode unit that corresponds to the given logical processor is to decode instructions that specify the data registers of the widest set, and is to decode an atomic store to memory instruction. The atomic store to memory instruction is to indicate data that is to have a second width in bits that is wider than the first width in bits. The atomic store to memory instruction is to indicate memory address information associated with a memory location. An execution unit is coupled with the decode unit. The execution unit, in response to the atomic store to memory instruction, is to atomically store the indicated data to the memory location.

    SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS

    Publication No.: US20190205139A1

    Publication date: 2019-07-04

    Application No.: US15858899

    Filing date: 2017-12-29

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
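
The enqueue-or-combine decision in this abstract can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the patented circuitry: the names `RaoInstruction`, `RaoQueue`, and `submit`, the 64-byte cache line, the per-instruction `size` field, and the returned status strings are all hypothetical. The abstract does not say what happens when opcodes differ or elements overlap, so the sketch simply reports that case without handling it.

```python
from dataclasses import dataclass

CACHE_LINE_BYTES = 64  # assumed line size; the abstract does not give one


@dataclass
class RaoInstruction:
    opcode: str
    address: int   # stands in for the "destination identifier"
    size: int      # bytes touched within the line (hypothetical field)
    source: int    # source data

    @property
    def line(self) -> int:
        return self.address // CACHE_LINE_BYTES

    @property
    def offset(self) -> int:
        return self.address % CACHE_LINE_BYTES


def overlaps(a: RaoInstruction, b: RaoInstruction) -> bool:
    """True if the two instructions touch a common byte of the same line."""
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


class RaoQueue:
    """Queue whose entries are grouped by destination cache line."""

    def __init__(self) -> None:
        # line number -> {offset within line -> enqueued instruction}
        self.groups: dict[int, dict[int, RaoInstruction]] = {}

    def submit(self, incoming: RaoInstruction) -> str:
        group = self.groups.get(incoming.line)
        if group is None:
            # No enqueued instruction targets this cache line: enqueue.
            self.groups[incoming.line] = {incoming.offset: incoming}
            return "enqueued"
        same_opcode = all(e.opcode == incoming.opcode for e in group.values())
        disjoint = all(not overlaps(e, incoming) for e in group.values())
        if same_opcode and disjoint:
            # Same opcode, non-overlapping elements: place the incoming
            # instruction in the same group at a different offset.
            group[incoming.offset] = incoming
            return "spatially combined"
        # Outside what the abstract describes; left unhandled in this sketch.
        return "not combined"
```

For example, two `add` operations to offsets 0 and 8 of the same line end up in one group, while a second `add` to offset 8 is reported as `"not combined"`.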

    SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS

    Publication No.: US20200319886A1

    Publication date: 2020-10-08

    Application No.: US16799619

    Filing date: 2020-02-24

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.

    Apparatuses, methods, and systems for memory disambiguation

    Publication No.: US10067762B2

    Publication date: 2018-09-04

    Application No.: US15201218

    Filing date: 2016-07-01

    Abstract: Apparatuses, methods, and systems relating to memory disambiguation are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, an execution unit to execute the decoded instruction, a retirement unit to retire an executed instruction in program order, and a memory disambiguation circuit to allocate an entry in a memory disambiguation table for a first load instruction that is to be flushed for a memory ordering violation, the entry comprising a counter value and an instruction pointer for the first load instruction.
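
As a rough illustration of the table described in this abstract, the Python sketch below allocates an entry, holding an instruction pointer and a counter value, when a load is flushed for a memory ordering violation. Everything beyond that is an assumption made for the example: the abstract does not say how the counter is used, so the initial value, the decrement on conflict-free execution, and the `should_wait` predicate are hypothetical, as are all class and method names.

```python
from dataclasses import dataclass


@dataclass
class DisambiguationEntry:
    instruction_pointer: int
    counter: int


class MemoryDisambiguationTable:
    """Toy model of a table indexed by load instruction pointer."""

    def __init__(self, initial_counter: int = 3) -> None:
        self.initial_counter = initial_counter  # assumed starting value
        self.entries: dict[int, DisambiguationEntry] = {}

    def on_ordering_violation(self, load_ip: int) -> DisambiguationEntry:
        # Allocate (or reset) an entry for a load that is being flushed
        # because of a memory ordering violation.
        entry = DisambiguationEntry(load_ip, self.initial_counter)
        self.entries[load_ip] = entry
        return entry

    def on_conflict_free_execution(self, load_ip: int) -> None:
        # Assumed policy: each conflict-free execution lowers the counter.
        entry = self.entries.get(load_ip)
        if entry is not None and entry.counter > 0:
            entry.counter -= 1

    def should_wait(self, load_ip: int) -> bool:
        # Assumed policy: a tracked load with a non-zero counter is not
        # allowed to issue ahead of older stores.
        entry = self.entries.get(load_ip)
        return entry is not None and entry.counter > 0
```

Keying the table by instruction pointer means a load that has violated ordering once is recognized the next time the same instruction is fetched, which is the property the entry format in the abstract suggests.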

    COHERENT FABRIC INTERCONNECT FOR USE IN MULTIPLE TOPOLOGIES

    Publication No.: US20160378701A1

    Publication date: 2016-12-29

    Application No.: US14751899

    Filing date: 2015-06-26

    Abstract: An apparatus having a fabric interconnect that supports multiple topologies and method for using the same are disclosed. In one embodiment, the apparatus comprises mode memory to store information indicative of one of the plurality of modes; and a first fabric operable in a plurality of modes, where the fabric comprises logic coupled to the mode memory to control processing of read and write requests to memory received by the first fabric according to the mode identified by the information indicative.

    Spatial and temporal merging of remote atomic operations

    Publication No.: US11500636B2

    Publication date: 2022-11-15

    Application No.: US16799619

    Filing date: 2020-02-24

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations. In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data, optimization circuitry to receive an incoming RAO instruction, scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction, the optimization circuitry further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions have a same opcode to non-overlapping cache line elements, and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.

    Processors, methods, systems, and instructions to atomically store to memory data wider than a natively supported data width

    Publication No.: US10901940B2

    Publication date: 2021-01-26

    Application No.: US15089525

    Filing date: 2016-04-02

    Abstract: A processor includes a widest set of data registers that corresponds to a given logical processor. Each of the data registers of the widest set has a first width in bits. A decode unit that corresponds to the given logical processor is to decode instructions that specify the data registers of the widest set, and is to decode an atomic store to memory instruction. The atomic store to memory instruction is to indicate data that is to have a second width in bits that is wider than the first width in bits. The atomic store to memory instruction is to indicate memory address information associated with a memory location. An execution unit is coupled with the decode unit. The execution unit, in response to the atomic store to memory instruction, is to atomically store the indicated data to the memory location.
