GATHER USING INDEX ARRAY AND FINITE STATE MACHINE

    Publication No.: US20170192934A1

    Publication date: 2017-07-06

    Application No.: US14616323

    Filing date: 2015-02-06

    Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element has the first value. The data element is written at an in-register position in a destination vector register according to the respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
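
    The masked gather flow described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function name `gather`, the `base` and `scale` parameters, and the encoding of the "first value" as 1 and the "second value" as 0 are all hypothetical choices made for illustration. The plain loop stands in for the finite state machine, whose states the abstract does not specify.

```python
def gather(memory, base, indices, mask, dest, scale=1):
    """Toy model of a masked gather (assumed encoding: mask 1 = pending, 0 = done)."""
    assert len(indices) == len(mask) == len(dest)
    for pos, idx in enumerate(indices):
        if mask[pos] == 1:                # only elements whose mask has the first value
            addr = base + idx * scale     # address generated from the index
            dest[pos] = memory[addr]      # write at the index's in-register position
            mask[pos] = 0                 # load completed: change to the second value
    return dest, mask


if __name__ == "__main__":
    memory = list(range(100, 116))        # memory[i] == 100 + i
    dest, mask = gather(memory, 2, [0, 3, 5, 1], [1, 0, 1, 1], [-1] * 4)
    print(dest, mask)
```

    Because each mask element is changed only after its own load completes, the mask records which elements are still outstanding if the loop is interrupted partway through.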

    SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE
    Type: invention application (status: in force)

    Publication No.: US20150074373A1

    Publication date: 2015-03-12

    Application No.: US13977727

    Filing date: 2012-06-02

    Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of the apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value, and each such mask element is changed to a second value responsive to completion of its respective store.
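
    The three phases named in this abstract (address generation with buffer allocation, copying data into the buffer, and performing the stores) can be sketched in software as follows. This is an illustrative model only: the function name `scatter`, the `base` and `scale` parameters, the dictionary-based buffer entries, and the encoding of the "first value" as 1 and the "second value" as 0 are assumptions, not details taken from the patent.

```python
def scatter(memory, base, indices, mask, src, scale=1):
    """Toy model of a masked scatter (assumed encoding: mask 1 = pending, 0 = done)."""
    assert len(indices) == len(mask) == len(src)

    # Phase 1: generate an address for each pending element and allocate a buffer entry.
    buffer = []
    for pos, idx in enumerate(indices):
        if mask[pos] == 1:
            buffer.append({"pos": pos, "addr": base + idx * scale, "data": None})

    # Phase 2: copy the corresponding data elements into the buffer.
    for entry in buffer:
        entry["data"] = src[entry["pos"]]

    # Phase 3: perform each store, then change its mask element to the second value.
    for entry in buffer:
        memory[entry["addr"]] = entry["data"]
        mask[entry["pos"]] = 0
    return memory, mask


if __name__ == "__main__":
    memory, mask = scatter([0] * 8, 1, [0, 2, 4, 6], [1, 1, 0, 1], [10, 20, 30, 40])
    print(memory, mask)
```

    Elements whose mask element does not hold the first value never receive a buffer entry, so their memory locations are left untouched.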


    Apparatuses, methods, and systems to precisely monitor memory store accesses

    Publication No.: US11915000B2

    Publication date: 2024-02-27

    Application No.: US18160600

    Filing date: 2023-01-27

    Abstract: Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described. In one embodiment, a system includes a memory, a hardware processor core comprising a decoder to decode an instruction into a decoded instruction, an execution circuit to execute the decoded instruction to produce a resultant, a store buffer, and a retirement circuit to retire the instruction when a store request for the resultant from the execution circuit is queued into the store buffer for storage into the memory, and a performance monitoring circuit to mark the retired instruction for monitoring of post-retirement performance information between being queued in the store buffer and being stored in the memory, enable a store fence after the retired instruction to be inserted that causes previous store requests to complete within the memory, and on detection of completion of the store request for the instruction in the memory, store the post-retirement performance information in storage of the performance monitoring circuit.

    Accelerator systems and methods for matrix operations

    Publication No.: US10942738B2

    Publication date: 2021-03-09

    Application No.: US16368973

    Filing date: 2019-03-29

    Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry. Communication between the RS circuitry and the TMU provides the RS circuitry with notification of successfully executed operations and allows the RS circuitry to cancel operations where the operations are associated with branch mispredictions and/or non-retired speculatively executed instructions.
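
    The relationship between the in-order TDQ and the mirroring RS-TDQ can be pictured with a small software model. The sketch below is hypothetical throughout: the class names `TileUnit` and `ReservationStation`, their methods, and the use of Python deques are illustrative assumptions, and the real circuitry is not described at this level in the abstract.

```python
from collections import deque


class TileUnit:
    """Hypothetical stand-in for the TMU; its queue models the in-order TDQ."""

    def __init__(self):
        self.tdq = deque()

    def enqueue(self, op):
        self.tdq.append(op)            # operations kept in the order received

    def execute_next(self):
        return self.tdq.popleft()      # oldest operation executes first

    def cancel(self, op):
        self.tdq.remove(op)


class ReservationStation:
    """Hypothetical stand-in for the RS; its queue models the shadow RS-TDQ."""

    def __init__(self, tmu):
        self.tmu = tmu
        self.shadow = deque()

    def dispatch(self, op):
        self.shadow.append(op)         # mirror every operation sent to the TMU
        self.tmu.enqueue(op)

    def on_completed(self, op):
        # Notification of a successfully executed operation: in-order execution
        # means it must be the oldest entry in the shadow queue.
        assert self.shadow[0] == op
        self.shadow.popleft()

    def cancel(self, op):
        # For example, an operation on a mispredicted branch path.
        self.shadow.remove(op)
        self.tmu.cancel(op)
```

    In this model the shadow queue lets the RS know which operations are still outstanding without knowing how long each one will take, which is the motivation the abstract gives for mirroring the TDQ.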
