SYSTEMS, METHODS, AND APPARATUSES FOR TILE MATRIX MULTIPLICATION AND ACCUMULATION

    Publication No.: EP4216057A1

    Publication Date: 2023-07-26

    Application No.: EP23161367.0

    Filing Date: 2017-07-01

    Applicant: Intel Corporation

    IPC Classification: G06F9/30

    Abstract: Embodiments detailed herein relate to matrix operations. For example, in some embodiments, an apparatus comprises an instruction decoder to decode a single instruction, the single instruction having fields to indicate an opcode, a first register to store a first source matrix, a second register to store a second source matrix, and a third register to store a 2 by 2 third source matrix, wherein the opcode is to indicate a matrix multiply-accumulate operation; and execution circuitry to perform the matrix multiply-accumulate operation. The matrix multiply-accumulate operation includes: multiplying a value corresponding to a first row and a first column of the first source matrix and a value corresponding to a first row and a first column of the second source matrix to generate a first product, multiplying a value corresponding to the first row and a second column of the first source matrix and a value corresponding to a second row and the first column of the second source matrix to generate a second product, summing the first product, the second product, and an initial value corresponding to an element position in a first row and a first column of the 2 by 2 third source matrix to generate a resulting value corresponding to the element position in a destination matrix, and storing the destination matrix in the third register.
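The arithmetic described in this abstract can be illustrated with a minimal software sketch. It is not the claimed hardware: the function name `tile_multiply_accumulate` and the list-of-rows matrix representation are assumptions made for illustration, and the example values are arbitrary.

```python
def tile_multiply_accumulate(a, b, c):
    """Return c + a * b (matrix product) for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    dest = [row[:] for row in c]  # the destination starts from the third source matrix
    for i in range(rows):
        for j in range(cols):
            acc = c[i][j]  # initial value taken from the third source matrix
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one product per (row of a, column of b) pair
            dest[i][j] = acc
    return dest


a = [[1, 2], [3, 4]]  # first source matrix
b = [[5, 6], [7, 8]]  # second source matrix
c = [[1, 1], [1, 1]]  # 2 by 2 third source matrix (accumulator)
result = tile_multiply_accumulate(a, b, c)
print(result)
```

For the element in the first row and first column, the first product is 1 * 5, the second product is 2 * 7, and the initial value is 1, giving 20.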

    REGION AWARE DELTA PREFETCHER

    Publication No.: EP4202695A1

    Publication Date: 2023-06-28

    Application No.: EP22208765.2

    Filing Date: 2022-11-22

    Applicant: Intel Corporation

    IPC Classification: G06F12/0862

    Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
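A rough software sketch of the delta idea may help. It assumes cache lines are identified by integer line numbers, that deltas are learned from a short access trace, and that a candidate is issued only if it is not already cached. The names `SubregionEntry`, `learn_deltas`, `prefetch_candidates`, and `issue_prefetches` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class SubregionEntry:
    """Hypothetical per-subregion record holding learned delta values."""
    deltas: list = field(default_factory=list)


def learn_deltas(access_trace):
    """Distance, in cache lines, between each pair of consecutive accesses."""
    return [later - earlier for earlier, later in zip(access_trace, access_trace[1:])]


def prefetch_candidates(entry, accessed_line):
    """Candidate lines: the accessed line offset by each stored delta."""
    return [accessed_line + delta for delta in entry.deltas]


def issue_prefetches(candidates, cache, max_requests=2):
    """Issue requests for candidates not already cached (assumed policy)."""
    issued = []
    for line in candidates:
        if line not in cache and len(issued) < max_requests:
            issued.append(line)
    return issued


# Deltas observed in one subregion (lines 100, 102, 105) are applied to an
# access to line 200 in another subregion of the same memory region.
entry = SubregionEntry(deltas=learn_deltas([100, 102, 105]))
candidates = prefetch_candidates(entry, accessed_line=200)
issued = issue_prefetches(candidates, cache={203})
print(candidates, issued)
```

The filtering step and the `max_requests` limit are design assumptions of this sketch; the abstract only states that at least one request is issued based on at least two candidates.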

    DEVICE, METHOD AND SYSTEM TO PROVIDE A PREDICTED VALUE WITH A SEQUENCE OF MICRO-OPERATIONS

    Publication No.: EP4202652A1

    Publication Date: 2023-06-28

    Application No.: EP22205225.0

    Filing Date: 2022-11-03

    Applicant: Intel Corporation

    IPC Classification: G06F9/30 G06F9/38

    Abstract: Techniques and mechanisms for efficiently making value prediction information available for use in a processor. In an embodiment, execution of an instruction is to include a loading of some data to a first location (e.g., a first register). A decoder of the processor accesses reference information which indicates that the execution is to comprise multiple micro-operations (µops) including a LoadCheck µop and a Move µop. The LoadCheck µop loads a first value to the first location, and checks whether the loaded first value is the same as a previously-determined second value which represents a prediction of what the first value would be. The Move µop moves the second value to the first location. In another embodiment, the Move µop is scheduled for execution out-of-order with respect to the LoadCheck µop, resulting in an early availability of the second value for access in a register file by another µop.
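The interplay of the two µops can be sketched in software, assuming registers and memory are plain dictionaries. The function names `move_uop` and `load_check_uop`, the register name `r1`, and the address are hypothetical; the sketch only models the ordering described in the abstract and says nothing about how a misprediction is recovered.

```python
def move_uop(registers, dest, predicted_value):
    """Move: place the predicted value in the destination register early."""
    registers[dest] = predicted_value


def load_check_uop(registers, memory, dest, address, predicted_value):
    """LoadCheck: load the real value and report whether the prediction held."""
    actual = memory[address]
    registers[dest] = actual
    return actual == predicted_value


memory = {0x40: 7}
registers = {}

# The Move µop runs first (out of order), so a dependent µop can consume
# the predicted value before the load completes.
move_uop(registers, "r1", predicted_value=7)
dependent_result = registers["r1"] + 1

# The LoadCheck µop later confirms the prediction.
prediction_ok = load_check_uop(registers, memory, "r1", 0x40, predicted_value=7)

# With a wrong prediction the check fails and the register holds the loaded value.
mispredicted_ok = load_check_uop(registers, memory, "r1", 0x40, predicted_value=9)
print(dependent_result, prediction_ok, mispredicted_ok)
```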

    HARDWARE FOR SPLIT DATA TRANSLATION LOOKASIDE BUFFERS

    Publication No.: EP3771985A1

    Publication Date: 2021-02-03

    Application No.: EP20174019.8

    Filing Date: 2020-05-12

    Applicant: Intel Corporation

    IPC Classification: G06F12/1027

    Abstract: Systems, methods, and apparatuses relating to hardware for split data translation lookaside buffers. In one embodiment, a processor includes a decode circuit to decode instructions into decoded instructions, an execution circuit to execute the decoded instructions, and a memory circuit comprising a load data translation lookaside buffer circuit and a store data translation lookaside buffer circuit separate and distinct from the load data translation lookaside buffer circuit, wherein the memory circuit sends a memory access request of the instructions to the load data translation lookaside buffer circuit when the memory access request is a load data request and to the store data translation lookaside buffer circuit when the memory access request is a store data request to determine a physical address for a virtual address of the memory access request.
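The routing rule in this abstract can be modeled with a small sketch. It assumes 4 KiB pages, a dictionary-based page table, and a fill-on-miss policy. The class name `SplitDataTlb` and all of these parameters are illustrative assumptions, not details from the patent.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class SplitDataTlb:
    """Toy model with separate translation caches for loads and stores."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.load_tlb = {}
        self.store_tlb = {}

    def translate(self, virtual_address, is_store):
        # Route the request by type: stores use the store TLB, loads the load TLB.
        tlb = self.store_tlb if is_store else self.load_tlb
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn not in tlb:
            tlb[vpn] = self.page_table[vpn]  # miss: fill from the page table (assumed)
        return tlb[vpn] * PAGE_SIZE + offset


dtlb = SplitDataTlb(page_table={1: 5})
load_physical = dtlb.translate(PAGE_SIZE + 16, is_store=False)
store_untouched = dict(dtlb.store_tlb)  # still empty after a load
store_physical = dtlb.translate(PAGE_SIZE + 32, is_store=True)
print(load_physical, store_physical)
```

Because each structure is filled independently, a load to a page does not populate the store-side structure in this model, which is the separation the abstract emphasizes.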

    Method and apparatus to protect a processor against excessive power

    Publication No.: EP2819008A2

    Publication Date: 2014-12-31

    Application No.: EP14171643.1

    Filing Date: 2014-06-09

    Applicant: Intel Corporation

    IPC Classification: G06F9/48 G06F1/32 G06F9/30

    Abstract: In an embodiment, a processor includes at least a first core. The first core includes execution logic to execute operations, and a first event counter to determine a first event count associated with events of a first type that have occurred since a start of a first defined interval. The first core also includes a second event counter to determine a second event count associated with events of a second type that have occurred since the start of the first defined interval, and stall logic to stall execution of operations including at least first operations associated with events of the first type, until the first defined interval is expired responsive to the first event count exceeding a first combination threshold concurrently with the second event count exceeding a second combination threshold. Other embodiments are described and claimed.
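The stall condition can be sketched as a small state machine, assuming events arrive one at a time and the interval boundary is signaled by an explicit call. The class `StallLogic`, its method names, and the threshold values are hypothetical choices for illustration.

```python
class StallLogic:
    """Toy model of the two-counter stall condition (names are illustrative)."""

    def __init__(self, first_threshold, second_threshold):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.first_count = 0
        self.second_count = 0
        self.stalled = False

    def record_event(self, event_type):
        if event_type == "first":
            self.first_count += 1
        elif event_type == "second":
            self.second_count += 1
        # Stall only when both counts exceed their thresholds at the same time.
        if (self.first_count > self.first_threshold
                and self.second_count > self.second_threshold):
            self.stalled = True

    def interval_expired(self):
        # A new interval starts: counts restart and the stall is lifted.
        self.first_count = 0
        self.second_count = 0
        self.stalled = False


logic = StallLogic(first_threshold=2, second_threshold=1)
for _ in range(3):
    logic.record_event("first")
stalled_on_first_only = logic.stalled
for _ in range(2):
    logic.record_event("second")
stalled_on_both = logic.stalled
logic.interval_expired()
stalled_after_reset = logic.stalled
print(stalled_on_first_only, stalled_on_both, stalled_after_reset)
```

Exceeding only the first threshold does not trigger a stall in this model; both counts must be over their combination thresholds concurrently.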
