Apparatus and method for a range comparison, exchange, and add

    Publication No.: US11036501B2

    Publication date: 2021-06-15

    Application No.: US16231580

    Filing date: 2018-12-23

    Abstract: An apparatus and method for executing an atomic test and update instruction. For example, one embodiment of a processor comprises: a decoder to decode an atomic test and update (ATU) instruction having a first operand specifying a first value in a first storage location, a second operand specifying a second value in a second storage location, a third operand specifying a third value in a third storage location, and an opcode specifying a condition to be tested relative to the first and second values; and execution circuitry to perform a load lock operation to load the first value from the first storage location, the load lock operation to prevent access by another instruction before a result of the ATU instruction is stored, the execution circuitry to test a condition related to the first value and the second value, wherein if the condition is met then the execution circuitry is to add the first value and the third value to generate a sum and to store the sum to the first storage location.
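The ATU semantics described in this abstract can be modeled in software as a rough sketch. The names `Cell`, `CONDITIONS`, and `atomic_test_and_update` are hypothetical, and a `threading.Lock` merely stands in for the hardware load lock; the abstract does not list which conditions the opcode can encode, so the table below is an assumption.

```python
import operator
import threading

# Assumed condition table: the abstract only says the opcode selects a condition.
CONDITIONS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "ge": operator.ge,
}


class Cell:
    """A storage location; the lock is a software stand-in for the load lock."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


def atomic_test_and_update(cell, second, third, cond):
    """Test cond(first, second); if it holds, store first + third to the cell.

    Returns (first value loaded, whether the update happened).
    """
    test = CONDITIONS[cond]
    with cell.lock:              # no other access until the result is stored
        first = cell.value       # load the first value
        met = test(first, second)
        if met:
            cell.value = first + third
    return first, met
```

For instance, with a cell holding 5, a second value of 10, a third value of 3, and the condition `"lt"`, the cell is updated to 8; repeating the call with a second value of 8 leaves it unchanged.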

    Method and apparatus for efficient matrix alignment in a systolic array

    Publication No.: US10929143B2

    Publication date: 2021-02-23

    Application No.: US16147506

    Filing date: 2018-09-28

    Abstract: An apparatus and method for efficient matrix alignment in a systolic array. For example, one embodiment of a processor comprises: a first set of physical tile registers to store first matrix data in rows or columns; a second set of physical tile registers to store second matrix data in rows or columns; a decoder to decode a matrix instruction identifying a first input matrix, a first offset, a second input matrix, and a second offset; and execution circuitry, responsive to the matrix instruction, to read a subset of rows or columns from the first set of physical tile registers in accordance with the first offset, spanning multiple physical tile registers from the first set if indicated by the first offset, to generate a first input matrix; the execution circuitry to read a subset of rows or columns from the second set of physical tile registers in accordance with the second offset, spanning multiple physical tile registers from the second set if indicated by the second offset, to generate a second input matrix; and the execution circuitry to perform an arithmetic operation with the first and second input matrices in accordance with an opcode of the matrix instruction.
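As an illustration of the offset-based read described in this abstract, the following sketch models each physical tile register as a list of rows. The function names `read_rows`, `matmul`, and `matrix_op` are hypothetical, and matrix multiplication is only an assumed example of the arithmetic operation selected by the opcode.

```python
def read_rows(tiles, offset, count):
    """Read `count` consecutive rows starting `offset` rows into a tile set.

    The read may span several physical tiles when the offset calls for it.
    """
    flat = [row for tile in tiles for row in tile]
    if offset < 0 or offset + count > len(flat):
        raise ValueError("offset and count exceed the tile set")
    return flat[offset:offset + count]


def matmul(a, b):
    """Plain matrix product, standing in for the opcode's arithmetic operation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matrix_op(tiles_a, offset_a, tiles_b, offset_b, rows, op=matmul):
    """Assemble both input matrices from their offsets, then apply the operation."""
    a = read_rows(tiles_a, offset_a, rows)
    b = read_rows(tiles_b, offset_b, rows)
    return op(a, b)
```

With two 2x2 tiles per set, an offset of 1 on the first set yields a matrix made of the last row of tile 0 and the first row of tile 1, which is the spanning case the abstract mentions.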

    Apparatus and method for prioritized quality of service processing for transactional memory

    Publication No.: US10719442B2

    Publication date: 2020-07-21

    Application No.: US16126907

    Filing date: 2018-09-10

    Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.
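The conflict-resolution step in this abstract can be sketched as follows. `TMRegion` and `resolve_conflict` are hypothetical names. The abstract says the decision is based "at least in part" on the priority values, so the rule used here (higher priority continues, ties favor the first region) is an assumption rather than the patented policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TMRegion:
    name: str
    priority: int


def resolve_conflict(first, second):
    """Return (region that continues, region that is aborted).

    Assumed rule: the higher priority value wins; on a tie the first
    region keeps executing.
    """
    if second.priority > first.priority:
        return second, first
    return first, second
```

A real implementation could weigh other factors as well, such as how much work each region has already completed; the abstract leaves that open.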

    APPARATUS AND METHOD FOR COMPLEX MULTIPLY AND ACCUMULATE

    Publication No.: US20190163472A1

    Publication date: 2019-05-30

    Application No.: US15824324

    Filing date: 2017-11-28

    Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiply-accumulate of a first complex number, a second complex number, and a third complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder to decode an instruction to generate the decoded instruction and a first source register, a second source register, and a source and destination register to provide the first complex number, the second complex number, and the third complex number, respectively.
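The two-operation split described in this abstract can be illustrated with a small sketch. Complex numbers are `(real, imaginary)` tuples; `cmac_op1`, `cmac_op2`, and `cmac` are hypothetical names, and the assignment of product terms to the first and second operation is an assumption, since the abstract does not say which term each operation computes.

```python
def cmac_op1(acc, x, y):
    """First operation: add the first term of each component to the accumulator."""
    re, im = acc
    return (re + x[0] * y[0], im + x[0] * y[1])


def cmac_op2(acc, x, y):
    """Second operation: add the second term of each component."""
    re, im = acc
    return (re - x[1] * y[1], im + x[1] * y[0])


def cmac(acc, x, y):
    """Complex multiply-accumulate acc + x * y, built from the two operations."""
    return cmac_op2(cmac_op1(acc, x, y), x, y)
```

Here `acc` plays the role of the source and destination register holding the third complex number, while `x` and `y` are the first and second source registers.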

    INSTRUCTION AND LOGIC FOR A CACHE PREFETCHER AND DATALESS FILL BUFFER
    Type: Invention application (legal status: in force)

    Publication No.: US20160070651A1

    Publication date: 2016-03-10

    Application No.: US14481266

    Filing date: 2014-09-09

    Abstract: A processor includes a cache hierarchy and an execution unit. The cache hierarchy includes a lower level cache and a higher level cache. The execution unit includes logic to issue a memory operation to access the cache hierarchy. The lower level cache includes logic to determine that a requested cache line of the memory operation is unavailable in the lower level cache, determine that a line fill buffer of the lower level cache is full, and initiate prefetching of the requested cache line from the higher level cache based upon the determination that the line fill buffer of the lower level cache is full. The line fill buffer is to forward miss requests to the higher level cache.
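The decision flow in this abstract can be modeled as a toy sketch. `LowerLevelCache`, its fields, and the returned strings are hypothetical, and the model tracks only addresses, not data or timing.

```python
class LowerLevelCache:
    """Toy model of a lower-level cache with a bounded line fill buffer (LFB)."""

    def __init__(self, lfb_capacity):
        self.lines = set()        # addresses of cache lines currently present
        self.lfb = []             # outstanding miss requests forwarded upward
        self.lfb_capacity = lfb_capacity
        self.prefetches = []      # prefetches issued because the LFB was full

    def access(self, addr):
        if addr in self.lines:
            return "hit"
        if len(self.lfb) < self.lfb_capacity:
            # Room in the LFB: forward the miss to the higher-level cache.
            self.lfb.append(addr)
            return "miss_forwarded"
        # LFB full: initiate a prefetch of the line from the higher-level cache.
        self.prefetches.append(addr)
        return "prefetch_initiated"
```

With an LFB capacity of 1, a first miss occupies the buffer and a second miss to a different line triggers the prefetch path.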

