SPATIAL AND TEMPORAL MERGING OF REMOTE ATOMIC OPERATIONS

    Publication No.: EP3506087A1

    Publication date: 2019-07-03

    Application No.: EP18209326.0

    Application date: 2018-11-29

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F12/08

    Abstract: Disclosed embodiments relate to spatial and temporal merging of remote atomic operations (RAOs). In one example, a system includes an RAO instruction queue stored in a memory and having entries grouped by destination cache line, each entry to enqueue an RAO instruction including an opcode, a destination identifier, and source data; and optimization circuitry to receive an incoming RAO instruction and scan the RAO instruction queue to detect a matching enqueued RAO instruction identifying a same destination cache line as the incoming RAO instruction. The optimization circuitry is further to, responsive to no matching enqueued RAO instruction being detected, enqueue the incoming RAO instruction; and, responsive to a matching enqueued RAO instruction being detected, determine whether the incoming and matching RAO instructions specify a same opcode directed to non-overlapping cache line elements and, if so, spatially combine the incoming and matching RAO instructions by enqueuing both RAO instructions in a same group of cache line queue entries at different offsets.
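
    Illustration (not part of the patent record): the enqueue-or-combine decision in the abstract can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions: the class name `RaoQueue`, the 64-byte line size, and the rule that two elements overlap only when they share the same byte offset are all hypothetical simplifications. Cases the abstract does not describe (for example a different opcode or an overlapping element) are simply reported as not combined.

```python
from collections import defaultdict

CACHE_LINE_BYTES = 64  # assumed cache line size, not stated in the abstract


class RaoQueue:
    """Toy model of a queue whose entries are grouped by destination cache line."""

    def __init__(self):
        # cache line number -> {offset within line: (opcode, source data)}
        self.groups = defaultdict(dict)

    def submit(self, opcode, dest_addr, data):
        line = dest_addr // CACHE_LINE_BYTES
        offset = dest_addr % CACHE_LINE_BYTES

        # Scan for an enqueued instruction targeting the same cache line.
        group = self.groups.get(line)
        if group is None:
            # No match: enqueue the incoming instruction in a new group.
            self.groups[line][offset] = (opcode, data)
            return "enqueued"

        # Match found: combine only for a same opcode on a non-overlapping
        # element (assumption: "overlap" means an identical offset).
        same_opcode = all(op == opcode for op, _ in group.values())
        if same_opcode and offset not in group:
            group[offset] = (opcode, data)
            return "spatially combined"

        # Anything else is outside what the abstract describes.
        return "not combined"


queue = RaoQueue()
first = queue.submit("ADD", 0x1000, 1)
second = queue.submit("ADD", 0x1008, 2)
```

    In this model the second `ADD` lands in the same group as the first at a different offset, which mirrors the "same group of cache line queue entries at different offsets" wording.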

    METHOD AND APPARATUS FOR PERFORMING CONFLICT DETECTION

    Document type: Invention publication (status: under examination, published)

    Publication No.: EP3238043A1

    Publication date: 2017-11-01

    Application No.: EP15873965.6

    Application date: 2015-11-23

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F9/30

    Abstract: An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; and conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater-than comparison, a less-than comparison, a greater-than-or-equal-to comparison, a less-than-or-equal-to comparison, and a not-equal-to comparison.
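
    Illustration (not part of the patent record): a rough software model of a conflict-detection comparison with a selectable operator. The abstract does not say which elements of the second set are "specified", so this sketch assumes that element i of the first set is compared against the elements at lower indices in the second set, with one result bit per compared element. The function name `conflict_compare` and the operator keys are hypothetical.

```python
import operator

# The five comparison operations listed in the abstract.
COMPARISONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "ne": operator.ne,
}


def conflict_compare(src1, src2, op):
    """Return one bitmask per element of src1.

    Assumption: element i of src1 is compared with src2[0..i-1], and
    bit j of the result is set when the comparison with src2[j] holds.
    """
    compare = COMPARISONS[op]
    results = []
    for i, a in enumerate(src1):
        mask = 0
        for j in range(i):
            if compare(a, src2[j]):
                mask |= 1 << j
        results.append(mask)
    return results


masks_gt = conflict_compare([5, 1, 3], [2, 4, 0], "gt")
masks_ne = conflict_compare([5, 1, 3], [2, 4, 0], "ne")
```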

    SYSTEMS AND METHODS FOR PERFORMING HORIZONTAL TILE OPERATIONS

    Publication No.: EP3623940A2

    Publication date: 2020-03-18

    Application No.: EP19183497.7

    Application date: 2019-06-28

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38

    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of an M-by-N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups comprises the same number of elements; decode circuitry to decode the fetched instruction; and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding one of the K specified destination locations.
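
    Illustration (not part of the patent record): a horizontal operation reduces each of K equal-sized groups of a matrix to a single result. The sketch below assumes the groups are either the rows (K = M) or the columns (K = N) of the tile; the abstract allows other groupings. The function name `horizontal_tile_op` and the `group` parameter are hypothetical.

```python
from functools import reduce
import operator


def horizontal_tile_op(tile, op, group="row"):
    """Reduce every group of the tile with `op` and return the K results.

    Assumption: groups are whole rows or whole columns of the tile.
    """
    if group == "row":
        groups = tile
    elif group == "column":
        groups = [list(col) for col in zip(*tile)]
    else:
        raise ValueError("group must be 'row' or 'column'")
    return [reduce(op, g) for g in groups]


tile = [[1, 2, 3],
        [4, 5, 6]]
row_sums = horizontal_tile_op(tile, operator.add)         # one result per row
col_maxes = horizontal_tile_op(tile, max, group="column")  # one result per column
```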

    SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    Publication No.: EP4141661A1

    Publication date: 2023-03-01

    Application No.: EP22200756.9

    Application date: 2019-06-26

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, an apparatus comprises a configuration storage to store configuration information for a two-dimensional (2D) matrix storage, the configuration information to include a first value indicative of a number of rows of the 2D matrix storage and a second value indicative of a number of columns of the 2D matrix storage; fetch circuitry to fetch an instruction, the instruction to specify the 2D matrix storage, a row of the 2D matrix storage, and a 512-bit vector register; decode circuitry, coupled with the fetch circuitry, to decode the instruction; and execution circuitry, coupled with the decode circuitry, to perform operations corresponding to the instruction, including to store the row of the 2D matrix storage to the 512-bit vector register.
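
    Illustration (not part of the patent record): copying one configured tile row into a 512-bit vector can be modeled as below. The 32-bit element width, the zero padding of unused lanes, and the names `TileConfig` and `tile_row_to_vector` are assumptions made for this sketch; the abstract only states that the configuration holds a row count and a column count and that the row is stored to a 512-bit register.

```python
from dataclasses import dataclass

VECTOR_BITS = 512
ELEMENT_BITS = 32                      # assumed element width
LANES = VECTOR_BITS // ELEMENT_BITS    # 16 lanes under that assumption


@dataclass
class TileConfig:
    rows: int   # first configuration value: number of rows
    cols: int   # second configuration value: number of columns


def tile_row_to_vector(tile, config, row):
    """Return the selected row as a list of LANES values (the "register")."""
    if not 0 <= row < config.rows:
        raise IndexError("row is outside the configured tile")
    if config.cols > LANES:
        raise ValueError("row does not fit in one 512-bit vector")
    values = list(tile[row][:config.cols])
    # Assumption: lanes beyond the configured columns are zeroed.
    return values + [0] * (LANES - len(values))


config = TileConfig(rows=2, cols=4)
tile = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
vector = tile_row_to_vector(tile, config, row=1)
```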

    SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    Publication No.: EP3629154A3

    Publication date: 2020-05-06

    Application No.: EP19182737.7

    Application date: 2019-06-26

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector; decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
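
    Illustration (not part of the patent record): the "move from 1D" direction writes a vector into a selected group of tile elements. This sketch covers only two of the group kinds the abstract lists, a whole row and a whole column, and the function name `move_from_1d` and its parameters are hypothetical.

```python
def move_from_1d(tile, vector, kind, index):
    """Copy `vector` into one row or one column of `tile`, in place.

    Assumption: the vector length must equal the size of the target group.
    """
    if kind == "row":
        if len(vector) != len(tile[index]):
            raise ValueError("vector length does not match the row length")
        tile[index] = list(vector)
    elif kind == "column":
        if len(vector) != len(tile):
            raise ValueError("vector length does not match the column length")
        for r, value in enumerate(vector):
            tile[r][index] = value
    else:
        raise ValueError("kind must be 'row' or 'column'")
    return tile


tile = [[0, 0], [0, 0], [0, 0]]
move_from_1d(tile, [1, 2], "row", 1)        # fill row 1
move_from_1d(tile, [7, 8, 9], "column", 0)  # fill column 0
```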

    SYSTEMS AND METHODS FOR IMPLEMENTING CHAINED TILE OPERATIONS

    Publication No.: EP3547120A1

    Publication date: 2019-10-02

    Application No.: EP19157043.1

    Application date: 2019-02-13

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F15/78 G06F9/30

    Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands; decode circuitry to decode the fetched instructions; and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions; dynamically select and configure a SIMD path comprising first and second processing engines (PEs) to execute the first and second decoded instructions; and set aside the specified destination of the first decoded instruction and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
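
    Illustration (not part of the patent record): the forwarding idea in the abstract, where the first instruction's result bypasses its named destination and feeds the second instruction directly, can be modeled as below. The sketch assumes element-wise two-source tile operations and treats two instructions as a chain when the second reads the first's destination; `TileInstr`, `elementwise`, and `run_chained` are hypothetical names.

```python
import operator
from typing import Callable, NamedTuple


class TileInstr(NamedTuple):
    op: Callable   # element-wise operation, e.g. operator.add
    dst: str       # destination tile name
    src1: str      # first source tile name
    src2: str      # second source tile name


def elementwise(op, a, b):
    """Apply `op` to matching elements of two equally shaped tiles."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def run_chained(first, second, tiles):
    """Run two chained instructions without writing the first destination."""
    if first.dst not in (second.src1, second.src2):
        raise ValueError("instructions do not form a chain")
    # "PE 1": compute the first result but do not store it in `tiles`.
    forwarded = elementwise(first.op, tiles[first.src1], tiles[first.src2])
    # "PE 2": consume the forwarded value in place of the skipped tile.
    a = forwarded if second.src1 == first.dst else tiles[second.src1]
    b = forwarded if second.src2 == first.dst else tiles[second.src2]
    tiles[second.dst] = elementwise(second.op, a, b)
    return tiles


tiles = {"t0": [[1, 2]], "t1": [[3, 4]], "t2": [[10, 10]]}
first = TileInstr(operator.add, "t3", "t0", "t1")
second = TileInstr(operator.mul, "t4", "t3", "t2")
run_chained(first, second, tiles)
```

    After the call, `t4` holds the chained result while `t3` was never written, which is the software analogue of setting aside the first destination.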