OPTIMIZED COMPUTE HARDWARE FOR MACHINE LEARNING OPERATIONS

    Publication No.: EP3783479A1

    Publication date: 2021-02-24

    Application No.: EP20200955.1

    Filing date: 2018-04-30

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.
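
The operand-length selection described in this abstract can be illustrated with a minimal Python sketch. The function names (`smaller_bit_length`, `saturate_signed`, `mixed_dot`) are hypothetical, a dot product stands in for the matrix operation, and saturation is only one assumed way of narrowing the result to the smaller bit-length; the abstract does not say how the narrowing is done.

```python
def smaller_bit_length(first_bits: int, second_bits: int) -> int:
    """Model of the 'operand length unit': pick the smaller of two bit-lengths."""
    return min(first_bits, second_bits)


def saturate_signed(value: int, bits: int) -> int:
    """Clamp value to the signed range of the given width (assumed narrowing rule)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mixed_dot(a, a_bits, b, b_bits):
    """Dot product of operands with unequal bit-lengths.

    Returns (result, out_bits), where out_bits is the smaller input width
    and result has been narrowed to fit that width.
    """
    out_bits = smaller_bit_length(a_bits, b_bits)
    acc = sum(x * y for x, y in zip(a, b))  # full-precision accumulation
    return saturate_signed(acc, out_bits), out_bits


if __name__ == "__main__":
    print(mixed_dot([1, 2, 3], 16, [4, 5, 6], 8))
    print(mixed_dot([100, 100], 16, [2, 2], 8))
```

Here a 16-bit operand combined with an 8-bit operand yields an 8-bit output, so a full-precision sum of 400 is clamped to 127 under the assumed saturation rule.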

    SYSTEMS AND METHODS FOR PERFORMING HORIZONTAL TILE OPERATIONS

    Publication No.: EP3623940A2

    Publication date: 2020-03-18

    Application No.: EP19183497.7

    Filing date: 2019-06-28

    Applicant: Intel Corporation

    IPC classification: G06F9/30 G06F9/38

    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
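
As a rough illustration of the horizontal tile operation in this abstract, the sketch below reduces each of K equal-sized groups of an M by N tile to a single result. The function name `horizontal_tile_op` is hypothetical, and forming the groups as consecutive row-major chunks is an assumption; the abstract only requires that the K groups have the same number of elements.

```python
from functools import reduce
import operator


def horizontal_tile_op(tile, k, op):
    """Reduce each of k equal-sized groups of a tile to one result.

    tile: M x N matrix as a list of rows.
    k:    number of groups (and of destinations).
    op:   binary operation applied across every element of a group.
    Grouping by consecutive row-major chunks is an assumption of this sketch.
    """
    flat = [elem for row in tile for elem in row]
    if k <= 0 or len(flat) % k != 0:
        raise ValueError("tile must split into k groups of equal size")
    size = len(flat) // k
    # One result per group; in hardware each would go to its own destination.
    return [reduce(op, flat[g * size:(g + 1) * size]) for g in range(k)]


if __name__ == "__main__":
    tile = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
    print(horizontal_tile_op(tile, 2, operator.add))  # one sum per row
    print(horizontal_tile_op(tile, 4, max))           # one max per pair
```

With K equal to the number of rows, each group is one row, which gives a per-row sum or maximum.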

    SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    Publication No.: EP4141661A1

    Publication date: 2023-03-01

    Application No.: EP22200756.9

    Filing date: 2019-06-26

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, an apparatus comprises a configuration storage to store configuration information for a two-dimensional (2D) matrix storage, the configuration information to include a first value indicative of a number of rows of the 2D matrix storage and a second value indicative of a number of columns of the 2D matrix storage, fetch circuitry to fetch an instruction, the instruction to specify the 2D matrix storage, a row of the 2D matrix storage, and a 512-bit vector register, decode circuitry, coupled with the fetch circuitry, to decode the instruction, and execution circuitry, coupled with the decode circuitry, to perform operations corresponding to the instruction, including to store the row of the 2D matrix storage to the 512-bit vector register.
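
The row-to-register move in this abstract might be modeled as follows. `TileConfig` and `row_to_vector` are hypothetical names, the 32-bit element width is an assumption, and zero-filling the unused lanes of the 512-bit register is an assumed policy that the abstract does not state.

```python
from dataclasses import dataclass

VECTOR_BITS = 512  # width of the destination vector register, from the abstract


@dataclass(frozen=True)
class TileConfig:
    """Configuration for the 2D matrix storage: row and column counts."""
    rows: int
    cols: int


def row_to_vector(tile, config, row, elem_bits=32):
    """Copy one configured row of a tile into a 512-bit vector of lanes.

    elem_bits (element width) and zero-filling unused lanes are assumptions.
    """
    lanes = VECTOR_BITS // elem_bits
    if not 0 <= row < config.rows:
        raise IndexError("row outside the configured tile")
    if config.cols > lanes:
        raise ValueError("configured row does not fit in one vector register")
    data = list(tile[row][:config.cols])
    return data + [0] * (lanes - len(data))


if __name__ == "__main__":
    cfg = TileConfig(rows=2, cols=4)
    tile = [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(row_to_vector(tile, cfg, 1))
```

With 32-bit elements the register holds 16 lanes, so a 4-column row occupies the first four lanes and the rest are zero in this sketch.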

    SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    Publication No.: EP3629154A3

    Publication date: 2020-05-06

    Application No.: EP19182737.7

    Filing date: 2019-06-26

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
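
The "move from 1D" case in this abstract can be sketched by treating every group (row, part of a row, column, sub-tile) as a rectangle inside the tile. The function name `move_from_1d` and its rectangle parameters are hypothetical, and filling the rectangle in row-major order is an assumption.

```python
def move_from_1d(tile, vector, row0, col0, n_rows, n_cols):
    """Write a 1D vector into a rectangular group of a 2D tile, in place.

    A full row is (n_rows=1, n_cols=width), a column is (n_cols=1), and a
    sub-tile is any other rectangle. Row-major fill order is an assumption.
    """
    if row0 < 0 or col0 < 0:
        raise IndexError("group starts outside the tile")
    if row0 + n_rows > len(tile) or col0 + n_cols > len(tile[0]):
        raise IndexError("group extends past the tile")
    if len(vector) < n_rows * n_cols:
        raise ValueError("vector too short for the specified group")
    values = iter(vector)
    for r in range(row0, row0 + n_rows):
        for c in range(col0, col0 + n_cols):
            tile[r][c] = next(values)
    return tile


if __name__ == "__main__":
    tile = [[0] * 3 for _ in range(3)]
    move_from_1d(tile, [1, 2, 3], 0, 1, 3, 1)     # fill column 1
    move_from_1d(tile, [9, 8, 7, 6], 1, 1, 2, 2)  # fill a 2x2 sub-tile
    print(tile)
```

The reverse direction (2D group to 1D vector) would read the same rectangle in the same order.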

    SYSTEMS AND METHODS FOR IMPLEMENTING CHAINED TILE OPERATIONS

    Publication No.: EP3547120A1

    Publication date: 2019-10-02

    Application No.: EP19157043.1

    Filing date: 2019-02-13

    Applicant: Intel Corporation

    IPC classification: G06F9/38 G06F15/78 G06F9/30

    Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
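
The result forwarding described in this abstract can be pictured with a small software model. `TileInstr`, `run_pe`, and `execute_chain` are hypothetical names, the processing engines are modeled as element-wise functions, and the sketch assumes the two instructions have already been identified as a chain; how the hardware identifies a chain is not covered here.

```python
from dataclasses import dataclass
from typing import Callable
import operator


@dataclass
class TileInstr:
    """A decoded tile instruction: element-wise op, destination, two sources."""
    op: Callable[[float, float], float]
    dst: str
    src1: str
    src2: str


def run_pe(op, a, b):
    """Model of one processing engine: apply op element-wise to two tiles."""
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def execute_chain(first, second, tiles):
    """Execute two instructions already identified as a chain (assumption).

    The first result is routed straight to the second PE; the first
    instruction's named destination is set aside and never written.
    """
    forwarded = run_pe(first.op, tiles[first.src1], tiles[first.src2])

    def read(name):
        return forwarded if name == first.dst else tiles[name]

    tiles[second.dst] = run_pe(second.op, read(second.src1), read(second.src2))
    return tiles


if __name__ == "__main__":
    tiles = {"A": [[1, 2]], "B": [[3, 4]], "C": [[10, 10]]}
    first = TileInstr(operator.add, "T", "A", "B")   # T = A + B
    second = TileInstr(operator.mul, "D", "T", "C")  # D = T * C
    execute_chain(first, second, tiles)
    print(tiles["D"], "T" in tiles)
```

In this model the intermediate tile `T` never appears in the tile storage, which mirrors the abstract's point that the first destination is set aside.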

    HARDWARE APPARATUSES AND METHODS TO PREFETCH A MULTIDIMENSIONAL BLOCK OF ELEMENTS FROM A MULTIMENSIONAL ARRAY

    Publication No.: EP3238072A1

    Publication date: 2017-11-01

    Application No.: EP15874043.1

    Filing date: 2015-11-25

    Applicant: Intel Corporation

    IPC classification: G06F12/08 G06F9/30

    Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
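
The address generation described in this abstract can be sketched for the two-dimensional case. The function names (`block_addresses`, `cache_lines`), the byte-based stride, and the 64-byte cache line are assumptions chosen for illustration, not details taken from the patent.

```python
def block_addresses(base, stride, rows, cols, elem_size):
    """Generate the address of every element in a 2D block.

    base:      address of the block's first element.
    stride:    bytes between the starts of consecutive rows (assumed unit).
    rows/cols: block boundaries.
    elem_size: bytes per element.
    """
    return [base + r * stride + c * elem_size
            for r in range(rows)
            for c in range(cols)]


def cache_lines(addresses, line_size=64):
    """Collapse element addresses to the distinct cache lines to fetch.

    The 64-byte line size is an assumption; order of first use is kept.
    """
    return list(dict.fromkeys(a - a % line_size for a in addresses))


if __name__ == "__main__":
    addrs = block_addresses(base=0x1000, stride=256, rows=2, cols=3, elem_size=4)
    print([hex(a) for a in addrs])
    print([hex(a) for a in cache_lines(addrs)])
```

A single instruction carrying the base address, stride, and boundaries is enough to derive every other address, which is why one prefetch can cover a block whose rows are far apart in memory.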


    SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    Publication No.: EP4177738A1

    Publication date: 2023-05-10

    Application No.: EP22217001.1

    Filing date: 2019-06-26

    Applicant: Intel Corporation

    IPC classification: G06F9/30

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor comprises fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional, 2D, matrix and a one-dimensional, 1D, vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, or a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector; decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.