METHOD AND APPARATUS FOR EFFICIENT MATRIX ALIGNMENT IN A SYSTOLIC ARRAY

    Publication No.: US20190042262A1

    Publication date: 2019-02-07

    Application No.: US16147506

    Filing date: 2018-09-28

    IPC classification: G06F9/38 G06F15/80 G06F9/30

    Abstract: An apparatus and method for efficient matrix alignment in a systolic array. For example, one embodiment of a processor comprises: a first set of physical tile registers to store first matrix data in rows or columns; a second set of physical tile registers to store second matrix data in rows or columns; a decoder to decode a matrix instruction identifying a first input matrix, a first offset, a second input matrix, and a second offset; and execution circuitry, responsive to the matrix instruction, to read a subset of rows or columns from the first set of physical tile registers in accordance with the first offset, spanning multiple physical tile registers from the first set if indicated by the first offset, to generate a first input matrix; to read a subset of rows or columns from the second set of physical tile registers in accordance with the second offset, spanning multiple physical tile registers from the second set if indicated by the second offset, to generate a second input matrix; and to perform an arithmetic operation with the first and second input matrices in accordance with an opcode of the matrix instruction.
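The offset-based read described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuitry: the tile height `ROWS_PER_TILE`, the helper names `read_rows`, `matmul`, and `matrix_op`, and the choice of matrix multiplication as the arithmetic operation are all hypothetical and do not come from the abstract.

```python
# Hypothetical software model of an offset-based tile read.
# Assumption: every physical tile register holds ROWS_PER_TILE rows.
ROWS_PER_TILE = 4


def read_rows(tiles, offset, count=ROWS_PER_TILE):
    """Read `count` rows starting at logical row `offset`.

    `tiles` is a list of tile registers, each a list of rows. When the
    offset is not a multiple of ROWS_PER_TILE, the read spans two tiles.
    """
    rows = []
    for r in range(offset, offset + count):
        tile_index, row_in_tile = divmod(r, ROWS_PER_TILE)
        rows.append(tiles[tile_index][row_in_tile])
    return rows


def matmul(a, b):
    """Plain matrix product, standing in for the opcode's arithmetic operation."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matrix_op(tiles_a, offset_a, tiles_b, offset_b):
    """Assemble both input matrices from their offsets, then combine them."""
    a = read_rows(tiles_a, offset_a)
    b = read_rows(tiles_b, offset_b)
    return matmul(a, b)
```

In this model, an offset of 2 yields a matrix made of the last two rows of the first tile followed by the first two rows of the second tile, so no separate realignment step is needed before the arithmetic operation.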

    88. Floating point round-off amount determination processors, methods, systems, and instructions
    Granted invention patent (in force)

    Publication No.: US09513871B2

    Publication date: 2016-12-06

    Application No.: US13977257

    Filing date: 2011-12-30

    IPC classification: G06F7/483 G06F9/30 G06F7/499

    Abstract: A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
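The per-element result described in this abstract can be written as x - round(x * 2^M) / 2^M, where M is the indicated number of fraction bits. The sketch below is a software illustration only. The function names are hypothetical, and round-to-nearest-even (the behavior of Python's built-in `round`) is an assumption, since the abstract does not name a rounding mode. Infinities and NaNs are not handled.

```python
def round_off_amount(x, fraction_bits):
    """Return x minus x rounded to `fraction_bits` bits after the radix point.

    Assumption: rounding is to nearest, ties to even (Python's round()).
    """
    scale = 2.0 ** fraction_bits
    rounded = round(x * scale) / scale  # x rounded to a multiple of 2**-fraction_bits
    return x - rounded


def round_off_amounts(source, fraction_bits):
    """Apply the operation to each source element, keeping element positions."""
    return [round_off_amount(x, fraction_bits) for x in source]
```

For example, with two fraction bits the value 1.3125 rounds to 1.25, so the round-off amount is 0.0625.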


    89. Multi-element instruction with different read and write masks
    Granted invention patent (in force)

    Publication No.: US09489196B2

    Publication date: 2016-11-08

    Application No.: US13997998

    Filing date: 2011-12-23

    IPC classification: G06F7/76 G06F9/30

    Abstract: A method is described that includes reading a first read mask from a first register. The method also includes reading a first vector operand from a second register or memory location. The method also includes applying the read mask against the first vector operand to produce a set of elements for operation. The method also includes performing an operation on the set of elements. The method also includes creating an output vector by producing multiple instances of the operation's result. The method also includes reading a first write mask from a third register, the first write mask being different from the first read mask. The method also includes applying the write mask against the output vector to create a resultant vector. The method also includes writing the resultant vector to a destination register.
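The sequence of steps in this abstract can be sketched in software as a masked reduction followed by a masked broadcast. This is an illustration under assumptions rather than the claimed implementation: the function name, the use of `sum` as the default operation, and the choice to keep the destination's previous value wherever the write mask is 0 (merge-style masking) are all hypothetical and not stated in the abstract.

```python
def masked_reduce_broadcast(src, read_mask, write_mask, dest, op=sum):
    """Model an instruction with separate read and write masks.

    src, dest: lists of equal length (vector registers).
    read_mask, write_mask: lists of 0/1 flags, one per element.
    """
    # Apply the read mask to select the elements the operation will see.
    selected = [x for x, m in zip(src, read_mask) if m]
    # Perform the operation on the selected set (a reduction in this sketch).
    result = op(selected)
    # Create the output vector from multiple instances of the result.
    output = [result] * len(dest)
    # Apply the write mask; masked-off positions keep the old destination
    # value (assumption: merge-style masking).
    return [o if m else d for o, m, d in zip(output, write_mask, dest)]
```

With `src = [1, 2, 3, 4]`, a read mask of `[1, 0, 1, 0]`, and a write mask of `[0, 1, 1, 0]`, the sum of the selected elements (4) is written only to the two middle positions of the destination.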
