8-BIT FLOATING POINT FUSED MULTIPLY INSTRUCTIONS

    Publication number: US20240045688A1

    Publication date: 2024-02-08

    Application number: US17958369

    Filing date: 2022-10-01

    CPC classification number: G06F9/3016 G06F7/4876 G06F9/3001

    Abstract: Techniques for performing FP8 FMA in response to an instruction are described. In some examples, an instruction has fields for an opcode, an identification of a location of a packed data source/destination operand (a first source), an identification of a location of a second packed data source operand, an identification of a location of a third packed data source operand, and an identification of a location of a packed data source/destination operand, wherein the opcode is to indicate operand ordering and that execution circuitry is to, per data element position, perform an FP8 value fused multiply-accumulate operation using the first, second, and third source operands and store a result in a corresponding data element position of the source/destination operand, wherein the FP8 value has an 8-bit floating point format that comprises one bit for a sign, at least 4 bits for an exponent, and at least two bits for a fraction.
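
    The per-element operation in the abstract above can be sketched in software. This is an illustrative model only, not the patented circuitry: it assumes the 1-5-2 layout (one sign bit, 5 exponent bits, 2 fraction bits, bias 15, IEEE-style infinities and NaNs), which is one layout satisfying the "at least 4 exponent bits, at least two fraction bits" condition. The function names, the fixed operand order src1 * src2 + src3, and returning the results as Python floats rather than re-encoding them are all assumptions.

```python
import math

def decode_fp8_e5m2(byte):
    """Decode one 8-bit value, assuming a 1-5-2 sign/exponent/fraction layout."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 2) & 0x1F   # 5 exponent bits, bias 15 (assumed)
    frac = byte & 0x3          # 2 fraction bits
    if exp == 0x1F:            # all-ones exponent: infinity or NaN
        return sign * math.inf if frac == 0 else math.nan
    if exp == 0:               # zero or subnormal: no implicit leading 1
        return sign * (frac / 4.0) * 2.0 ** -14
    return sign * (1.0 + frac / 4.0) * 2.0 ** (exp - 15)

def packed_fp8_fma(src1, src2, src3):
    """Per data element position, compute src1 * src2 + src3.

    The operand order is an assumption; the abstract says the opcode
    selects the ordering. Results stay as Python floats here.
    """
    return [
        decode_fp8_e5m2(a) * decode_fp8_e5m2(b) + decode_fp8_e5m2(c)
        for a, b, c in zip(src1, src2, src3)
    ]

# Two-element example: 2.0 * 1.5 + 1.0 and 1.5 * 2.0 + (-2.0)
result = packed_fp8_fma([0x40, 0x3E], [0x3E, 0x40], [0x3C, 0xC0])
```

    A hardware implementation would also round the result into the destination format; that step is left out because the abstract does not state the destination format.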

    Scaling half-precision floating point tensors for training deep neural networks

    Publication number: US11507815B2

    Publication date: 2022-11-22

    Application number: US17742138

    Filing date: 2022-05-11

    Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.

    INSTRUCTIONS TO CONVERT FROM FP16 TO BF8

    Publication number: US20220206805A1

    Publication date: 2022-06-30

    Application number: US17134353

    Filing date: 2020-12-26

    Abstract: Techniques for converting FP16 data elements to BF8 data elements using a single instruction are described. An exemplary apparatus includes decoder circuitry to decode a single instruction, the single instruction to include one or more fields to identify a source operand, one or more fields to identify a destination operand, and one or more fields for an opcode, the opcode to indicate that execution circuitry is to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions of the identified destination operand; and execution circuitry to execute the decoded instruction according to the opcode to convert packed half-precision floating-point data from the identified source to packed bfloat8 data and store the packed bfloat8 data into corresponding data element positions.
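
    A minimal software sketch of such a conversion follows. It assumes bfloat8 uses a 1-5-2 layout, so that a BF8 value is the upper byte of an FP16 bit pattern after rounding, and it assumes round-to-nearest-even with NaNs kept as NaNs. The abstract specifies neither the layout nor the rounding mode, and the function names are hypothetical.

```python
import struct

def float_to_fp16_bits(value):
    """Return the IEEE half-precision bit pattern of a Python float."""
    return struct.unpack("<H", struct.pack("<e", value))[0]

def fp16_bits_to_bf8(bits):
    """Narrow one 16-bit FP16 pattern to 8 bits (assumed 1-5-2 layout)."""
    upper = bits >> 8      # sign, 5 exponent bits, top 2 fraction bits
    lower = bits & 0xFF    # the 8 fraction bits that are dropped
    is_nan = (bits & 0x7C00) == 0x7C00 and (bits & 0x03FF) != 0
    if is_nan:
        return upper | 0x02  # keep a fraction bit set so NaN stays NaN
    # Round to nearest, ties to even (assumed rounding mode). A carry out of
    # the fraction moves into the exponent, so overflow yields infinity.
    if lower > 0x80 or (lower == 0x80 and (upper & 1)):
        upper += 1
    return upper

def convert_packed_fp16_to_bf8(fp16_elements):
    """Convert each 16-bit source element into the matching destination slot."""
    return [fp16_bits_to_bf8(b) for b in fp16_elements]

source = [float_to_fp16_bits(v) for v in (1.0, 1.125, 1.375, -2.0, 65504.0)]
packed = convert_packed_fp16_to_bf8(source)
```

    In this sketch 1.125 is a tie that rounds down to 1.0 (even fraction), 1.375 is a tie that rounds up to 1.5, and the largest finite FP16 value, 65504, rounds up to infinity.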

    HARDWARE APPARATUSES AND METHODS RELATING TO ELEMENTAL REGISTER ACCESSES

    Document type: Invention application (legal status: in force)

    Publication number: US20160188334A1

    Publication date: 2016-06-30

    Application number: US14582784

    Filing date: 2014-12-24

    CPC classification number: G06F9/30036

    Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
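
    The access pattern described above can be modeled as follows. This is a sketch under stated assumptions: the register file is a list of equal-length lists, the elements taken from the specified register are those from the offset to the end, the elements taken from the next logical register are its first ones, and the next logical register wraps around to register 0 after the last one. None of these details are stated in the abstract, and the function name is hypothetical.

```python
def read_with_elemental_offset(register_file, reg_index, offset):
    """Build the data vector for a register operand with an elemental offset."""
    current = register_file[reg_index]
    total = len(current)
    if not 0 <= offset < total:
        raise ValueError("offset must be in the range [0, total elements)")
    # Next logical register; the wrap-around is an assumption of this sketch.
    following = register_file[(reg_index + 1) % len(register_file)]
    first_part = current[offset:]     # (total - offset) elements
    second_part = following[:offset]  # offset elements
    return first_part + second_part

registers = [[0, 1, 2, 3], [4, 5, 6, 7]]
vector = read_with_elemental_offset(registers, 0, 1)
```

    With an offset of 1 on register 0, the data vector holds three elements from register 0 followed by one from register 1, so the combined vector always has the full register width.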
