METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT REVERSAL
    41.
    发明申请
    METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT REVERSAL 有权
    用于执行向量位反转的方法和装置

    公开(公告)号:US20160179522A1

    公开(公告)日:2016-06-23

    申请号:US14581883

    申请日:2014-12-23

    CPC classification number: G06F9/30018 G06F9/30032 G06F9/30036

    Abstract: An apparatus and method for performing a vector bit reversal. For example, one embodiment of a processor comprises: a source vector register to store a plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; vector bit reversal logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the source vector register to generate a set of reversed bit groups; and a destination vector register to store the reversed bit groups.

    Abstract translation: 用于执行向量比特反转的装置和方法。 例如,处理器的一个实施例包括:源向量寄存器,用于存储多个源位组,其中用于位组的大小将在指令的立即指定中; 矢量位反转逻辑,以从源向量寄存器内的邻近位组的立即和响应地反转位置确定位组大小,以产生一组反转位组; 以及存储反向位组的目的地向量寄存器。

    SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD

    公开(公告)号:US20250060963A1

    公开(公告)日:2025-02-20

    申请号:US18815382

    申请日:2024-08-26

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

    SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD

    公开(公告)号:US20230083705A1

    公开(公告)日:2023-03-16

    申请号:US17952001

    申请日:2022-09-23

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

    APPARATUSES, METHODS, AND SYSTEMS FOR A PACKED DATA CONVOLUTION INSTRUCTION WITH SHIFT CONTROL AND WIDTH CONTROL

    公开(公告)号:US20220413853A1

    公开(公告)日:2022-12-29

    申请号:US17359354

    申请日:2021-06-25

    Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described. In one embodiment, a hardware processor includes a decoder circuit to decode a single instruction into a decoded single instruction, the single instruction having fields that identify a first packed data source, a second packed data source, a packed data destination, a sliding window width, and a stride, and an opcode that indicates an execution circuit is to generate a first chunk of contiguous elements of the first packed data source having a width of the sliding window width, generate a second chunk of contiguous elements of the first packed data source having the width of the sliding window width and shifted by the stride, multiply each element of the first chunk by a corresponding element of a respective chunk of the second packed data source to generate a first set of products, add the first set of products together to generate a first sum, multiply each element of the second chunk by a corresponding element of a respective chunk of the second packed data source to generate a second set of products, add the second set of products together to generate a second sum, and store the first sum in a first element of the packed data destination and the second sum in a second element of the packed data destination; and the execution circuit is to execute the decoded single instruction according to the opcode.

    APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR LOADING DATA AND PADDING INTO A TILE OF A MATRIX OPERATIONS ACCELERATOR

    公开(公告)号:US20220100513A1

    公开(公告)日:2022-03-31

    申请号:US17134085

    申请日:2020-12-24

    Abstract: Systems, methods, and apparatuses relating to one or more instructions that load data into a tile register and pad a row (or column) with a pad value from a padding circuit are described. In one embodiment, a system includes a matrix operations accelerator circuit comprising a two-dimensional grid of processing elements, a tile register that represents a two-dimensional matrix coupled to the matrix operations accelerator circuit, and a coupling to a memory, a padding circuit coupled to the tile register, and a hardware processor core including a decoder, of the hardware processor core coupled to the matrix operations accelerator circuit, to decode a single instruction into a decoded single instruction, the single instruction comprising a first field that identifies the tile register, a second field that identifies data elements in the memory, and an opcode, the opcode to indicate an execution circuit of the hardware processor core is to cause a load of the data elements from the memory into the tile register and the padding circuit to pad a proper subset of elements of the tile register with a same value, and the execution circuit of the hardware processor core to execute the decoded single instruction according to the opcode.

    APPARATUS AND METHOD FOR PERFORMING DUAL SIGNED AND UNSIGNED MULTIPLICATION OF PACKED DATA ELEMENTS

    公开(公告)号:US20210294604A1

    公开(公告)日:2021-09-23

    申请号:US17226986

    申请日:2021-04-09

    Abstract: An apparatus and method for performing dual concurrent multiplications of packed data elements. For example one embodiment of a processor comprises: a decoder to decode a first instruction to generate a decoded instruction; a first source register to store a first plurality of packed doubleword data elements; a second source register to store a second plurality of packed doubleword data elements; and execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply a first doubleword data element from the first source register with a second doubleword data element from the second source register to generate a first quadword product and to concurrently multiply a third doubleword data element from the first source register with a fourth doubleword data element from the second source register to generate a second quadword product; and a destination register to store the first quadword product and the second quadword product as first and second packed quadword data elements.

    SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD

    公开(公告)号:US20210081198A1

    公开(公告)日:2021-03-18

    申请号:US17107134

    申请日:2020-11-30

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

Patent Agency Ranking