SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

    公开(公告)号:US20220100515A1

    公开(公告)日:2022-03-31

    申请号:US17549221

    申请日:2021-12-13

    Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

    APPARATUS AND METHOD FOR MULTIPLY, ADD/SUBTRACT, AND ACCUMULATE OF PACKED DATA ELEMENTS

    公开(公告)号:US20210357215A1

    公开(公告)日:2021-11-18

    申请号:US17380930

    申请日:2021-07-20

    Abstract: An apparatus and method for performing dual concurrent multiplications, subtraction/addition, and accumulation of packed data elements. For example one embodiment of a processor comprises: a decoder to decode an instruction to generate a decoded instruction; a first source register to store first and second packed data elements; a second source register to store third and fourth packed data elements; execution circuitry to execute the decoded instruction, the execution circuitry comprising: multiplier circuitry to multiply the first and third packed data elements to generate a first temporary product and to concurrently multiply the second and fourth packed data elements to generate a second temporary product, the first through fourth packed data elements all being a first width; circuitry to negate the first temporary product to generate a negated first product; adder circuitry to add the first negated product to a first accumulated packed data element from a third source register to generate a first result, the first result being a second width which is at least twice as large as the first width; the adder circuitry to concurrently add the second temporary product to a second accumulated packed data element to generate a second result of the second width; the first and second results to be stored in specified first and second data element positions within a destination register.

    SYSTEMS, APPARATUSES, AND METHODS FOR DUAL COMPLEX MULTIPLY ADD OF SIGNED WORDS

    公开(公告)号:US20210157580A1

    公开(公告)日:2021-05-27

    申请号:US16614118

    申请日:2017-06-30

    Abstract: Embodiments of systems, apparatuses, and methods for dual complex number multiplication and addition in a processor are described. For example, execution circuitry executes a decoded instruction to multiplex data values from positions in source operands to a multiplier, the source operands including pairs complex numbers, calculate a real part of a product of each pair of complex numbers, add the real part of the product of a first pair of complex numbers to the real part of the product of a second pair of complex numbers to calculate a first real result, and add the real part of the product of a third pair of complex numbers to the real part of the product of a fourth pair of complex numbers to calculate a second real result, and store the results to corresponding positions in the destination operand.

    SYSTEMS, APPARATUSES, AND METHODS FOR GENERATING AN INDEX BY SORT ORDER AND REORDERING ELEMENTS BASED ON SORT ORDER

    公开(公告)号:US20200050452A1

    公开(公告)日:2020-02-13

    申请号:US16367186

    申请日:2019-03-27

    Abstract: Disclosed embodiments relate to apparatuses, systems, and methods for performing sort indexing and/or permutation using an index. An exemplary apparatus includes decode circuitry to decode an instruction, the instruction to include a first field to identify a location of a source vector, a second field to identify a location of a destination vector, and an opcode to indicate to execution circuitry to execute the decoded instruction to sort values of the source vector and store a result of the sort in the destination vector by generating, per each element of the source vector, an index value using one or more comparisons of the element itself and to other data elements of the source vector, and permuting the values of the elements of the source vector based upon the index values for the elements and execution circuitry to execute the decoded instruction as indicated by the opcode.

Patent Agency Ranking