Systems, apparatuses, and methods for chained fused multiply add

    公开(公告)号:US10853065B2

    公开(公告)日:2020-12-01

    申请号:US16169456

    申请日:2018-10-24

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

    APPARATUS AND METHOD FOR PERFORMING A CHECK TO OPTIMIZE INSTRUCTION FLOW
    2.
    发明申请
    APPARATUS AND METHOD FOR PERFORMING A CHECK TO OPTIMIZE INSTRUCTION FLOW 有权
    执行检查以优化指导流量的装置和方法

    公开(公告)号:US20160179515A1

    公开(公告)日:2016-06-23

    申请号:US14581815

    申请日:2014-12-23

    Abstract: An apparatus and method for performing a check on inputs to a mathematical instruction and selecting a default sequence efficiently managing the architectural state of a processor. For example, one embodiment of a processor comprises: an arithmetic logic unit (ALU) to perform a plurality of mathematical operations using one or more source operands; instruction check logic to evaluate the source operands for a current mathematical instruction and to determine, based on the evaluation, whether to execute a default sequence of operations including executing the current mathematical instruction by the ALU or to jump to an alternate sequence of operations adapted to provide a result for the mathematical instruction having particular types of source operands more efficiently than the default sequence of operations.

    Abstract translation: 一种用于对数学指令的输入进行检查并选择有效地管理处理器的架构状态的默认序列的装置和方法。 例如,处理器的一个实施例包括:使用一个或多个源操作数执行多个数学运算的算术逻辑单元(ALU); 指令检查逻辑以评估当前数学指令的源操作数,并且基于评估来确定是否执行默认操作序列,包括由ALU执行当前数学指令或跳转到适于 为具有比默认操作序列更有效的特定类型的源操作数的数学指令提供结果。

    Systems, apparatuses, and methods for chained fused multiply add

    公开(公告)号:US11487541B2

    公开(公告)日:2022-11-01

    申请号:US17107134

    申请日:2020-11-30

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

    Systems, apparatuses, and methods for chained fused multiply add

    公开(公告)号:US10146535B2

    公开(公告)日:2018-12-04

    申请号:US15299420

    申请日:2016-10-20

    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

Patent Agency Ranking