- Title: Systems, apparatuses, and methods for chained fused multiply add
- Application number: US15299420
- Filing date: 2016-10-20
- Publication number: US10146535B2
- Publication date: 2018-12-04
- Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
- Applicant: Intel Corporation
- Applicant address: Santa Clara, CA, US
- Assignee: Intel Corporation
- Current assignee: Intel Corporation
- Current assignee address: Santa Clara, CA, US
- Agency: Nicholson De Vos Webster & Elliott LLP
- Main classification: G06F9/30
- IPC classification: G06F9/30; G06F7/544
Abstract:
Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of the first size, the source operands that have packed data elements of the second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
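
The accumulation loop in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `chained_fma`, the use of plain lists as stand-ins for packed registers, and the choice to pass the initial value in as a parameter are all assumptions made for illustration, since the abstract does not say where the initial value comes from. The sketch also rounds after each multiply and each add, so it does not model the single rounding of a true fused operation.

```python
def chained_fma(initial, sources, scalar_sub_elements):
    """Hypothetical software model of the chained multiply-accumulate loop.

    initial             -- list of lanes holding the starting accumulator value
    sources             -- list of packed sources (each a list of lanes)
    scalar_sub_elements -- one sub-element of the scalar value per packed source
    """
    if len(sources) != len(scalar_sub_elements):
        raise ValueError("need one scalar sub-element per packed source")

    acc = list(initial)  # first iteration adds to the initial value
    for src, sub in zip(sources, scalar_sub_elements):
        if len(src) != len(acc):
            raise ValueError("packed source width must match the accumulator")
        # Per lane: multiply the source element by this iteration's
        # sub-element, then add the result carried over from the
        # previous iteration.
        acc = [a + x * sub for a, x in zip(acc, src)]
    return acc


# Two chained iterations over four lanes (values chosen to be exact in binary).
initial = [1.0, 2.0, 3.0, 4.0]
sources = [[1.0, 1.0, 1.0, 1.0], [2.0, 0.0, 2.0, 0.0]]
sub_elements = [0.5, 2.0]
result = chained_fma(initial, sources, sub_elements)
print(result)
```

Each pass through the loop corresponds to one iteration in the abstract, so the number of packed sources determines the length of the chain.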