-
公开(公告)号:US12073214B2
公开(公告)日:2024-08-27
申请号:US17952001
申请日:2022-09-23
申请人: Intel Corporation
发明人: Jesus Corbal , Robert Valentine , Roman S. Dubtsov , Nikita A. Shustrov , Mark J. Charney , Dennis R. Bradford , Milind B. Girkar , Edward T. Grochowski , Thomas D. Fletcher , Warren E. Ferguson
CPC分类号: G06F9/3001 , G06F7/483 , G06F7/5443 , G06F9/30036 , G06F9/30109 , G06F9/30112 , G06F9/3893
摘要: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
-
公开(公告)号:US11954489B2
公开(公告)日:2024-04-09
申请号:US17549363
申请日:2021-12-13
申请人: Intel Corporation
发明人: Bret Toll , Christopher J. Hughes , Dan Baum , Elmoustapha Ould-Ahmed-Vall , Raanan Sade , Robert Valentine , Mark J. Charney , Alexander F. Heinecke
IPC分类号: G06F9/30
CPC分类号: G06F9/30145 , G06F9/30032 , G06F9/30036 , G06F9/30109
摘要: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
公开(公告)号:US20230385059A1
公开(公告)日:2023-11-30
申请号:US18449651
申请日:2023-08-14
申请人: Intel Corporation
发明人: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall
CPC分类号: G06F9/3001 , G06F9/30145 , G06F9/3005 , G06F9/30036 , G06F9/383 , G06F9/3016 , G06F9/30109 , G06F9/30123 , G06F9/30076 , G06F9/3824 , G06F9/30043
摘要: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
-
公开(公告)号:US11714648B2
公开(公告)日:2023-08-01
申请号:US17549221
申请日:2021-12-13
申请人: Intel Corporation
发明人: Bret Toll , Christopher J. Hughes , Dan Baum , Elmoustapha Ould-Ahmed-Vall , Raanan Sade , Robert Valentine , Mark J. Charney , Alexander F. Heinecke
IPC分类号: G06F9/30
CPC分类号: G06F9/30145 , G06F9/30032 , G06F9/30036 , G06F9/30109
摘要: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
-
公开(公告)号:US11669344B2
公开(公告)日:2023-06-06
申请号:US16117529
申请日:2018-08-30
发明人: Gregory Edvenson , Jeremy Chritz , David Hulton
CPC分类号: G06F9/4496 , G06F8/31 , G06F9/30007 , G06F9/30109 , G06F9/44505 , G06F15/80
摘要: Apparatuses and methods are disclosed for an FPGA architecture that may improve processing speed and efficiency in processing less complex operands. Some applications may utilize operands that are less complex, such as operands that are 1, 2, or 4 bits, for example. In some examples, the DSP architecture may skip or avoid processing all received operands or may process a common operand more frequently than other operands. An example apparatus may include a first configurable logic unit configured to receive a first operand and a second operand; a second configurable logic unit configured to receive a third operand and the first calculated operand; a first switch configured to receive the first operand and a fourth operand and to output a first selected operand; and a second switch configured to receive the second calculated operand and the first selected operand.
-
公开(公告)号:US20180307494A1
公开(公告)日:2018-10-25
申请号:US15494773
申请日:2017-04-24
申请人: Intel Corporation
发明人: ELMOUSTAPHA OULD-AHMED-VALL , BARATH LAKSHMANAN , TATIANA SHPEISMAN , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma
CPC分类号: G06F9/3887 , G06F1/32 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30094 , G06F9/30109 , G06F9/30112 , G06F9/3016 , G06F9/3851 , G06F9/3891 , G06F9/50 , G06F13/4068 , G06F13/4282 , G06F15/80 , G06F2213/0026 , G06N3/00 , G06N3/0445 , G06N3/0454 , G06N3/063 , G06N3/084 , G06N20/00 , G06T1/20
摘要: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and a simultaneously perform second instruction operation on a second set of operands of the multiple operands at a second precision.
-
公开(公告)号:US10019262B2
公开(公告)日:2018-07-10
申请号:US14977782
申请日:2015-12-22
申请人: Intel Corporation
发明人: Ashish Jha , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Mark J. Charney , Milind B. Girkar
CPC分类号: G06F9/30036 , G06F9/30043 , G06F9/30109 , G06F9/3455
摘要: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
-
公开(公告)号:US20180074822A1
公开(公告)日:2018-03-15
申请号:US15808788
申请日:2017-11-09
申请人: Intel Corporation
发明人: Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Bret L. Toll , Mark J. Charney , Zeev Sperber , Amit Gradstein
IPC分类号: G06F9/30
CPC分类号: G06F9/30029 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30109
摘要: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
-
公开(公告)号:US09910824B2
公开(公告)日:2018-03-06
申请号:US14727076
申请日:2015-06-01
发明人: Mayan Moudgill , Gary J. Nacer , C. John Glossner , Arthur Joseph Hoane , Paul Hurtley , Murugappan Senthilvelan , Pablo Balzola
CPC分类号: G06F15/8053 , G06F9/3001 , G06F9/30021 , G06F9/30036 , G06F9/30101 , G06F9/30109 , G06F9/30112 , G06F9/30141 , G06F9/3836 , G06F9/3855 , G06F15/7828 , G06F15/7839 , G06F15/8076 , G06F17/142
摘要: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that separate a vector or combine two vectors. The computer processor may be implemented as a monolithic integrated circuit.
-
公开(公告)号:US20180033468A1
公开(公告)日:2018-02-01
申请号:US15728293
申请日:2017-10-09
申请人: lntel Corporation
发明人: Glenn Hinton , Bret Toll , Ronak Singhal
CPC分类号: G11C7/1036 , G06F9/30043 , G06F9/30109 , G06F9/30163
摘要: A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data, in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
-
-
-
-
-
-
-
-
-