Patent search: cpc:"G06F9/30109", page 1

1.

Granted invention patent
Systems, apparatuses, and methods for chained fused multiply add (in force)

Publication number: US12073214B2

Publication date: 2024-08-27

Application number: US17952001

Filing date: 2022-09-23

Applicant: Intel Corporation

Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson

IPC classification: G06F9/30, G06F7/483, G06F7/544, G06F9/38

CPC classification: G06F9/3001, G06F7/483, G06F7/5443, G06F9/30036, G06F9/30109, G06F9/30112, G06F9/3893

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
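The chaining described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `chained_fma` and the list-based "registers" are hypothetical, and ordinary Python floats round after the multiply and again after the add, whereas a true fused multiply-add rounds once.

```python
def chained_fma(initial, sources, scalar_subelements):
    """Accumulate sources[i] * scalar_subelements[i] onto a running packed result.

    initial            -- packed initial value (list of floats), hypothetical layout
    sources            -- list of packed source vectors, one per iteration
    scalar_subelements -- one multiplier per iteration, taken from the scalar value
    """
    if len(sources) != len(scalar_subelements):
        raise ValueError("need one scalar sub-element per packed source")

    result = list(initial)
    for vector, sub in zip(sources, scalar_subelements):
        if len(vector) != len(result):
            raise ValueError("packed sources must match the destination width")
        # Iteration 1 adds to the initial value; later iterations add to the
        # result of the previous iteration, which is the "chain".
        result = [acc + element * sub for acc, element in zip(result, vector)]
    return result
```

For example, `chained_fma([1.0, 2.0], [[1.0, 1.0], [2.0, 3.0]], [10.0, 100.0])` first yields `[11.0, 12.0]` and then `[211.0, 312.0]`.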

2.

Granted invention patent
Systems for performing instructions to quickly convert and use tiles as 1D vectors (in force)

Publication number: US11954489B2

Publication date: 2024-04-09

Application number: US17549363

Filing date: 2021-12-13

Applicant: Intel Corporation

Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC classification: G06F9/30

CPC classification: G06F9/30145, G06F9/30032, G06F9/30036, G06F9/30109

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
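As a rough illustration of the "move from 1D" case, the sketch below copies a 1D vector into one whole row or one whole column of a 2D tile. The function name `move_from_1d`, the `axis` parameter, and the nested-list tile are assumptions made for this example; partial rows, multiple rows or columns, and rectangular sub-tiles are not covered.

```python
def move_from_1d(tile, vector, index, axis="row"):
    """Return a copy of `tile` with one row or column replaced by `vector`."""
    new_tile = [row[:] for row in tile]  # leave the caller's tile untouched

    if axis == "row":
        if len(vector) != len(new_tile[index]):
            raise ValueError("vector length must equal the row length")
        new_tile[index] = list(vector)
    elif axis == "column":
        if len(vector) != len(new_tile):
            raise ValueError("vector length must equal the column length")
        for row, value in zip(new_tile, vector):
            row[index] = value
    else:
        raise ValueError("axis must be 'row' or 'column'")
    return new_tile
```

With `tile = [[1, 2], [3, 4]]`, moving `[9, 8]` into row 1 gives `[[1, 2], [9, 8]]`, and moving it into column 0 gives `[[9, 2], [8, 4]]`.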

3.

Published invention application
SYSTEMS AND METHODS FOR COMPUTING DOT PRODUCTS OF NIBBLES IN TWO TILE OPERANDS (pending, published)

Publication number: US20230385059A1

Publication date: 2023-11-30

Application number: US18449651

Filing date: 2023-08-14

Applicant: Intel Corporation

Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall

IPC classification: G06F9/30, G06F9/38

CPC classification: G06F9/3001, G06F9/30145, G06F9/3005, G06F9/30036, G06F9/383, G06F9/3016, G06F9/30109, G06F9/30123, G06F9/30076, G06F9/3824, G06F9/30043

Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify an M by N destination matrix, a first source identifier to identify an M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
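The per-element flow in this abstract can be sketched in Python as follows. The abstract does not state whether nibbles are signed or how saturation is bounded, so this sketch assumes unsigned 4-bit nibbles and saturation to the signed 32-bit range; the names `nibbles`, `saturate_int32`, and `tile_nibble_dot` are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def nibbles(dword):
    """Split a 32-bit doubleword into eight unsigned 4-bit nibbles (assumed unsigned)."""
    return [(dword >> (4 * i)) & 0xF for i in range(8)]


def saturate_int32(value):
    """Clamp to the signed 32-bit range (assumed saturation bounds)."""
    return max(INT32_MIN, min(INT32_MAX, value))


def tile_nibble_dot(dest, src1, src2):
    """dest is M x N, src1 is M x K, src2 is K x N; all hold doubleword elements."""
    m_rows, k_dim, n_cols = len(src1), len(src2), len(src2[0])
    out = [row[:] for row in dest]
    for m in range(m_rows):
        for n in range(n_cols):
            for k in range(k_dim):  # the flow runs K times per destination element
                products = [a * b for a, b in
                            zip(nibbles(src1[m][k]), nibbles(src2[k][n]))]
                # Accumulate the eight products onto the previous contents.
                out[m][n] = saturate_int32(out[m][n] + sum(products))
    return out
```

For instance, with `src1 = [[0x21]]`, `src2 = [[0x43]]`, and `dest = [[5]]`, the nibble pairs (1, 3) and (2, 4) contribute 3 + 8, so the result is `[[16]]`.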

4.

Granted invention patent
Systems for performing instructions to quickly convert and use tiles as 1D vectors (in force)

Publication number: US11714648B2

Publication date: 2023-08-01

Application number: US17549221

Filing date: 2021-12-13

Applicant: Intel Corporation

Inventors: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC classification: G06F9/30

CPC classification: G06F9/30145, G06F9/30032, G06F9/30036, G06F9/30109

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

5.

Granted invention patent
DSP execution slice array to provide operands to multiple logic units (in force)

Publication number: US11669344B2

Publication date: 2023-06-06

Application number: US16117529

Filing date: 2018-08-30

Applicant: MICRON TECHNOLOGY, INC.

Inventors: Gregory Edvenson, Jeremy Chritz, David Hulton

IPC classification: G06F15/80, G06F9/448, G06F9/30, G06F8/30, G06F9/445

CPC classification: G06F9/4496, G06F8/31, G06F9/30007, G06F9/30109, G06F9/44505, G06F15/80

Abstract: Apparatuses and methods are disclosed for an FPGA architecture that may improve processing speed and efficiency in processing less complex operands. Some applications may utilize operands that are less complex, such as operands that are 1, 2, or 4 bits, for example. In some examples, the DSP architecture may skip or avoid processing all received operands or may process a common operand more frequently than other operands. An example apparatus may include a first configurable logic unit configured to receive a first operand and a second operand; a second configurable logic unit configured to receive a third operand and the first calculated operand; a first switch configured to receive the first operand and a fourth operand and to output a first selected operand; and a second switch configured to receive the second calculated operand and the first selected operand.

6.

Invention application
MIXED INFERENCE USING LOW AND HIGH PRECISION (pending, published)

Publication number: US20180307494A1

Publication date: 2018-10-25

Application number: US15494773

Filing date: 2017-04-24

Applicant: Intel Corporation

Inventors: ELMOUSTAPHA OULD-AHMED-VALL, BARATH LAKSHMANAN, TATIANA SHPEISMAN, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma

IPC classification: G06F9/38, G06F9/30, G06F13/42, G06F13/40, G06N99/00

CPC classification: G06F9/3887, G06F1/32, G06F9/3001, G06F9/30014, G06F9/30036, G06F9/30094, G06F9/30109, G06F9/30112, G06F9/3016, G06F9/3851, G06F9/3891, G06F9/50, G06F13/4068, G06F13/4282, G06F15/80, G06F2213/0026, G06N3/00, G06N3/0445, G06N3/0454, G06N3/063, G06N3/084, G06N20/00, G06T1/20

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising instruction decode logic to decode a single instruction including multiple operands into a single decoded instruction, the multiple operands having differing precisions, and a general-purpose graphics compute unit including a first logic unit and a second logic unit, the general-purpose graphics compute unit to execute the single decoded instruction, wherein to execute the single decoded instruction includes to perform a first instruction operation on a first set of operands of the multiple operands at a first precision and to simultaneously perform a second instruction operation on a second set of operands of the multiple operands at a second precision.

7.

Granted invention patent
Vector store/load instructions for array of structures (in force)

Publication number: US10019262B2

Publication date: 2018-07-10

Application number: US14977782

Filing date: 2015-12-22

Applicant: Intel Corporation

Inventors: Ashish Jha, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark J. Charney, Milind B. Girkar

IPC classification: G06F9/30, G06F9/345

CPC classification: G06F9/30036, G06F9/30043, G06F9/30109, G06F9/3455

Abstract: A processor comprises a plurality of vector registers, and an execution unit, operatively coupled to the plurality of vector registers, the execution unit comprising a logic circuit implementing a load instruction for loading, into two or more vector registers, two or more data items associated with a data structure stored in a memory, wherein each one of the two or more vector registers is to store a data item associated with a certain position number within the data structure.
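One way to picture this load is as a de-interleave of an array of structures: register 0 collects the item at position 0 of every structure, register 1 the item at position 1, and so on. The sketch below assumes a flat list as "memory" and equally sized fields; `load_array_of_structures` and its parameters are hypothetical names, not taken from the patent.

```python
def load_array_of_structures(memory, fields_per_struct, struct_count):
    """Gather each field position of consecutive structures into its own register."""
    needed = fields_per_struct * struct_count
    if len(memory) < needed:
        raise ValueError("memory does not hold that many structures")

    registers = [[] for _ in range(fields_per_struct)]
    for struct_index in range(struct_count):
        base = struct_index * fields_per_struct
        for position in range(fields_per_struct):
            # Register `position` receives the item at that position number.
            registers[position].append(memory[base + position])
    return registers
```

Given memory laid out as `["x0", "y0", "z0", "x1", "y1", "z1"]` with three fields per structure, the call returns `[["x0", "x1"], ["y0", "y1"], ["z0", "z1"]]`.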

8.

Invention application
APPARATUS AND METHOD OF IMPROVED PERMUTE INSTRUCTIONS (pending, published)

Publication number: US20180074822A1

Publication date: 2018-03-15

Application number: US15808788

Filing date: 2017-11-09

Applicant: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein

IPC classification: G06F9/30

CPC classification: G06F9/30029, G06F9/30018, G06F9/30032, G06F9/30036, G06F9/30109

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
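The routing-then-masking structure can be sketched as a permute followed by a per-element mask. The abstract does not say what a masked-off element becomes, so this sketch assumes merge masking (the old destination element is kept); element bit width is also abstracted away, since Python lists carry no width. The name `masked_permute` is hypothetical.

```python
def masked_permute(source, indices, mask, destination):
    """Route source[indices[i]] to output i, then apply a per-element mask.

    Where mask[i] is falsy, the previous destination element is kept
    (merge masking, an assumption of this sketch).
    """
    if not (len(indices) == len(mask) == len(destination)):
        raise ValueError("indices, mask, and destination must have equal length")

    routed = [source[i] for i in indices]        # routing stage
    return [new if keep else old                 # masking stage
            for new, keep, old in zip(routed, mask, destination)]
```

For example, `masked_permute([10, 20, 30, 40], [3, 3, 0, 1], [1, 0, 1, 1], [0, 0, 0, 0])` gives `[40, 0, 10, 20]`.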

9.

Granted invention patent
Vector processor configured to operate on variable length vectors using instructions to combine and split vectors (in force)

Publication number: US09910824B2

Publication date: 2018-03-06

Application number: US14727076

Filing date: 2015-06-01

Applicant: Optimum Semiconductor Technologies, Inc.

Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, Murugappan Senthilvelan, Pablo Balzola

IPC classification: G06F9/30, G06F9/38, G06F15/80, G06F15/78, G06F17/14

CPC classification: G06F15/8053, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/30101, G06F9/30109, G06F9/30112, G06F9/30141, G06F9/3836, G06F9/3855, G06F15/7828, G06F15/7839, G06F15/8076, G06F17/142

Abstract: A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that separate a vector or combine two vectors. The computer processor may be implemented as a monolithic integrated circuit.

10.

Invention application
MULTIPLE REGISTER MEMORY ACCESS INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS (pending, published)

Publication number: US20180033468A1

Publication date: 2018-02-01

Application number: US15728293

Filing date: 2017-10-09

Applicant: Intel Corporation

Inventors: Glenn Hinton, Bret Toll, Ronak Singhal

IPC classification: G11C7/10, G06F9/30

CPC classification: G11C7/1036, G06F9/30043, G06F9/30109, G06F9/30163

Abstract: A processor includes N-bit registers and a decode unit to receive a multiple register memory access instruction. The multiple register memory access instruction is to indicate a memory location and a register. The processor includes a memory access unit coupled with the decode unit and with the N-bit registers. The memory access unit is to perform a multiple register memory access operation in response to the multiple register memory access instruction. The operation is to involve N-bit data in each of the N-bit registers comprising the indicated register. The operation is also to involve different corresponding N-bit portions of an M×N-bit line of memory corresponding to the indicated memory location. A total number of bits of the N-bit data in the N-bit registers to be involved in the multiple register memory access operation is to amount to at least half of the M×N-bits of the line of memory.
