-
公开(公告)号:US20210132943A1
公开(公告)日:2021-05-06
申请号:US16486960
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Menachem ADELMAN , Barukh ZIV , Alexander HEINECKE , Simon RUBANOVICH
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a double word with saturation; computing a dot product of bytes and accumulating in to a dword with saturation, where the input bytes can be signed or unsigned and the dword accumulation has output saturation; etc.
-
公开(公告)号:US20210089316A1
公开(公告)日:2021-03-25
申请号:US16582433
申请日:2019-09-25
Applicant: Intel Corporation
Inventor: William RASH , Subramaniam MAIYURAN , Varghese GEORGE , Bret L. TOLL , Rajesh SANKARAN , Robert S. CHAPPELL , Supratim PAL , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Gang CHEN
Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.
-
公开(公告)号:US20200210186A1
公开(公告)日:2020-07-02
申请号:US16233418
申请日:2018-12-27
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL
IPC: G06F9/345 , G06F9/30 , G06F9/38 , G06F12/0811
Abstract: Embodiments of systems, apparatuses, and methods for storing data elements in a processor are described. For example, execution circuitry executes a decoded instruction, the instruction having a first field identifying a location in main memory, a second field identifying a register storing a data element to be stored at the location in main memory, and an opcode to indicate to execution circuitry to store the data element at the location in main memory without storing the data element in a data cache of the processor, by storing the data element at the location in main memory without storing the data element in the data cache of the processor.
-
公开(公告)号:US20200210182A1
公开(公告)日:2020-07-02
申请号:US16232931
申请日:2018-12-26
Applicant: Intel Corporation
Inventor: Christopher J. HUGHES , Michael ESPIG , Dan BAUM , Robert VALENTINE , Bret TOLL , Elmoustapha OULD-AHMED-VALL
Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry to execute the decoded instruction as per the opcode.
-
公开(公告)号:US20190339972A1
公开(公告)日:2019-11-07
申请号:US16474483
申请日:2017-07-01
Applicant: Intel Corporation
Inventor: Robert VALENTINE , Dan BAUM , Zeev SPERBER , Jesus CORBAL , Elmoustapha OULD-AHMED-VALL , Bret L. TOLL , Mark J. CHARNEY , Alexander HEINECKE
IPC: G06F9/30
Abstract: Embodiments detailed herein relate to matrix operations. In particular, tile diagonal support is described. For example, a processor is detailed having decode circuitry to decode an instruction having fields for an opcode, a source operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to write the identified source operand to each element along a main diagonal of the identified destination matrix operand.
-
公开(公告)号:US20190227800A1
公开(公告)日:2019-07-25
申请号:US16289506
申请日:2019-02-28
Applicant: Intel Corporation
Inventor: Robert C. VALENTINE , Jesus Corbal SAN ADRIAN , Roger Espasa SANS , Robert D. CAVIN , Bret L. TOLL , Santiago Galan DURAN , Jeffrey G. WIEDEMEIER , Sridhar SAMUDRALA , Milind Baburao GIRKAR , Edward Thomas GROCHOWSKI , Jonathan Cannon HALL , Dennis R. BRADFORD , Elmoustapha OULD-AHMED-VALL , James C. ABEL , Mark CHARNEY , Seth ABRAHAM , Suleyman SAIR , Andrew Thomas FORSYTH , Lisa WU , Charles YOUNT
IPC: G06F9/30
CPC classification number: G06F9/30145 , G06F9/3001 , G06F9/30014 , G06F9/30018 , G06F9/30025 , G06F9/30032 , G06F9/30036 , G06F9/30047 , G06F9/30149 , G06F9/30181 , G06F9/30185 , G06F9/30192 , G06F9/34
Abstract: A vector friendly instruction format and execution thereof. According to one embodiment of the invention, a processor is configured to execute an instruction set. The instruction set includes a vector friendly instruction format. The vector friendly instruction format has a plurality of fields including a base operation field, a modifier field, an augmentation operation field, and a data element width field, wherein the first instruction format supports different versions of base operations and different augmentation operations through placement of different values in the base operation field, the modifier field, the alpha field, the beta field, and the data element width field, and wherein only one of the different values may be placed in each of the base operation field, the modifier field, the alpha field, the beta field, and the data element width field on each occurrence of an instruction in the first instruction format in instruction streams.
-
公开(公告)号:US20190138303A1
公开(公告)日:2019-05-09
申请号:US16110298
申请日:2018-08-23
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , David GUILLEN FANDOS , Jesus F. SANCHEZ , Guillem SOLE , Roger ESPASA
Abstract: An apparatus and method are described for performing vector horizontal logical instruction. For example, one embodiment of a processor comprises: fetch logic to fetch an instruction from memory, and execution logic to determine a value of a first set of one or more data elements from a first specified set of bits of an immediate operand, wherein positions of the first set of one or more data elements determined from the first specified set of bits of the immediate operand are based on a first set of one or more index values that have a most significant bit corresponding to a packed data element at a first set of one or more positions of a destination packed data operand and that have a least significant bit corresponding to a data element at a corresponding position of a first source packed data operand.
-
68.
公开(公告)号:US20190108029A1
公开(公告)日:2019-04-11
申请号:US16145156
申请日:2018-09-27
Applicant: Intel Corporation
Inventor: Jesus CORBAL SAN ADRIAN , Bret L. TOLL , Robert C. VALENTINE , Jeffrey G. WIEDEMEIER , Sridhar SAMUDRALA , Milind Baburao GIRKAR , Andrew Thomas FORSYTH , Elmoustapha OULD-AHMED-VALL , Dennis R. BRADFORD , Lisa K. WU
IPC: G06F9/30
Abstract: Embodiments of systems, apparatuses, and methods for performing a blend instruction in a computer processor are described. In some embodiments, the execution of a blend instruction causes a data element-by-element selection of data elements of first and second source operands using the corresponding bit positions of a writemask as a selector between the first and second operands and storage of the selected data elements into the destination at the corresponding position in the destination.
-
69.
公开(公告)号:US20190102196A1
公开(公告)日:2019-04-04
申请号:US16147254
申请日:2018-09-28
Applicant: Intel Corporation
Inventor: Raanan SADE , Robert VALENTINE , Bret TOLL , Christopher J. HUGHES , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Mark J. CHARNEY
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transform matrices into a row-interleaved format. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of source and destination matrices, wherein the opcode indicates that the processor is to transform the specified source matrix into the specified destination matrix having the row-interleaved format; and execution circuitry to respond to the decoded instruction by transforming the specified source matrix into the specified RowInt-formatted destination matrix by interleaving J elements of each J-element sub-column of the specified source matrix in either row-major or column-major order into a K-wide submatrix of the specified destination matrix, the K-wide submatrix having K columns and enough rows to hold the J elements.
-
公开(公告)号:US20190102192A1
公开(公告)日:2019-04-04
申请号:US15721444
申请日:2017-09-29
Applicant: Intel Corporation
Inventor: Venkateswara MADDURI , Elmoustapha OULD-AHMED-VALL , Robert VALENTINE , Mark CHARNEY
IPC: G06F9/30
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data. For example, one embodiment of a processor comprises: a decoder to decode a right-shift instruction to generate a decoded right-shift instruction; a first source register to store a plurality of packed quadwords data elements; execution circuitry to execute the decoded right-shift instruction, the execution circuitry comprising shift circuitry to right-shift at least first and second packed quadword data elements from first and second packed quadword data element locations, respectively, in the first source register by an amount specified in an immediate value or in a control value in a second source register, to generate first and second right-shifted quadwords; the execution circuitry to cause selection of a specified set of most significant bits of the first and second right-shifted quadwords to be written to least significant bit regions of first and second quadword data element locations, respectively, of a destination register; and the destination register to store the specified set of the most significant bits of the first and second right-shifted quadwords.
-
-
-
-
-
-
-
-
-