-
71.
公开(公告)号:US10303471B2
公开(公告)日:2019-05-28
申请号:US15445741
申请日:2017-02-28
Applicant: Intel Corporation
Inventor: Elmoustapha Ould-Ahmed-Vall , Mostafa Hagog , Robert Valentine , Amit Gradstein , Simon Rubanovich , Zeev Sperber
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
-
公开(公告)号:US10223111B2
公开(公告)日:2019-03-05
申请号:US15721796
申请日:2017-09-30
Applicant: Intel Corporation
Inventor: Seth Abraham , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Zeev Sperber , Amit Gradstein
Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
-
公开(公告)号:US20190042541A1
公开(公告)日:2019-02-07
申请号:US15859271
申请日:2017-12-29
Applicant: Intel Corporation
Inventor: Raanan Sade , Simon Rubanovich , Amit Gradstein , Zeev Sperber , Alexander Heinecke , Robert Valentine , Mark J. Charney , Bret Toll , Jesus Corbal , Elmoustapha Ould-Ahmed-Vall , Menachem Adelman
Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.
-
公开(公告)号:US09996319B2
公开(公告)日:2018-06-12
申请号:US14998366
申请日:2015-12-23
Applicant: Intel Corporation
Inventor: Cristina S. Anderson , Marius A. Cornea-Hasegan , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Nikita Astafev , Mark J. Charney , Milind B. Girkar , Amit Gradstein , Simon Rubanovich , Zeev Sperber
CPC classification number: G06F7/485
Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.
-
公开(公告)号:US20180121198A1
公开(公告)日:2018-05-03
申请号:US15801652
申请日:2017-11-02
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Benny Eitan , Doron Orenstein
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
-
公开(公告)号:US20180088940A1
公开(公告)日:2018-03-29
申请号:US15280324
申请日:2016-09-29
Applicant: Intel Corporation
Inventor: Simon Rubanovich , Thierry Pons , Zeev Sperber , Amit Gradstein
IPC: G06F9/30
CPC classification number: G06F9/30014 , G06F7/00 , G06F7/483 , G06F7/5443
Abstract: A processor for floating point underflow detection includes circuitry to decode a first instruction and a floating point unit. The decoded instruction, when executed by the processor, may be for performing a fused multiply-add (FMA) operation. The floating point unit includes circuitry to determine a non-normalized result of the first instruction based on a first input, a second input, and a third input. The floating point unit further includes circuitry to determine whether underflow exists in the non-normalized result based on a first exponent of the first input, a second exponent of the second input, and a third exponent of the third input.
-
公开(公告)号:US09753889B2
公开(公告)日:2017-09-05
申请号:US14881111
申请日:2015-10-12
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir
CPC classification number: G06F15/8007 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3887
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
-
公开(公告)号:US20170192934A1
公开(公告)日:2017-07-06
申请号:US14616323
申请日:2015-02-06
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Guy Patkin , Stanislav Shwartsman , Shlomo Raikin , Igor Yanover , Gal Ofir
CPC classification number: G06F15/8007 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3887
Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
-
公开(公告)号:US20170185377A1
公开(公告)日:2017-06-29
申请号:US14998366
申请日:2015-12-23
Applicant: Intel Corporation
Inventor: Cristina S. Anderson , Marius A. Cornea-Hasegan , Elmoustapha Ould-Ahmed-Vall , Robert Valentine , Jesus Corbal , Nikita Astafev , Mark J. Charney , Milind B. Girkar , Amit Gradstein , Simon Rubanovich , Zeev Sperber
IPC: G06F7/485
CPC classification number: G06F7/485
Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.
-
公开(公告)号:US09672034B2
公开(公告)日:2017-06-06
申请号:US13838048
申请日:2013-03-15
Applicant: Intel Corporation
Inventor: Zeev Sperber , Robert Valentine , Benny Eitan , Doron Orenstein
CPC classification number: G06F9/30032 , G06F9/30036 , G06F9/3885 , G06F9/3887
Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
-
-
-
-
-
-
-
-
-