Patent search ap:("INTEL CORPORATION") AND inv:"ROBERT VALENTINE" Page 2

11.

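Invention patent application
METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT REVERSAL AND CROSSING
Status: Under examination (published)

Publication (announcement) number: US20180032334A1

Publication (announcement) date: 2018-02-01

Application number: US15729566

Application date: 2017-10-10

Applicant: Intel Corporation

Inventor: JESUS CORBAL , ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK J. CHARNEY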

IPC: G06F9/30

Abstract: An apparatus and method for performing a vector bit reversal and crossing. For example, one embodiment of a processor comprises: a first source vector register to store a first plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; a second source vector to store a second plurality of source bit groups; vector bit reversal and crossing logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the first source vector register to generate a set of reversed bit groups, wherein the vector bit reversal and crossing logic is to additionally interleave the set of reversed bit groups with the second plurality of bit groups; and a destination vector register to store the reversed bit groups interleaved with the first plurality of bit groups.
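The reverse-then-interleave behaviour described in this abstract can be illustrated with a small sketch. It is not taken from the patent: the function names, the choice to reverse the order of all groups within one word, and the double-width result are assumptions made only for illustration.

```python
def split_groups(value, width, group_bits):
    """Split a width-bit integer into bit groups, least-significant group first."""
    mask = (1 << group_bits) - 1
    return [(value >> shift) & mask for shift in range(0, width, group_bits)]


def join_groups(groups, group_bits):
    """Pack bit groups back into one integer, first group least significant."""
    result = 0
    for position, group in enumerate(groups):
        result |= group << (position * group_bits)
    return result


def reverse_and_cross(src1, src2, width, group_bits):
    """Reverse the group order of src1, then interleave with the groups of src2.

    group_bits plays the role of the size carried in the immediate (assumed).
    """
    reversed_groups = split_groups(src1, width, group_bits)[::-1]
    other_groups = split_groups(src2, width, group_bits)
    interleaved = []
    for rev, other in zip(reversed_groups, other_groups):
        interleaved.extend([rev, other])
    return join_groups(interleaved, group_bits)
```

With 4-bit groups, `reverse_and_cross(0x1234, 0xABCD, 16, 4)` reverses the nibbles of the first word and alternates them with the nibbles of the second.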

12.

Invention patent application
SCATTER USING INDEX ARRAY AND FINITE STATE MACHINE
Status: Under examination (published)

Publication (announcement) number: US20170351641A1

Publication (announcement) date: 2017-12-07

Application number: US15490743

Application date: 2017-04-18

Applicant: Intel Corporation

Inventor: ZEEV SPERBER , ROBERT VALENTINE , SHLOMO RAIKIN , STANISLAV SHWARTSMAN , GAL OFIR , IGOR YANOVER , GUY PATKIN , OFER LEVY

IPC: G06F15/78 , G06F9/345 , G06F9/30 , G06F9/38

CPC classification number: G06F15/7839 , G06F9/30018 , G06F9/30036 , G06F9/30043 , G06F9/30145 , G06F9/345 , G06F9/3808 , G06F9/383

Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.
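As a rough software analogue of the masked scatter described in this abstract, the sketch below stores one element per active mask lane and clears each mask element once its store completes. The names `masked_scatter`, `base`, and `scale`, and the use of a Python list as memory, are hypothetical; the buffering and finite state machine of the hardware are not modelled.

```python
def masked_scatter(memory, base, scale, indices, mask, data):
    """Store data[lane] at base + index * scale for every lane whose mask is 1.

    Returns the updated mask; a lane is cleared to 0 after its store completes.
    """
    mask = list(mask)  # work on a copy so the caller's mask is untouched
    for lane, index in enumerate(indices):
        if mask[lane] == 1:
            address = base + index * scale  # address generation from the index
            memory[address] = data[lane]
            mask[lane] = 0  # mark this lane as done
    return mask
```

If the operation were interrupted part-way, the remaining 1 entries in the returned mask would identify the lanes still to be stored, which is the reason for clearing lanes one at a time.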

13.

Invention patent application
APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS
Status: Under examination (published)

Publication (announcement) number: US20170329605A1

Publication (announcement) date: 2017-11-16

Application number: US15668508

Application date: 2017-08-03

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL SAN ADRIAN , BRET L. TOLL , MARK J. CHARNEY , ZEEV SPERBER , AMIT GRADSTEIN

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/30181 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3013 , G06F9/30167 , G06F9/3802 , G06F12/0615

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and, mask the second and fourth instructions at a second resultant vector granularity.

14.

Invention patent application
FUNCTIONAL UNIT FOR INSTRUCTION EXECUTION PIPELINE CAPABLE OF SHIFTING DIFFERENT CHUNKS OF A PACKED DATA OPERAND BY DIFFERENT AMOUNTS
Status: Under examination (published)

Publication (announcement) number: US20170132008A1

Publication (announcement) date: 2017-05-11

Application number: US15413285

Application date: 2017-01-23

Applicant: INTEL CORPORATION

Inventor: TAL ULIEL , ROBERT VALENTINE

IPC: G06F9/38

CPC classification number: G06F9/3802 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3824 , G06F9/3867

Abstract: A method is described that includes fetching an instruction. The method further includes decoding the instruction. The instruction specifies an operation, a first operand and a second operand. The method further includes fetching the first and second operands of the instruction. The first and second operands are each composed of a plurality of larger chunks having constituent elements. The method further includes performing the operation specified by the instruction including generating a resultant composed of a plurality of larger chunks having constituent elements. The generating of the resultant includes selecting for each element in the resultant a contiguous group of bits from a same positioned chunk of the first operand as the chunk of the element in the resultant, the contiguous group of bits being identified by a same positioned element of the second operand as the element in the resultant.
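The per-element bit selection in this abstract can be sketched for a single chunk as follows. The 64-bit chunk size, 8-bit element size, and wrap-around (rotate) behaviour at the chunk boundary are assumptions chosen for illustration; the abstract does not fix them, and `multishift_chunk` is a hypothetical name.

```python
def multishift_chunk(chunk, controls, chunk_bits=64, elem_bits=8):
    """Build one result chunk from a source chunk and per-element bit offsets.

    Result element i is the elem_bits-wide field of `chunk` that starts at
    bit offset controls[i], wrapping around the chunk boundary (assumed).
    """
    chunk_mask = (1 << chunk_bits) - 1
    elem_mask = (1 << elem_bits) - 1
    result = 0
    for i, ctrl in enumerate(controls):
        offset = ctrl % chunk_bits
        # Rotate right by `offset` so the wanted field lands in the low bits.
        rotated = ((chunk >> offset) | (chunk << (chunk_bits - offset))) & chunk_mask
        result |= (rotated & elem_mask) << (i * elem_bits)
    return result
```

Passing the offsets 0, 8, 16, ... 56 reproduces the source chunk unchanged, while any other offsets shift each element's window independently.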

15.

Invention patent application
METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT SHUFFLE
Status: Under examination (published)

Publication (announcement) number: US20160188532A1

Publication (announcement) date: 2016-06-30

Application number: US14583636

Application date: 2014-12-27

Applicant: INTEL CORPORATION

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , JESUS CORBAL SAN ADRIAN , ROBERT VALENTINE , MARK J. CHARNEY , GUILLEM SOLE , ROGER ESPASA

IPC: G06F15/80 , G06F9/30

CPC classification number: G06F15/8084 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3012 , G06F15/8076

Abstract: An apparatus and method for performing a vector bit shuffle. For example, one embodiment of a processor comprises: a first vector register to store a plurality of source data elements; a second vector register to store a plurality of control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination mask register and to identify a bit from each of the source data elements to be copied to each of the particular bit positions; and vector bit shuffle logic to read each bit field from the second vector register to identify a bit from each of the source data elements and to responsively copy the bit from each of the source data elements to each of the corresponding bit positions in the destination mask register.
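A minimal sketch of the bit shuffle described in this abstract is given below, assuming that each source element is paired with its own list of bit fields and that destination mask bits are filled in order. The function name, the 64-bit element width, and the modulo on the field value are illustrative assumptions.

```python
def bit_shuffle_to_mask(sources, controls, elem_bits=64):
    """Gather one selected bit per control field into a destination mask.

    sources:  list of integer source data elements
    controls: one list of bit fields per source element; each field names
              the bit of that element to copy into the next mask position
    """
    mask = 0
    position = 0
    for src, fields in zip(sources, controls):
        for field in fields:
            bit = (src >> (field % elem_bits)) & 1
            mask |= bit << position
            position += 1
    return mask
```

For example, with sources `[0b1010, 0b0001]` and controls `[[1, 0], [0, 3]]`, the mask collects bit 1 and bit 0 of the first element followed by bit 0 and bit 3 of the second.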

16.

Invention patent application
METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT GATHER
Status: Under examination (published)

Publication (announcement) number: US20160188335A1

Publication (announcement) date: 2016-06-30

Application number: US14583639

Application date: 2014-12-27

Applicant: INTEL CORPORATION

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL SAN ADRIAN , MARK J. CHARNEY , GUILLEM SOLE , ROGER ESPASA

IPC: G06F9/30

CPC classification number: G06F9/30036 , G06F9/30018 , G06F9/30032 , G06F9/30098

Abstract: An apparatus and method for performing a vector bit gather. For example, one embodiment of a processor comprises: a first vector register to store one or more source data elements; a second vector register to store one or more control elements, each of the control elements comprising a plurality of bit fields, each bit field to be associated with a corresponding bit position in a destination vector register and to identify a bit from the one or more source data elements to be copied to each of the particular bit positions; and vector bit gather logic to read each bit field from the second vector register to identify a bit from the one or more source data elements and to responsively copy the bit from each of the one or more source data elements to each of the corresponding bit positions in the destination vector register.

17.

Invention patent application
METHOD AND APPARATUS FOR PERFORMING CONFLICT DETECTION
Status: Granted

Publication (announcement) number: US20160179528A1

Publication (announcement) date: 2016-06-23

Application number: US14581607

Application date: 2014-12-23

Applicant: INTEL CORPORATION

Inventor: CHRISTOPHER J. HUGHES , ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MILIND B. GIRKAR

IPC: G06F9/30

CPC classification number: G06F9/30036 , G06F9/30018 , G06F9/30021 , G06F9/30047 , G06F9/30112 , G06F9/3834 , G06F9/3838

Abstract: An apparatus and method are described for performing conflict detection operations. For example, one embodiment of a processor comprises: a first source vector register to store a first set of data elements; a second source vector register to store a second set of data elements; conflict detection logic to perform a specified comparison operation comparing each of the first set of data elements with specified data elements from the second set and generating a set of comparison results, the comparison operation to be selected from a group consisting of a greater than comparison, a less than comparison, a greater than or equal to comparison, a less than or equal to comparison, and a not equal to comparison.
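The comparison-based conflict detection in this abstract might be sketched as below. The abstract only says each element is compared with "specified" elements of the second set; this sketch assumes those are the elements at lower lane positions and that results are returned as one bitmask per lane. Both choices, and all names, are assumptions.

```python
import operator

# The five comparison kinds listed in the abstract.
COMPARISONS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "ne": operator.ne,
}


def conflict_detect(first, second, op):
    """Compare first[i] with second[j] for every j < i (assumed pairing).

    Returns one integer per lane; bit j is set when the comparison holds.
    """
    compare = COMPARISONS[op]
    results = []
    for i, a in enumerate(first):
        bits = 0
        for j in range(i):
            if compare(a, second[j]):
                bits |= 1 << j
        results.append(bits)
    return results
```

Lane 0 always yields 0 under this pairing because it has no lower-numbered lanes to compare against.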

18.

Invention patent application
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS TO CONVERT 16-BIT FLOATING-POINT FORMATS
Status: Granted

Publication (announcement) number: US20220100507A1

Publication (announcement) date: 2022-03-31

Application number: US17134046

Application date: 2020-12-24

Applicant: Intel Corporation

Inventor: ALEXANDER F. HEINECKE , ROBERT VALENTINE , MARK J. CHARNEY , MENACHEM ADELMAN , CHRISTOPHER J. HUGHES , EVANGELOS GEORGANAS , ZEEV SPERBER , AMIT GRADSTEIN , SIMON RUBANOVICH

IPC: G06F9/30 , G06F9/38

Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.
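The per-element conversion in this abstract can be sketched on raw bit patterns. The sketch widens each half-precision value to single precision and keeps the upper 16 bits, which is the usual bfloat16 layout. The round-to-nearest-even rounding and the quiet-NaN result are assumptions, since the abstract does not state a rounding mode, and the function name is hypothetical.

```python
import math
import struct


def half_bits_to_bfloat16_bits(h):
    """Convert a 16-bit IEEE half-precision pattern to a bfloat16 pattern."""
    # Decode the half-precision bits ("e" is the struct code for binary16).
    (value,) = struct.unpack("<e", struct.pack("<H", h))
    if math.isnan(value):
        # Keep the sign and return a quiet NaN (assumed NaN handling).
        return (h & 0x8000) | 0x7FC0
    # Re-encode as single precision and view the 32 raw bits.
    (f32,) = struct.unpack("<I", struct.pack("<f", value))
    # Round to nearest, ties to even, on the 16 bits about to be dropped.
    rounding_bias = 0x7FFF + ((f32 >> 16) & 1)
    return ((f32 + rounding_bias) >> 16) & 0xFFFF


def convert_vector(source):
    """Apply the conversion to each of the N elements of a source vector."""
    return [half_bits_to_bfloat16_bits(h) for h in source]
```

Because bfloat16 keeps only 7 fraction bits against half precision's 10, some inputs lose precision, for example `0x3C01` (just above 1.0) maps to `0x3F80` (exactly 1.0).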

19.

Invention patent application
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS
Status: Granted

Publication (announcement) number: US20220100502A1

Publication (announcement) date: 2022-03-31

Application number: US17134008

Application date: 2020-12-24

Applicant: Intel Corporation

Inventor: ALEXANDER F. HEINECKE , ROBERT VALENTINE , MARK J. CHARNEY , MENACHEM ADELMAN , CHRISTOPHER J. HUGHES , EVANGELOS GEORGANAS , ZEEV SPERBER , AMIT GRADSTEIN , SIMON RUBANOVICH

IPC: G06F9/30 , G06F9/38 , G06F17/16 , G06F7/544

Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
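The accumulation pattern in this abstract can be written out as a short reference loop. Each source element is modelled as a pair of ordinary Python floats, so the half-to-single conversion and single-precision rounding are not modelled. The function name and the nested-list matrix layout are assumptions for illustration.

```python
def matrix_dot_product_pairs(dest, a, b):
    """Accumulate pairwise dot products into dest, in place.

    dest: M x N matrix of floats (stands in for single-precision elements)
    a:    M x K matrix whose elements are (first, second) pairs
    b:    K x N matrix whose elements are (first, second) pairs
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    for m in range(rows):
        for n in range(cols):
            acc = dest[m][n]  # previous contents of the destination element
            for k in range(inner):
                a_first, a_second = a[m][k]
                b_first, b_second = b[k][n]
                # First values multiplied together, second values together.
                acc += a_first * b_first + a_second * b_second
            dest[m][n] = acc
    return dest
```

With `dest = [[1.0]]`, `a = [[(1.0, 2.0), (3.0, 4.0)]]`, and `b = [[(5.0, 6.0)], [(7.0, 8.0)]]`, the single destination element becomes 1 + (5 + 12) + (21 + 32).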

20.

Invention patent application
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD
Status: Under examination (published)

Publication (announcement) number: US20190121637A1

Publication (announcement) date: 2019-04-25

Application number: US16169456

Application date: 2018-10-24

Applicant: Intel Corporation

Inventor: JESUS CORBAL , ROBERT VALENTINE , ROMAN S. DUBTSOV , NIKITA A. SHUSTROV , MARK J. CHARNEY , DENNIS R. BRADFORD , MILIND B. GIRKAR , EDWARD T. GROCHOWSKI , THOMAS D. FLETCHER , WARREN E. FERGUSON

IPC: G06F9/30 , G06F7/483 , G06F7/544

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
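The iteration chain described in this abstract can be sketched as a loop in which each step multiplies one packed source by one sub-element of the scalar value and adds the running result. The names are hypothetical, and plain Python arithmetic rounds after the multiply and again after the add, so the single rounding of a true fused multiply-add is not modelled.

```python
def chained_multiply_add(initial, sources, scalar_parts):
    """Run one multiply-accumulate iteration per packed source operand.

    initial:      starting values of the destination elements
    sources:      list of packed source vectors, one per iteration
    scalar_parts: sub-elements of the scalar value, one per iteration
    """
    acc = list(initial)
    for vector, scalar in zip(sources, scalar_parts):
        # Iteration 1 adds to the initial value; later ones add to the
        # result of the previous iteration.
        acc = [x * scalar + y for x, y in zip(vector, acc)]
    return acc
```

Starting from `[1.0, 2.0]` with sources `[[1.0, 1.0], [2.0, 3.0]]` and scalar parts `[10.0, 100.0]`, the first iteration gives `[11.0, 12.0]` and the second gives `[211.0, 312.0]`.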
