Patent search ap:("INTEL CORPORATION") AND inv:"MARK J. CHARNEY" Page 2

11.

Invention application
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS TO CONVERT 16-BIT FLOATING-POINT FORMATS [Granted]

Publication (announcement) number: US20220100507A1

Publication (announcement) date: 2022-03-31

Application number: US17134046

Application date: 2020-12-24

Applicant: Intel Corporation

Inventor: ALEXANDER F. HEINECKE, ROBERT VALENTINE, MARK J. CHARNEY, MENACHEM ADELMAN, CHRISTOPHER J. HUGHES, EVANGELOS GEORGANAS, ZEEV SPERBER, AMIT GRADSTEIN, SIMON RUBANOVICH

IPC: G06F9/30, G06F9/38

Abstract: Systems, methods, and apparatuses relating to instructions to convert 16-bit floating-point formats are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a source vector comprising N plurality of 16-bit half-precision floating-point elements, and a destination vector to store N plurality of 16-bit bfloat floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the source vector from 16-bit half-precision floating-point format to 16-bit bfloat floating-point format and store each converted element into a corresponding location of the destination vector, decode circuitry to decode the fetched single instruction into a decoded single instruction, and the execution circuitry to respond to the decoded single instruction as specified by the opcode.
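The element-wise conversion described in this abstract can be sketched in software. The following Python is an illustration only, not the patented circuitry: the function names are hypothetical, elements are passed as raw 16-bit patterns, and round-to-nearest-even is an assumed rounding mode, since the abstract does not name one.

```python
import struct


def fp16_bits_to_bf16_bits(h: int) -> int:
    """Convert one IEEE 754 half-precision bit pattern to a bfloat16 bit pattern."""
    # Widen FP16 to FP32 first; every FP16 value is exactly representable in FP32.
    value = struct.unpack("<e", struct.pack("<H", h))[0]
    f32 = struct.unpack("<I", struct.pack("<f", value))[0]
    if (f32 & 0x7FFFFFFF) > 0x7F800000:
        # NaN: truncate and force a mantissa bit so the result stays a NaN.
        return ((f32 >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, on the 16 low bits being dropped
    # (assumed rounding mode; the abstract does not specify one).
    rounding_bias = 0x7FFF + ((f32 >> 16) & 1)
    return ((f32 + rounding_bias) >> 16) & 0xFFFF


def convert_fp16_vector_to_bf16(src: list[int]) -> list[int]:
    """Convert each of the N source elements into the matching destination slot."""
    return [fp16_bits_to_bf16_bits(h) for h in src]


# 1.0, -2.0, and 1 + 2^-7 + 2^-8 (a tie that rounds up to the even neighbour).
print([hex(b) for b in convert_fp16_vector_to_bf16([0x3C00, 0xC000, 0x3C0C])])
# ['0x3f80', '0xc000', '0x3f82']
```

Because bfloat16 keeps 8 exponent bits but only 7 mantissa bits, every finite FP16 value fits in range, and the only loss is in the three low mantissa bits.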

12.

Invention application
APPARATUSES, METHODS, AND SYSTEMS FOR INSTRUCTIONS FOR 16-BIT FLOATING-POINT MATRIX DOT PRODUCT INSTRUCTIONS [Granted]

Publication (announcement) number: US20220100502A1

Publication (announcement) date: 2022-03-31

Application number: US17134008

Application date: 2020-12-24

Applicant: Intel Corporation

Inventor: ALEXANDER F. HEINECKE, ROBERT VALENTINE, MARK J. CHARNEY, MENACHEM ADELMAN, CHRISTOPHER J. HUGHES, EVANGELOS GEORGANAS, ZEEV SPERBER, AMIT GRADSTEIN, SIMON RUBANOVICH

IPC: G06F9/30, G06F9/38, G06F17/16, G06F7/544

Abstract: Systems, methods, and apparatuses relating to 16-bit floating-point matrix dot product instructions are described. In one embodiment, a processor includes fetch circuitry to fetch a single instruction having fields to specify an opcode and locations of a M by N destination matrix having single-precision elements, an M by K first source matrix, and a K by N second source matrix, the source matrices having elements that each comprise a pair of half-precision floating-point values, the opcode to indicate execution circuitry is to cause, for each element of the first source matrix and corresponding element of the second source matrix, a conversion of the half-precision floating-point values to single-precision values, a multiplication of converted single-precision values from first values of the pairs together to generate a first result, a multiplication of converted single-precision values from second values of the pairs together to generate a second result, and an accumulation of the first result and the second result with previous contents of a corresponding element of the destination matrix, decode circuitry to decode the fetched instruction, and the execution circuitry to respond to the decoded instruction as specified by the opcode.
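The arithmetic in this abstract can be written out as a plain loop nest. The Python below is a hedged sketch: the function names are hypothetical, each source element is modelled as a 2-tuple of values, and rounding the accumulator to single precision after every step is an assumption about ordering that the abstract does not spell out.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_pair_dot_product(c, a, b):
    """Accumulate a (M x K pairs) times b (K x N pairs) into c (M x N singles)."""
    m_rows, k_dim, n_cols = len(a), len(b), len(b[0])
    for m in range(m_rows):
        for n in range(n_cols):
            acc = to_f32(c[m][n])  # previous contents of the destination element
            for k in range(k_dim):
                a0, a1 = (to_f16(v) for v in a[m][k])
                b0, b1 = (to_f16(v) for v in b[k][n])
                first = a0 * b0    # product of the first values of the pairs
                second = a1 * b1   # product of the second values of the pairs
                # Assumed: round to single precision after each accumulation.
                acc = to_f32(acc + first + second)
            c[m][n] = acc
    return c


a = [[(1.0, 2.0), (3.0, 4.0)]]   # M = 1, K = 2
b = [[(5.0, 6.0)], [(7.0, 8.0)]]  # K = 2, N = 1
print(fp16_pair_dot_product([[10.0]], a, b))  # [[80.0]]
```

In the usage line, the destination starts at 10 and gains (1·5 + 2·6) + (3·7 + 4·8) = 70.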

13.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD [Under examination, published]

Publication (announcement) number: US20190121637A1

Publication (announcement) date: 2019-04-25

Application number: US16169456

Application date: 2018-10-24

Applicant: Intel Corporation

Inventor: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON

IPC: G06F9/30, G06F7/483, G06F7/544

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
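The iteration chain in this abstract can be illustrated with a short loop. This Python is a sketch under stated assumptions: `chained_fma` and its parameters are hypothetical names, one source vector is paired with one scalar sub-element per iteration, and ordinary Python arithmetic stands in for a true fused (single-rounding) multiply-add.

```python
def chained_fma(sources, scalar_sub_elements, initial):
    """Run one multiply-accumulate iteration per (source vector, sub-element) pair."""
    if len(sources) != len(scalar_sub_elements):
        raise ValueError("need one scalar sub-element per packed source operand")
    acc = list(initial)  # the first iteration adds to the initial value
    for src, sub in zip(sources, scalar_sub_elements):
        if len(src) != len(acc):
            raise ValueError("all packed operands must have the same element count")
        # Each later iteration adds to the result of the previous iteration.
        # Note: this multiplies then adds; it does not model fused rounding.
        acc = [element * sub + previous for element, previous in zip(src, acc)]
    return acc


# Two packed sources, a scalar with two sub-elements, and an initial destination.
print(chained_fma([[1, 2], [3, 4]], [10, 100], [1, 1]))  # [311, 421]
```

The usage line traces as [1·10 + 1, 2·10 + 1] = [11, 21] after the first iteration, then [3·100 + 11, 4·100 + 21] after the second.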

14.

Invention application
APPARATUS AND METHOD OF IMPROVED EXTRACT INSTRUCTIONS [Under examination, published]

Publication (announcement) number: US20180081689A1

Publication (announcement) date: 2018-03-22

Application number: US15809818

Application date: 2017-11-10

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY, ZEEV SPERBER, AMIT GRADSTEIN

IPC: G06F9/30

Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions, the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections have a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non-overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of multiple second non-overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.
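A masked extract of this kind can be sketched as follows. The Python is illustrative only: `extract_section` is a hypothetical name, vectors are modelled as lists of elements, and the choice between merging with the old destination and zeroing masked-off elements is an assumption, since the abstract only says that the selected group is masked.

```python
def extract_section(src, section_index, section_len, mask, dest, zeroing=False):
    """Select one non-overlapping section of src, then apply a per-element mask."""
    if len(src) % section_len != 0:
        raise ValueError("src must divide evenly into sections")
    if not 0 <= section_index < len(src) // section_len:
        raise IndexError("section_index out of range")
    if len(mask) != section_len or len(dest) != section_len:
        raise ValueError("mask and dest must match the section length")
    start = section_index * section_len
    selected = src[start:start + section_len]
    # Masked-off positions keep the old destination value, or become zero.
    return [s if m else (0 if zeroing else d)
            for s, m, d in zip(selected, mask, dest)]


src = [0, 1, 2, 3, 4, 5, 6, 7]
print(extract_section(src, 1, 4, [1, 0, 1, 0], [-1, -1, -1, -1]))        # [4, -1, 6, -1]
print(extract_section(src, 1, 4, [1, 0, 1, 0], [-1, -1, -1, -1], True))  # [4, 0, 6, 0]
```

Extracting a wider group, as the third and fourth instructions in the abstract do, corresponds here to a larger `section_len` over a longer source vector.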

15.

Invention application
FUSIBLE INSTRUCTIONS AND LOGIC TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY USING MULTIPLE TEST SOURCES [Under examination, published]

Publication (announcement) number: US20170052788A1

Publication (announcement) date: 2017-02-23

Application number: US15340916

Application date: 2016-11-01

Applicant: Intel Corporation

Inventor: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY

IPC: G06F9/38, G06F9/30

CPC classification number: G06F9/3822, G06F9/30029, G06F9/30058, G06F9/30094, G06F9/3836

Abstract: Fusible instructions and logic provide OR-test and AND-test functionality on multiple test sources. Some embodiments include a processor decode stage to decode a test instruction for execution, the instruction specifying first, second and third source data operands, and an operation type. Execution units, responsive to the decoded test instruction, perform one logical operation, according to the specified operation type, between data from the first and second source data operands, and perform a second logical operation between the data from the third source data operand and the result of the first logical operation to set a condition flag. Some embodiments generate the test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the test instruction through a just-in-time compiler. Some embodiments also fuse the test instruction with a subsequent conditional branch instruction, and perform a branch according to how the condition flag is set.
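The three-operand test in this abstract can be modelled in a few lines. This Python is a sketch, not the claimed logic: the function names are hypothetical, and it assumes that the second logical operation is a bitwise AND and that the condition flag is a zero flag, neither of which the abstract states.

```python
def multi_source_test(op: str, src1: int, src2: int, src3: int) -> bool:
    """Return the assumed zero flag for an OR-test or AND-test on three sources."""
    if op == "or":
        combined = src1 | src2   # first logical operation, chosen by operation type
    elif op == "and":
        combined = src1 & src2
    else:
        raise ValueError("op must be 'or' or 'and'")
    # Second logical operation (assumed AND) against the third source;
    # the flag is set when the final result is zero.
    return (combined & src3) == 0


def branch_if_zero(zero_flag: bool, taken: str, fallthrough: str) -> str:
    """Model the fused conditional branch that consumes the flag."""
    return taken if zero_flag else fallthrough


flag = multi_source_test("or", 0b0100, 0b0001, 0b0101)
print(flag, branch_if_zero(flag, "taken", "fallthrough"))  # False fallthrough
```

Fusing the logical instruction, the test, and the branch lets what would otherwise be three instructions be handled as one operation, which is the benefit the abstract points to.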

16.

Invention application
METHOD AND APPARATUS FOR PERFORMING A VECTOR BIT REVERSAL AND CROSSING [Granted]

Publication (announcement) number: US20160179529A1

Publication (announcement) date: 2016-06-23

Application number: US14581738

Application date: 2014-12-23

Applicant: INTEL CORPORATION

Inventor: JESUS CORBAL, ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, MARK J. CHARNEY

IPC: G06F9/30

CPC classification number: G06F9/30036, G06F9/30018, G06F9/30032

Abstract: An apparatus and method for performing a vector bit reversal and crossing. For example, one embodiment of a processor comprises: a first source vector register to store a first plurality of source bit groups, wherein a size for the bit groups is to be specified in an immediate of an instruction; a second source vector to store a second plurality of source bit groups; vector bit reversal and crossing logic to determine a bit group size from the immediate and to responsively reverse positions of contiguous bit groups within the first source vector register to generate a set of reversed bit groups, wherein the vector bit reversal and crossing logic is to additionally interleave the set of reversed bit groups with the second plurality of bit groups; and a destination vector register to store the reversed bit groups interleaved with the first plurality of bit groups.
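One possible reading of this abstract can be sketched in Python. Everything here is an assumption made for illustration: `reverse_and_cross` is a hypothetical name, "reverse positions of contiguous bit groups" is read as swapping each adjacent pair of groups (reversing the whole order is an equally plausible reading), and the interleave is taken to alternate reversed groups with groups of the second source. Integer labels stand in for bits so the movement is easy to follow.

```python
def reverse_and_cross(src1, src2, group_size):
    """Swap adjacent bit groups of src1, then interleave them with groups of src2."""
    if len(src1) != len(src2) or len(src1) % (2 * group_size) != 0:
        raise ValueError("sources must match and hold an even number of groups")

    def split(bits):
        return [bits[i:i + group_size] for i in range(0, len(bits), group_size)]

    groups1, groups2 = split(src1), split(src2)
    # Assumed reversal: each group trades places with its neighbour (index XOR 1).
    swapped = [groups1[i ^ 1] for i in range(len(groups1))]
    dest = []
    for i in range(len(swapped)):
        # Assumed crossing: even slots take reversed groups, odd slots take src2.
        dest.extend(swapped[i] if i % 2 == 0 else groups2[i])
    return dest


src1 = [1, 2, 3, 4, 5, 6, 7, 8]
src2 = [11, 12, 13, 14, 15, 16, 17, 18]
print(reverse_and_cross(src1, src2, 2))  # [3, 4, 13, 14, 7, 8, 17, 18]
```

The `group_size` argument plays the role of the bit group size that the abstract says is taken from the instruction's immediate.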

17.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD [Granted]

Publication (announcement) number: US20250060963A1

Publication (announcement) date: 2025-02-20

Application number: US18815382

Application date: 2024-08-26

Applicant: Intel Corporation

Inventor: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON

IPC: G06F9/30, G06F7/483, G06F7/544, G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

18.

Invention publication
INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY [Under examination, published]

Publication (announcement) number: US20230409732A1

Publication (announcement) date: 2023-12-21

Application number: US18357066

Application date: 2023-07-21

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL, ROBERT VALENTINE, JESUS CORBAL, BRET L. TOLL, MARK J. CHARNEY

IPC: G06F21/62, G06F16/27, G06F21/70, G06F9/30, G06F9/38

CPC classification number: G06F21/6227, G06F16/27, G06F21/6254, G06F21/70, G06F9/30036, G06F9/30018, G06F9/30032, G06F9/30101, G06F9/3802

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
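The replicate-then-mask behaviour in this abstract can be shown with a small model. The Python below is a sketch with assumed details: `broadcast_and_mask` is a hypothetical name, packed structures are lists of elements, and masked-off elements are written as zero (merging with prior destination contents would be another valid choice).

```python
def broadcast_and_mask(structure, copies, mask, masked_value=0):
    """Replicate a packed data structure, then mask the result per element."""
    replicated = list(structure) * copies
    if len(mask) != len(replicated):
        raise ValueError("need one mask bit per replicated element")
    # Assumed zeroing behaviour for elements whose mask bit is clear.
    return [v if m else masked_value for v, m in zip(replicated, mask)]


# First instruction: two wide elements, replicated twice, four mask bits.
print(broadcast_and_mask([1, 2], 2, [1, 0, 1, 1]))
# [1, 0, 1, 2]

# Second instruction: elements half as wide, so the same total width holds
# twice as many of them and the mask is twice as fine.
print(broadcast_and_mask([1, 2, 3, 4], 2, [1, 1, 0, 0, 1, 0, 1, 1]))
# [1, 2, 0, 0, 1, 0, 3, 4]
```

The two calls cover the same total vector width; only the element size, and with it the number of mask bits, differs.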

19.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD [Granted]

Publication (announcement) number: US20230083705A1

Publication (announcement) date: 2023-03-16

Application number: US17952001

Application date: 2022-09-23

Applicant: Intel Corporation

Inventor: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON

IPC: G06F9/30, G06F7/544, G06F7/483

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

20.

Invention application
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD [Granted]

Publication (announcement) number: US20210081198A1

Publication (announcement) date: 2021-03-18

Application number: US17107134

Application date: 2020-11-30

Applicant: Intel Corporation

Inventor: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON

IPC: G06F9/30, G06F7/544, G06F7/483

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
