Patent search ap:("INTEL CORPORATION") AND inv:"ELMOUSTAPHA OULD-AHMED-VALL" Page 6

51.

Invention application
MIXED INFERENCE USING LOW AND HIGH PRECISION (Status: under examination, published)

Publication number: US20180307495A1

Publication date: 2018-10-25

Application number: US15819167

Application date: 2017-11-21

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , BARATH LAKSHMANAN , TATIANA SHPEISMAN , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma

IPC: G06F9/38 , G06N99/00 , G06F13/40 , G06F13/42 , G06F9/30

CPC classification number: G06F9/3887 , G06F9/3001 , G06F9/30014 , G06F9/30036 , G06F9/30094 , G06F9/30109 , G06F9/30112 , G06F9/3016 , G06F9/3802 , G06F9/3836 , G06F9/3851 , G06F9/50 , G06F13/4068 , G06F13/4282 , G06F15/80 , G06F2213/0026 , G06N3/00 , G06N99/005 , G06T1/20

Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread (SIMT) architecture, the general-purpose graphics compute unit to simultaneously execute the first instruction and the second instruction, wherein the integer operation corresponds to a memory address calculation.

52.

Invention application
APPARATUS AND METHOD OF IMPROVED INSERT INSTRUCTIONS (Status: under examination, published)

Publication number: US20170357510A1

Publication date: 2017-12-14

Application number: US15668461

Application date: 2017-08-03

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL SAN ADRIAN , BRET L. TOLL , MARK J. CHARNEY , ZEEV SPERBER , AMIT GRADSTEIN

IPC: G06F9/30

CPC classification number: G06F9/30181 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/3013 , G06F9/30167 , G06F9/3802 , G06F12/0615

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements into one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements into one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus also includes masking layer circuitry to mask the first and third instructions at a first resultant vector granularity, and to mask the second and fourth instructions at a second resultant vector granularity.
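
Illustrative sketch (not part of the patent record): the insert-then-mask behaviour described in the abstract can be modelled in a few lines of Python. The function name `insert_group`, the list-of-elements vector model, and the merge/zero handling of masked-off elements are assumptions made for illustration only. The mask has one flag per element, so the element width chosen by the caller plays the role of the masking granularity.

```python
def insert_group(dest, group, section, mask, zeroing=False):
    """Model of a masked vector insert (hypothetical helper, not the patented circuit).

    dest    -- current resultant vector, as a list of elements
    group   -- input elements to insert; its length defines the section width
    section -- which non-overlapping section of dest receives the group
    mask    -- one flag per element of dest (the masking granularity)
    zeroing -- assumption: masked-off elements are zeroed instead of preserved
    """
    n = len(group)
    if n == 0 or len(dest) % n != 0:
        raise ValueError("dest must divide evenly into sections of len(group)")
    if not 0 <= section < len(dest) // n:
        raise ValueError("section index out of range")
    if len(mask) != len(dest):
        raise ValueError("mask needs one flag per destination element")

    # Unmasked result: dest with one whole section overwritten by the group.
    candidate = list(dest)
    candidate[section * n:(section + 1) * n] = group

    # Apply the mask element by element.
    result = []
    for old, new, keep in zip(dest, candidate, mask):
        if keep:
            result.append(new)
        else:
            result.append(0 if zeroing else old)
    return result
```

For example, inserting `[1, 2, 3, 4]` into section 1 of an eight-element vector with two mask flags cleared leaves those two positions at their previous values (or at zero when `zeroing=True`).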

53.

Invention application
APPARATUS AND METHOD OF IMPROVED EXTRACT INSTRUCTIONS (Status: under examination, published)

Publication number: US20170242704A1

Publication date: 2017-08-24

Application number: US15452631

Application date: 2017-03-07

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY , ZEEV SPERBER , AMIT GRADSTEIN

IPC: G06F9/30

Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions. Both the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections has the same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non-overlapping sections of respective third and fourth input vectors. The second group has a bit width that is larger than the bit width of the first group. Each of the multiple second non-overlapping sections has the same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and a second granularity.
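
Illustrative sketch (not part of the patent record): a masked extract can be modelled as selecting one section of the input vector and then choosing, per element, between the extracted value and a fallback. The name `extract_group` and the `fallback` parameter (standing in for whatever the destination holds for masked-off elements) are assumptions.

```python
def extract_group(src, section, group_len, mask, fallback):
    """Model of a masked vector extract (hypothetical helper).

    src       -- input vector as a list of elements
    section   -- which non-overlapping section of src to select
    group_len -- number of elements per section
    mask      -- one flag per extracted element
    fallback  -- values used where the mask flag is clear (assumption)
    """
    if group_len <= 0 or len(src) % group_len != 0:
        raise ValueError("src must divide evenly into sections")
    if not 0 <= section < len(src) // group_len:
        raise ValueError("section index out of range")
    if len(mask) != group_len or len(fallback) != group_len:
        raise ValueError("mask and fallback must match the group length")

    start = section * group_len
    group = src[start:start + group_len]
    return [g if keep else f for g, keep, f in zip(group, mask, fallback)]
```

Calling it with a smaller `group_len` on the same input models the finer-grained variant of the instruction.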

54.

Invention application
INSTRUCTION EXECUTION THAT BROADCASTS AND MASKS DATA VALUES AT DIFFERENT LEVELS OF GRANULARITY (Status: under examination, published)

Publication number: US20170169246A1

Publication date: 2017-06-15

Application number: US15245113

Application date: 2016-08-23

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , JESUS CORBAL , BRET L. TOLL , MARK J. CHARNEY

IPC: G06F21/62 , G06F9/38 , G06F17/30 , G06F9/30

CPC classification number: G06F21/6227 , G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30101 , G06F9/3802 , G06F17/30575 , G06F21/6254 , G06F21/70

Abstract: An apparatus is described that includes an execution unit to execute a first instruction and a second instruction. The execution unit includes input register space to store a first data structure to be replicated when executing the first instruction and to store a second data structure to be replicated when executing the second instruction. The first and second data structures are both packed data structures. Data values of the first packed data structure are twice as large as data values of the second packed data structure. The execution unit also includes replication logic circuitry to replicate the first data structure when executing the first instruction to create a first replication data structure, and, to replicate the second data structure when executing the second data instruction to create a second replication data structure. The execution unit also includes masking logic circuitry to mask the first replication data structure at a first granularity and mask the second replication data structure at a second granularity. The second granularity is twice as fine as the first granularity.
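
Illustrative sketch (not part of the patent record): replication followed by masking can be modelled as tiling a small packed structure across a fixed-width vector and then applying one mask flag per element. The name `broadcast_masked`, the explicit `vector_bits` and `element_bits` parameters, and the choice to zero masked-off lanes are assumptions. Halving `element_bits` for the same vector width doubles the number of mask flags, which mirrors the "twice as fine" granularity in the abstract.

```python
def broadcast_masked(structure, vector_bits, element_bits, mask):
    """Replicate a packed structure across a vector, then mask per element.

    structure    -- list of data values forming the packed structure
    vector_bits  -- total width of the result vector (assumed parameter)
    element_bits -- width of each data value (assumed parameter)
    mask         -- one flag per result element; clear flags zero the lane
    """
    lanes = vector_bits // element_bits
    if not structure or lanes % len(structure) != 0:
        raise ValueError("structure must tile the vector exactly")
    if len(mask) != lanes:
        raise ValueError("mask needs one flag per lane")

    replicated = list(structure) * (lanes // len(structure))
    return [value if keep else 0 for value, keep in zip(replicated, mask)]
```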

55.

Invention application
METHOD AND APPARATUS FOR PERFORMING A VECTOR PERMUTE WITH AN INDEX AND AN IMMEDIATE (Status: under examination, published)

Publication number: US20160188530A1

Publication date: 2016-06-30

Application number: US14583644

Application date: 2014-12-27

Applicant: INTEL CORPORATION

Inventor: JESUS CORBAL SAN ADRIAN , ELMOUSTAPHA OULD-AHMED-VALL , ROBERT VALENTINE , MARK J. CHARNEY , MILIND B. GIRKAR , BRET L. TOLL , ROGER ESPASA , GUILLEM SOLE , JAIRO BALART , BRIAN HICKMAN

IPC: G06F15/80 , G06F9/30

CPC classification number: G06F9/30036 , G06F7/764 , G06F9/30032 , G06F15/8053 , G06F15/8084 , G06F16/9017 , G06F2209/462

Abstract: An apparatus and method for performing a vector permute. For example, one embodiment of a processor comprises: a source vector register to store a plurality of source data elements; a destination vector register to store a plurality of destination data elements; a control vector register to store a plurality of control data elements, each control data element corresponding to one of the destination data elements and including an N bit value indicating whether a source data element is to be copied to the corresponding destination data element; vector permute logic to compare the N bit value of each control data element to an N bit portion of an immediate to determine whether to copy a source data element to the corresponding destination data element, wherein if the N bit values match, then the vector permute logic is to identify a source data element using an index value included in the control data element and to responsively copy the source data element to the corresponding destination data element in the destination vector register.
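
Illustrative sketch (not part of the patent record): the compare-then-copy rule in the abstract can be modelled as follows. The bit layout of each control element (index in the low `index_bits`, the N-bit value directly above it) and the use of the low N bits of the immediate are assumptions; the abstract does not fix either. The name `permute_with_immediate` is hypothetical.

```python
def permute_with_immediate(src, dest, control, imm, n_bits, index_bits):
    """Model of a permute gated by an immediate (hypothetical helper).

    src        -- source data elements
    dest       -- current destination data elements
    control    -- one control element per destination element
    imm        -- immediate; its low n_bits are compared (assumption)
    n_bits     -- width N of the value compared against the immediate
    index_bits -- width of the source index inside a control element
    """
    if len(control) != len(dest):
        raise ValueError("need one control element per destination element")
    if len(src) != 1 << index_bits:
        raise ValueError("src length must equal 2**index_bits in this model")

    tag_mask = (1 << n_bits) - 1
    index_mask = (1 << index_bits) - 1
    imm_tag = imm & tag_mask

    out = list(dest)
    for i, ctrl in enumerate(control):
        index = ctrl & index_mask              # assumed: index in low bits
        tag = (ctrl >> index_bits) & tag_mask  # assumed: N-bit value above it
        if tag == imm_tag:
            out[i] = src[index]                # copy only on a match
    return out
```

Destination elements whose N-bit value does not match the immediate keep their previous contents in this model.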

56.

Invention application
METHOD AND APPARATUS FOR VARIABLY EXPANDING BETWEEN MASK AND VECTOR REGISTERS (Status: under examination, published)

Publication number: US20160179520A1

Publication date: 2016-06-23

Application number: US14581435

Application date: 2014-12-23

Applicant: INTEL CORPORATION

Inventor: ASHISH JHA , ROBERT VALENTINE , ELMOUSTAPHA OULD-AHMED-VALL

IPC: G06F9/30

CPC classification number: G06F9/30018 , G06F9/30032 , G06F9/30036 , G06F9/30072

Abstract: An apparatus and method for performing a variable mask-vector expand. For example, one embodiment of a processor comprises: a source mask register to store a plurality of mask bit values; an index register to store a plurality of index values each associated with a vector data element in a destination vector register and identifying a bit within the source mask register; and variable mask-vector expand logic to expand each of the mask bit values from the source mask register into the associated vector data elements using the index values from the index register, wherein all bits of a vector data element are to be set equal to the mask bit value identified by the index value associated with that vector data element.
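
Illustrative sketch (not part of the patent record): the expand described in the abstract sets every bit of a destination element to the mask bit selected by that element's index. In the model below the mask register is a Python integer and the function name `mask_vector_expand` and the `element_bits` parameter are assumptions.

```python
def mask_vector_expand(mask, indices, element_bits):
    """Expand selected mask bits into all-ones or all-zeros vector elements.

    mask         -- source mask register, modelled as an integer
    indices      -- one index per destination element, naming a bit of mask
    element_bits -- width of each destination element (assumed parameter)
    """
    all_ones = (1 << element_bits) - 1
    result = []
    for idx in indices:
        if idx < 0:
            raise ValueError("index must be non-negative")
        bit = (mask >> idx) & 1
        result.append(all_ones if bit else 0)
    return result
```

Because each index is independent, the same mask bit may feed several destination elements, and some mask bits may not be used at all.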

57.

Invention application
PROCESSOR TO PERFORM A BIT RANGE ISOLATION INSTRUCTION (Status: under examination, published)

Publication number: US20150100760A1

Publication date: 2015-04-09

Application number: US14568725

Application date: 2014-12-12

Applicant: Intel Corporation

Inventor: Maxim Loktyukhin , Eric W Mahurin , Bret L Toll , Martin G Dixon , Sean P Mirkes , David L Kreitzer , ELMOUSTAPHA OULD-AHMED-VALL , Vinodh Gopal

IPC: G06F9/30 , G06F9/38

Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result may have: (1) a first range of bits having a first end explicitly specified by the instruction, in which each bit is identical in value to the bit of the source operand in the corresponding position; and (2) a second range of bits that all have the same value regardless of the values of the bits of the source operand in the corresponding positions. Execution of the instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable media storing such an instruction are also disclosed.
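
Illustrative sketch (not part of the patent record): one simple reading of the abstract is "keep the source bits below an explicitly specified end position, force all other bits to a single value, and do not shift anything". The model below assumes the implicit end of the kept range is bit 0, the forced value is zero, and the operand is 64 bits wide; none of these choices is stated in the abstract, and the name `isolate_low_bits` is hypothetical.

```python
def isolate_low_bits(source, end, width=64):
    """Keep bits [0, end) of source in place and clear every other bit.

    source -- source operand, modelled as a non-negative integer
    end    -- explicitly specified end of the kept range (exclusive)
    width  -- operand width in bits (assumed to be 64)
    """
    if not 0 <= end <= width:
        raise ValueError("end must lie within the operand width")
    operand_mask = (1 << width) - 1
    keep_mask = (1 << end) - 1
    # No shift is applied: kept bits stay in their original positions.
    return source & keep_mask & operand_mask
```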

58.

Invention application
GRAPHICS PROCESSOR OPERATION SCHEDULING FOR DETERMINISTIC LATENCY (Status: in force)

Publication number: US20250028675A1

Publication date: 2025-01-23

Application number: US18791963

Application date: 2024-08-01

Applicant: Intel Corporation

Inventor: JOYDEEP RAY , SELVAKUMAR PANNEER , SAURABH TANGRI , BEN ASHBAUGH , SCOTT JANUS , ABHISHEK APPU , VARGHESE GEORGE , RAVISHANKAR IYER , NILESH JAIN , PATTABHIRAMAN K , ALTUG KOKER , MIKE MACPHERSON , JOSH MASTRONARDE , ELMOUSTAPHA OULD-AHMED-VALL , JAYAKRISHNA P. S , ERIC SAMSON

IPC: G06F15/78 , G06F7/544 , G06F7/575 , G06F7/58 , G06F9/30 , G06F9/38 , G06F9/50 , G06F12/02 , G06F12/06 , G06F12/0802 , G06F12/0804 , G06F12/0811 , G06F12/0862 , G06F12/0866 , G06F12/0871 , G06F12/0875 , G06F12/0882 , G06F12/0888 , G06F12/0891 , G06F12/0893 , G06F12/0895 , G06F12/0897 , G06F12/1009 , G06F12/128 , G06F15/80 , G06F17/16 , G06F17/18 , G06N3/08 , G06T1/20 , G06T1/60 , G06T15/06 , H03M7/46

Abstract: Embodiments described herein include software, firmware, and hardware that provide techniques to enable deterministic scheduling across multiple general-purpose graphics processing units. One embodiment provides a multi-GPU architecture with uniform latency. One embodiment provides techniques to distribute memory output based on memory chip thermals. One embodiment provides techniques to enable thermally aware workload scheduling. One embodiment provides techniques to enable end-to-end contracts for workload scheduling on multiple GPUs.

59.

Invention application
MIXED INFERENCE USING LOW AND HIGH PRECISION (Status: in force)

Publication number: US20220382555A1

Publication date: 2022-12-01

Application number: US17839856

Application date: 2022-06-14

Applicant: Intel Corporation

Inventor: ELMOUSTAPHA OULD-AHMED-VALL , BARATH LAKSHMANAN , TATIANA SHPEISMAN , Joydeep Ray , Ping T. Tang , Michael Strickland , Xiaoming Chen , Anbang Yao , Ben J. Ashbaugh , Linda L. Hurd , Liwei Ma

IPC: G06F9/38 , G06F9/30 , G06F13/42 , G06F13/40 , G06N20/00 , G06T1/20 , G06N3/04 , G06N3/063 , G06N3/08 , G06N20/10 , G06F9/50 , G06F15/80 , G06N3/00

Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.

60.

Invention application
APPARATUS AND METHOD FOR SCALING PRE-SCALED RESULTS OF COMPLEX MULTIPLY-ACCUMULATE OPERATIONS ON PACKED REAL AND IMAGINARY DATA ELEMENTS (Status: in force)

Publication number: US20220326946A1

Publication date: 2022-10-13

Application number: US17589428

Application date: 2022-01-31

Applicant: Intel Corporation

Inventor: VENKATESWARA MADDURI , ELMOUSTAPHA OULD-AHMED-VALL , MARK CHARNEY , ROBERT VALENTINE , JESUS CORBAL , BINWEI YANG

IPC: G06F9/30 , G06F7/544 , G06F17/14 , G06F7/48

Abstract: An apparatus and method for performing a transform on complex data. For example, one embodiment of a processor comprises: multiplier circuitry to multiply packed real N-bit data elements in the first source register with packed real M-bit data elements in the second source register and to multiply packed imaginary N-bit data elements in the first source register with packed imaginary M-bit data elements in the second source register to generate at least four real products, adder circuitry to subtract a first selected real product from a second selected real product to generate a first temporary result and to subtract a third selected real product from a fourth selected real product to generate a second temporary result, the adder circuitry to add the first temporary result to a first packed N-bit data element from the third source register to generate a first pre-scaled result, to subtract the first temporary result from the first packed N-bit data element to generate a second pre-scaled result, to add the second temporary result to a second packed N-bit data element from the third source register to generate a third pre-scaled result, and to subtract the second temporary result from the second packed N-bit data element to generate a fourth pre-scaled result; scaling circuitry to scale the first, second, third and fourth pre-scaled results to a specified bit width to generate first, second, third, and fourth final results; and a destination register to store the first, second, third, and fourth final results in specified data element positions.
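
Illustrative sketch (not part of the patent record): the data flow in the abstract (real and imaginary products, a subtraction to form each temporary result, an add and a subtract against an element of the third source, then scaling) can be modelled on plain Python integers. Which product is subtracted from which, the use of an arithmetic right shift as the scaling step, and the name `scaled_butterfly` are assumptions; rounding and saturation are left out.

```python
def scaled_butterfly(src1, src2, src3, shift):
    """Model of the multiply, add/subtract, and scale flow (hypothetical helper).

    src1, src2 -- lists of (real, imaginary) integer pairs
    src3       -- list of integer accumulator elements, one per pair
    shift      -- right-shift amount standing in for the scaling (assumption)
    """
    if not (len(src1) == len(src2) == len(src3)):
        raise ValueError("all sources need the same number of elements")

    results = []
    for (a_re, a_im), (b_re, b_im), acc in zip(src1, src2, src3):
        # Assumed ordering: real*real minus imaginary*imaginary.
        temp = a_re * b_re - a_im * b_im
        pre_scaled_sum = acc + temp
        pre_scaled_diff = acc - temp
        # Arithmetic right shift models scaling to a narrower width.
        results.append(pre_scaled_sum >> shift)
        results.append(pre_scaled_diff >> shift)
    return results
```

With two input pairs the function returns four values, matching the first through fourth final results named in the abstract.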
