Patent search: ap:("INTEL Corporation") AND inv:"BAGHSORKHI, Sara S." (page 1)

1.

Published application
COORDINATION AND INCREASED UTILIZATION OF GRAPHICS PROCESSORS DURING INFERENCE [Under examination, published]

Publication number: EP3396622A1

Publication date: 2018-10-31

Application number: EP18159838.4

Filing date: 2018-03-02

Applicant: Intel Corporation

Inventors: APPU, Abhishek R.; KOKER, Altug; WEAST, John C.; MACPHERSON, Mike B.; HURD, Linda L.; BAGHSORKHI, Sara S.; GOTTSCHLICH, Justin E.; SURTI, Prasoonkumar; SAKTHIVEL, Chandrasekaran; MA, Liwei; OULD-AHMED-VALL, Elmoustapha; SINHA, Kamal; RAY, Joydeep; VEMBU, Balaji; JAHAGIRDAR, Sanjeev; RANGANATHAN, Vasanth; KIM, Dukhwan

IPC classification: G06T1/20

CPC classification: G06T1/20, G06N3/08

Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.

2.

Published application
PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING [Under examination, published]

Publication number: EP3396533A2

Publication date: 2018-10-31

Application number: EP18162635.9

Filing date: 2018-03-19

Applicant: Intel Corporation

Inventors: NURVITADHI, Eriko; VEMBU, Balaji; GALOPPO VON BORRIES, Nicolas C.; BARIK, Rajkishore; LIN, Tsung-Han; SINHA, Kamal; SATISH, Nadathur Rajagopalan; BOTTLESON, Jeremy; AKHBARI, Farshad; KOKER, Altug; SRINIVASA, Narayan; KIM, Dukhwan; BAGHSORKHI, Sara S.; GOTTSCHLICH, Justin E.; CHEN, Feng; OULD-AHMED-VALL, Elmoustapha; NEALIS, Kevin; CHEN, Xiaoming; YAO, Anbang

IPC classification: G06F9/30, G06F9/38

CPC classification: G06T1/20, G06F9/3001, G06F9/3017, G06F9/3851, G06F9/3887, G06F9/3895, G06N3/0445, G06N3/0454, G06N3/063, G06N3/084

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.

3.

Published application
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION [Under examination, published]

Publication number: EP4163786A1

Publication date: 2023-04-12

Application number: EP22210292.3

Filing date: 2018-03-02

Applicant: Intel Corporation

Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.

IPC classification: G06F9/38, G06F9/30

Abstract: The present disclosure provides a method and a graphics processor comprising an instruction cache to store an instruction; a scheduler to schedule a plurality of threads for execution of the instruction; and multiple compute blocks configured to perform multiply-accumulate operations in response to execution of the instruction, including matrix multiplication logic configured to execute the instruction via the plurality of threads. The matrix multiplication logic includes a plurality of functional units configured to process, in parallel via the plurality of threads, a corresponding plurality of matrix elements to multiply a first matrix, a, and a second matrix, b, wherein multiplying the first matrix, a, and the second matrix, b, includes multiplying data elements in a row of the first matrix, a, by corresponding data elements in a column of the second matrix, b, to generate a plurality of products.
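
The row-by-column multiplication named in this abstract can be illustrated in software. The following is a minimal sequential sketch, not the claimed hardware: the function name `matmul_rows_by_columns` and the list-of-rows matrix layout are assumptions made for illustration, and the products that the abstract assigns to parallel functional units are computed here one after another.

```python
def matmul_rows_by_columns(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); both are lists of rows.

    Hypothetical helper for illustration only.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # Multiply the elements of row i of a by the corresponding
            # elements of column j of b to generate a list of products.
            products = [a[i][p] * b[p][j] for p in range(k)]
            # Accumulate the products (the "accumulate" half of
            # multiply-accumulate).
            c[i][j] = sum(products)
    return c


if __name__ == "__main__":
    print(matmul_rows_by_columns([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each output element depends only on one row of a and one column of b, which is why the per-element work can be spread across independent units.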

4.

Published application
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION [Under examination, published]

Publication number: EP4036715A1

Publication date: 2022-08-03

Application number: EP22163309.2

Filing date: 2018-03-02

Applicant: Intel Corporation

Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.

IPC classification: G06F9/38, G06F9/30

Abstract: The present disclosure provides an apparatus to accelerate machine-learning operations. The apparatus comprises a cache memory to store a plurality of instructions, a machine-learning scheduler unit to schedule the plurality of instructions, a machine-learning instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform, multiple compute blocks to perform parallel multiply-accumulate operations based on the machine-learning instruction fetch and decode unit decoding one instruction of the plurality of instructions, and fixed function matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding another instruction of the plurality of instructions.

5.

Granted patent
FAST VECTOR DYNAMIC MEMORY CONFLICT DETECTION [In force]

Publication number: EP3238091B1

Publication date: 2020-05-13

Application number: EP15873914.4

Filing date: 2015-11-16

Applicant: Intel Corporation

Inventors: WANG, Cheng; HARTONO, Albert; BAGHSORKHI, Sara S.; WU, Youfeng

IPC classification: G06F9/30, G06F9/38

6.

Published application
FAST VECTOR DYNAMIC MEMORY CONFLICT DETECTION [Under examination, published]

Publication number: EP3238091A1

Publication date: 2017-11-01

Application number: EP15873914.4

Filing date: 2015-11-16

Applicant: Intel Corporation

Inventors: WANG, Cheng; HARTONO, Albert; BAGHSORKHI, Sara S.; WU, Youfeng

IPC classification: G06F15/80

CPC classification: G06F9/3834, G06F9/30021, G06F9/30036, G06F9/3838

Abstract: In one embodiment, vector conflict detection instructions are disclosed to perform dynamic memory conflict detection within a vectorized iterative scalar operation. The instructions may be performed by a vector processor to generate a partition vector identifying groups of conflict-free iterations. The partition vector may be used to generate a write mask for subsequent vector operations.
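
The idea of a partition vector and a derived write mask can be illustrated with a scalar emulation. This is a rough sketch under stated assumptions, not the semantics of the disclosed instructions: it assumes a loop of the form `a[write_idx[i]] = f(a[read_idx[i]])`, uses a greedy and deliberately conservative conflict rule, and the names `partition_vector` and `write_mask` are hypothetical.

```python
def partition_vector(read_idx, write_idx):
    """Assign each loop iteration to a group of conflict-free iterations.

    Greedy sketch (an assumption, not the patented method): a new group
    starts whenever an iteration touches an address that an earlier
    iteration in the current group has written, or writes an address
    that an earlier iteration in the current group has read.
    """
    if len(read_idx) != len(write_idx):
        raise ValueError("index vectors must have the same length")
    partition = []
    group = 0
    written, read = set(), set()
    for r, w in zip(read_idx, write_idx):
        if r in written or w in written or w in read:
            group += 1                    # conflict: open a new group
            written, read = set(), set()
        partition.append(group)
        written.add(w)
        read.add(r)
    return partition


def write_mask(partition, group):
    """Per-lane mask enabling only the iterations that belong to `group`."""
    return [p == group for p in partition]


if __name__ == "__main__":
    pv = partition_vector([0, 1, 2, 3], [1, 2, 5, 6])
    print(pv, write_mask(pv, 2))
```

A vectorized loop could then process one group at a time, applying the mask for that group so that only conflict-free lanes are updated together.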

7.

Published application
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS [Under examination, published]

Publication number: EP4141674A1

Publication date: 2023-03-01

Application number: EP22197260.7

Filing date: 2018-03-26

Applicant: Intel Corporation

Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.

IPC classification: G06F9/50, G06T15/00, G06F9/30, G06F9/38, G06N3/04, G06N3/063, G06N3/08, G06T1/20, G06F12/0811, G06N3/084, G06N3/044, G06N3/045

Abstract: The present disclosure provides a method and a graphics processing unit comprising a memory including a plurality of memory devices; compression logic to compress data to be written to the memory; and a streaming multiprocessor coupled with the memory. The streaming multiprocessor is to concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for multiple instructions. The multiple instructions include a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands and a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, the first instruction to execute concurrently with the second instruction.

8.

Published application
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS [Under examination, published]

Publication number: EP3792761A1

Publication date: 2021-03-17

Application number: EP20205451.6

Filing date: 2018-03-26

Applicant: Intel Corporation

Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.

IPC classification: G06F9/50, G06T15/00, G06F9/30, G06F9/38, G06N3/04, G06N3/063, G06N3/08, G06T1/20

Abstract: The present disclosure provides an interconnect fabric comprising one or more switches, a memory interface coupled to the interconnect fabric, an input/output (IO) interface coupled to the interconnect fabric and an array of processing clusters coupled to the interconnect fabric. The array of multiprocessors is to process mixed-precision instructions. At least one processing cluster comprises a plurality of registers to store a plurality of packed data elements at a first precision and an execution unit to execute mixed-precision dot-product instructions. The execution unit is to perform a plurality of multiplications of different pairs of the plurality of packed data elements to generate a corresponding plurality of products and to add the corresponding plurality of products to an accumulation value stored at a second precision greater than the first precision.
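
A mixed-precision dot product of this kind can be emulated in software. The sketch below is illustrative only and makes several assumptions: the first precision is taken to be IEEE 754 half precision and the second single precision (this abstract does not name the formats), rounding is emulated with the standard-library `struct` formats `"e"` and `"f"`, the accumulator is rounded after every addition, and the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mixed_precision_dot(a, b, acc=0.0):
    """Dot product with low-precision inputs and a wider accumulator.

    Assumption for illustration: inputs are stored as fp16 and the
    accumulation value as fp32.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = to_fp32(acc)
    for x, y in zip(a, b):
        # Multiply one pair of elements held at the first (lower) precision.
        product = to_fp16(x) * to_fp16(y)
        # Add the product to the accumulator held at the second precision.
        acc = to_fp32(acc + product)
    return acc


if __name__ == "__main__":
    print(mixed_precision_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Keeping the accumulator wider than the inputs limits the rounding error that builds up over many additions, while the inputs still take half the storage of single-precision values.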

9.

Published application
COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS [Under examination, published]

Publication number: EP3594813A1

Publication date: 2020-01-15

Application number: EP19182892.0

Filing date: 2018-03-26

Applicant: Intel Corporation

Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.

IPC classification: G06F9/50, G06T15/00, G06F9/30, G06F9/38, G06N3/04, G06N3/063, G06N3/08, G06T1/20

Abstract: An accelerator on a multi-chip module, a method of accelerating a machine-learning operation and a data processing system are provided. In one embodiment, the accelerator comprises: a memory stack including multiple memory dies; and a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers. The GPU includes a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction, the at least one single instruction to accelerate a linear algebra subprogram associated with a machine learning framework. The at least one single instruction is to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions, the floating-point operation being a two-dimensional matrix multiply and accumulate operation. At least a portion of the plurality of multiprocessors includes a mixed precision core, the mixed precision core to execute a thread of the at least one single instruction, the mixed precision core including a floating-point unit to perform a first operation of the thread at a first precision and a second operation of the thread at a second precision. The first operation is a multiply having at least one 16-bit floating-point input and the second operation is an accumulate having a 32-bit floating-point input.

10.

Published application
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION [Under examination, published]

Publication number: EP3396534A3

Publication date: 2019-01-23

Application number: EP18159835.0

Filing date: 2018-03-02

Applicant: Intel Corporation

Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.

IPC classification: G06F9/38, G06F9/30

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.
