-
Publication No.: EP3396622A1
Publication date: 2018-10-31
Application No.: EP18159838.4
Filing date: 2018-03-02
Applicant: INTEL Corporation
Inventors: APPU, Abhishek R.; KOKER, Altug; WEAST, John C.; MACPHERSON, Mike B.; HURD, Linda L.; BAGHSORKHI, Sara S.; GOTTSCHLICH, Justin E.; SURTI, Prasoonkumar; SAKTHIVEL, Chandrasekaran; MA, Liwei; OULD-AHMED-VALL, Elmoustapha; SINHA, Kamal; RAY, Joydeep; VEMBU, Balaji; JAHAGIRDAR, Sanjeev; RANGANATHAN, Vasanth; KIM, Dukhwan
IPC classification: G06T1/20
Abstract: A mechanism is described for facilitating inference coordination and processing utilization for machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting, at training time, information relating to one or more tasks to be performed according to a training dataset relating to a processor including a graphics processor. The method may further include analyzing the information to determine one or more portions of hardware relating to the processor capable of supporting the one or more tasks, and configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks.
-
Publication No.: EP3396533A2
Publication date: 2018-10-31
Application No.: EP18162635.9
Filing date: 2018-03-19
Applicant: INTEL Corporation
Inventors: NURVITADHI, Eriko; VEMBU, Balaji; GALOPPO VON BORRIES, Nicolas C.; BARIK, Rajkishore; LIN, Tsung-Han; SINHA, Kamal; SATISH, Nadathur Rajagopalan; BOTTLESON, Jeremy; AKHBARI, Farshad; KOKER, Altug; SRINIVASA, Narayan; KIM, Dukhwan; BAGHSORKHI, Sara S.; GOTTSCHLICH, Justin E.; CHEN, Feng; OULD-AHMED-VALL, Elmoustapha; NEALIS, Kevin; CHEN, Xiaoming; YAO, Anbang
CPC classification: G06T1/20, G06F9/3001, G06F9/3017, G06F9/3851, G06F9/3887, G06F9/3895, G06N3/0445, G06N3/0454, G06N3/063, G06N3/084
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
-
Publication No.: EP4163786A1
Publication date: 2023-04-12
Application No.: EP22210292.3
Filing date: 2018-03-02
Applicant: INTEL Corporation
Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.
Abstract: The present disclosure provides a method and a graphics processor comprising an instruction cache to store an instruction; a scheduler to schedule a plurality of threads for execution of the instruction; and multiple compute blocks configured to perform multiply-accumulate operations in response to execution of the instruction, including matrix multiplication logic configured to execute the instruction via the plurality of threads. The matrix multiplication logic includes a plurality of functional units configured to process, in parallel via the plurality of threads, a corresponding plurality of matrix elements to multiply a first matrix, a, and a second matrix, b, wherein multiplying the first matrix, a, and the second matrix, b, includes multiplying data elements in a row of the first matrix, a, by corresponding data elements in a column of the second matrix, b, to generate a plurality of products.
-
Publication No.: EP4036715A1
Publication date: 2022-08-03
Application No.: EP22163309.2
Filing date: 2018-03-02
Applicant: Intel Corporation
Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.
Abstract: The present disclosure provides an apparatus to accelerate machine-learning operations. The apparatus comprises a cache memory to store a plurality of instructions, a machine-learning scheduler unit to schedule the plurality of instructions, a machine-learning instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform, multiple compute blocks to perform parallel multiply-accumulate operations based on the machine-learning instruction fetch and decode unit decoding one instruction of the plurality of instructions, and fixed-function matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding another instruction of the plurality of instructions.
-
Publication No.: EP3238091B1
Publication date: 2020-05-13
Application No.: EP15873914.4
Filing date: 2015-11-16
Applicant: Intel Corporation
Inventors: WANG, Cheng; HARTONO, Albert; BAGHSORKHI, Sara S.; WU, Youfeng
-
Publication No.: EP3238091A1
Publication date: 2017-11-01
Application No.: EP15873914.4
Filing date: 2015-11-16
Applicant: Intel Corporation
Inventors: WANG, Cheng; HARTONO, Albert; BAGHSORKHI, Sara S.; WU, Youfeng
IPC classification: G06F15/80
CPC classification: G06F9/3834, G06F9/30021, G06F9/30036, G06F9/3838
Abstract: In one embodiment vector conflict detection instructions are disclosed to perform dynamic memory conflict detection within a vectorized iterative scalar operation. The instructions may be performed by a vector processor to generate a partition vector identifying groups of conflict free iterations. The partition vector may be used to generate a write mask for subsequent vector operations.
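Illustrative note (not part of the patent record): the idea in the abstract above, a partition vector that groups conflict-free iterations and is then turned into write masks, can be modeled in plain Python. This is a minimal software sketch under stated assumptions, not the claimed instruction semantics: the function names `partition_vector`, `write_masks`, and `masked_scatter_add` are hypothetical, and the only conflict considered is two loop iterations (lanes) writing to the same index.

```python
def partition_vector(indices):
    """Give each lane a group id so that no two lanes in one group share an index.

    Lanes that touch the same index receive increasing group ids, so running
    the groups in order preserves the original per-index iteration order.
    (Hypothetical model; the patented instructions may partition differently.)
    """
    next_group = {}          # index -> next free group id for that index
    partition = []
    for idx in indices:
        group = next_group.get(idx, 0)
        next_group[idx] = group + 1
        partition.append(group)
    return partition


def write_masks(partition):
    """Turn a partition vector into one boolean write mask per group."""
    n_groups = max(partition, default=-1) + 1
    return [[g == group for g in partition] for group in range(n_groups)]


def masked_scatter_add(table, indices, values):
    """Apply table[indices[i]] += values[i] one conflict-free group at a time."""
    for mask in write_masks(partition_vector(indices)):
        # Within one mask all active lanes hit distinct indices, so a real
        # vector unit could perform this inner loop as a single masked scatter.
        for lane, active in enumerate(mask):
            if active:
                table[indices[lane]] += values[lane]
    return table


if __name__ == "__main__":
    idx = [3, 1, 3, 2, 1, 3]
    print(partition_vector(idx))
    print(masked_scatter_add([0, 0, 0, 0], idx, [1, 2, 3, 4, 5, 6]))
```

In this model the number of masks equals the largest number of lanes that collide on a single index, so a conflict-free index vector needs only one masked pass.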
-
Publication No.: EP4141674A1
Publication date: 2023-03-01
Application No.: EP22197260.7
Filing date: 2018-03-26
Applicant: INTEL Corporation
Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.
IPC classification: G06F9/50, G06T15/00, G06F9/30, G06F9/38, G06N3/04, G06N3/063, G06N3/08, G06T1/20, G06F12/0811, G06N3/084, G06N3/044, G06N3/045
Abstract: The present disclosure provides a method and a graphics processing unit comprising a memory including a plurality of memory devices; compression logic to compress data to be written to the memory; and a streaming multiprocessor coupled with the memory. The streaming multiprocessor is to concurrently execute multiple thread groups, wherein the streaming multiprocessor includes a single instruction, multiple thread (SIMT) architecture and the streaming multiprocessor is to execute multiple threads for multiple instructions. The multiple instructions include a first instruction to cause a first portion of the streaming multiprocessor to perform a floating-point operation on multiple floating-point input operands and a second instruction to cause a second portion of the streaming multiprocessor to perform an integer operation on multiple integer operands, the first instruction to execute concurrently with the second instruction.
-
Publication No.: EP3792761A1
Publication date: 2021-03-17
Application No.: EP20205451.6
Filing date: 2018-03-26
Applicant: INTEL Corporation
Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.
Abstract: The present disclosure provides an interconnect fabric comprising one or more switches, a memory interface coupled to the interconnect fabric, an input/output (IO) interface coupled to the interconnect fabric and an array of processing clusters coupled to the interconnect fabric. The array of multiprocessors is to process mixed-precision instructions. At least one processing cluster comprises a plurality of registers to store a plurality of packed data elements at a first precision and an execution unit to execute mixed-precision dot-product instructions. The execution unit is to perform a plurality of multiplications of different pairs of the plurality of packed data elements to generate a corresponding plurality of products and to add the corresponding plurality of products to an accumulation value stored at a second precision greater than the first precision.
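Illustrative note (not part of the patent record): the mixed-precision dot product described in the abstract above, with inputs held at a lower precision and the products added to an accumulator at a higher precision, can be sketched with the Python standard library. This is a software model under stated assumptions, not the claimed hardware: the abstract does not name the precisions, so 16-bit inputs and a 32-bit accumulator are assumed here, and the helper names `to_fp16`, `to_fp32`, `mixed_precision_dot`, and `fp16_only_dot` are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_dot(a, b, acc=0.0):
    """Multiply fp16 inputs pairwise and accumulate the products in fp32."""
    acc = to_fp32(acc)
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)   # inputs at the first (lower) precision
        acc = to_fp32(acc + product)        # accumulator at the second (higher) precision
    return acc


def fp16_only_dot(a, b):
    """Same dot product, but with the accumulator also rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x) * to_fp16(y))
    return acc


if __name__ == "__main__":
    ones = [1.0] * 4096
    print(mixed_precision_dot(ones, ones))  # fp32 accumulator
    print(fp16_only_dot(ones, ones))        # fp16 accumulator, for contrast
```

The contrast function shows why the wider accumulator matters: with an fp16 accumulator, adding 1.0 stops changing the sum once it reaches 2048, because the spacing between adjacent half-precision values there is 2.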
-
Publication No.: EP3594813A1
Publication date: 2020-01-15
Application No.: EP19182892.0
Filing date: 2018-03-26
Applicant: Intel Corporation
Inventors: OULD-AHMED-VALL, ElMoustapha; BAGHSORKHI, Sara S.; YAO, Anbang; NEALIS, Kevin; CHEN, Xiaoming; KOKER, Altug; APPU, Abhishek R.; WEAST, John C.; MACPHERSON, Mike B.; KIM, Dukhwan; HURD, Linda L.; ASHBAUGH, Ben J.; LAKSHMANAN, Barath; MA, Liwei; RAY, Joydeep; TANG, Ping T.; STRICKLAND, Michael S.
Abstract: An accelerator on a multi-chip module, a method of accelerating a machine-learning operation and a data processing system are provided. In one embodiment, the accelerator comprises: a memory stack including multiple memory dies; and a graphics processing unit (GPU) coupled with the memory stack via one or more memory controllers. The GPU includes a plurality of multiprocessors having a single instruction, multiple thread (SIMT) architecture, the multiprocessors to execute at least one single instruction, the at least one single instruction to accelerate a linear algebra subprogram associated with a machine learning framework. The at least one single instruction is to cause at least a portion of the GPU to perform a floating-point operation on input having differing precisions, the floating-point operation being a two-dimensional matrix multiply and accumulate operation. At least a portion of the plurality of multiprocessors includes a mixed precision core, the mixed precision core to execute a thread of the at least one single instruction, the mixed precision core including a floating-point unit to perform a first operation of the thread at a first precision and a second operation of the thread at a second precision. The first operation is a multiply having at least one 16-bit floating-point input and the second operation is an accumulate having a 32-bit floating-point input.
-
Publication No.: EP3396534A3
Publication date: 2019-01-23
Application No.: EP18159835.0
Filing date: 2018-03-02
Applicant: INTEL Corporation
Inventors: BARIK, Rajkishore; OULD-AHMED-VALL, Elmoustapha; CHEN, Xiaoming; SRIVASTAVA, Dhawal; YAO, Anbang; NEALIS, Kevin; NURVITADHI, Eriko; BAGHSORKHI, Sara S.; VEMBU, Balaji; SHPEISMAN, Tatiana; TANG, Ping T.
Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.