Patent search: ap:("Intel Corporation") AND inv:"Tatiana Shpeisman", page 7

61.

Granted invention patent
Function callback mechanism between a central processing unit (CPU) and an auxiliary processor (status: active)

Publication number: US09342384B1

Publication date: 2016-05-17

Application number: US14574545

Filing date: 2014-12-18

Applicant: Intel Corporation

Inventors: Brian T. Lewis, Rajkishore Barik, Tatiana Shpeisman

IPC classification: G06F13/00, G06F9/54, G06T1/20

CPC classification: G06F9/544, G06T1/20

Abstract: Generally, this disclosure provides systems, devices, methods and computer readable media for implementing function callback requests between a first processor (e.g., a GPU) and a second processor (e.g., a CPU). The system may include a shared virtual memory (SVM) coupled to the first and second processors, the SVM configured to store at least one double-ended queue (Deque). An execution unit (EU) of the first processor may be associated with a first of the Deques and configured to push the callback requests to that first Deque. A request handler thread executing on the second processor may be configured to: pop one of the callback requests from the first Deque; execute a function specified by the popped callback request; and generate a completion signal to the EU in response to completion of the function.
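
The push, pop, execute and signal cycle described in this abstract can be sketched in Python. This is an illustration only, not the patented implementation: a `collections.deque` stands in for the Deque in shared virtual memory, a `threading.Event` stands in for the completion signal, and the names `CallbackRequest`, `request_handler` and `run_demo` are hypothetical. Pushing at one end and popping at the other is also an assumption, since the abstract does not say which ends are used.

```python
import threading
from collections import deque


class CallbackRequest:
    """Hypothetical request record: a function, its arguments, a completion signal."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.result = None
        self.done = threading.Event()  # stands in for the completion signal to the EU


def request_handler(request_deque, stop):
    """CPU-side handler thread: pop requests, execute them, signal completion."""
    while not stop.is_set() or request_deque:
        try:
            request = request_deque.popleft()
        except IndexError:
            stop.wait(0.001)  # deque is empty: back off briefly, then poll again
            continue
        request.result = request.func(*request.args)
        request.done.set()


def run_demo():
    request_deque = deque()  # stands in for one Deque held in shared virtual memory
    stop = threading.Event()
    handler = threading.Thread(target=request_handler, args=(request_deque, stop))
    handler.start()
    # "EU" side: push callback requests, then wait for each completion signal.
    requests = [CallbackRequest(pow, 2, n) for n in range(4)]
    for request in requests:
        request_deque.append(request)
    for request in requests:
        request.done.wait()
    stop.set()
    handler.join()
    return [request.result for request in requests]
```

Calling `run_demo()` pushes four requests and returns their results once every completion signal has been observed.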

62.

Published invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (status: pending, published)

Publication number: US20240184572A1

Publication date: 2024-06-06

Application number: US18528340

Filing date: 2023-12-04

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC classification: G06F9/30, G06F7/483, G06F7/544, G06F9/38, G06F17/16, G06N3/044, G06N3/045, G06N3/063, G06N3/08, G06N20/00, G06T15/00, G09G5/393

CPC classification: G06F9/3001, G06F7/483, G06F7/5443, G06F9/30014, G06F9/30036, G06F9/3851, G06N3/044, G06N3/045, G06N3/063, G06N3/08, G09G5/393, G06F9/30025, G06F9/3013, G06F17/16, G06F2207/3824, G06N20/00, G06T15/005

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
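
The arithmetic in this abstract (16-bit operands, 32-bit sum) can be emulated in plain Python. The sketch below assumes IEEE 754 half-precision operands and a single-precision accumulator, which is one possible reading; the abstract does not name the formats, and the helper names `to_fp16`, `to_fp32` and `matmul_accumulate` are hypothetical. Rounding is emulated with the standard `struct` module.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def matmul_accumulate(a, b, c):
    """Return a @ b + c using 16-bit operands and a 32-bit running sum."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[to_fp32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                # Intermediate product of two operands rounded to 16 bits.
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                # The running sum is rounded to 32 bits after every addition.
                acc = to_fp32(acc + product)
            out[i][j] = acc
    return out
```

Keeping the accumulator wider than the operands limits the rounding error that would otherwise build up over long dot products.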

63.

Published invention application
DYNAMIC PRECISION FOR NEURAL NETWORK COMPUTE OPERATIONS (status: pending, published)

Publication number: US20240005136A1

Publication date: 2024-01-04

Application number: US18351124

Filing date: 2023-07-12

Applicant: Intel Corporation

Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC classification: G06N3/063, G06N3/08, G06N3/04, G06T1/20, G06F9/30, G06T15/00, G06F15/78, G06F15/76, G06F1/3287, G06F1/3293, G06N3/084, G06N3/044, G06N3/045

CPC classification: G06N3/063, G06N3/08, G06N3/04, G06T1/20, G06F9/30014, G06T15/005, G06F15/78, G06F15/76, G06F9/30036, G06F1/3287, G06F1/3293, G06N3/084, G06N3/044, G06N3/045, G06T1/60

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
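
The select-then-gate behaviour in this abstract can be modelled with a toy class. Reading "gate" as switching off the component that is not selected is an assumption, as are the class name `ComputeEngine`, the `precision` argument, and the choice to model low precision by dropping the four least significant bits of each integer operand.

```python
class ComputeEngine:
    """Toy engine with one high precision and one low precision component."""

    def __init__(self):
        # True means the component is powered; False means it is gated off.
        self.powered = {"high": True, "low": True}

    def execute(self, precision, operands):
        """Add integer operands on the component selected by `precision`."""
        selected = "high" if precision == "high" else "low"
        idle = "low" if selected == "high" else "high"
        self.powered[selected] = True
        self.powered[idle] = False  # gate the component that is not in use
        if selected == "low":
            # Low precision path: drop the four least significant bits first.
            operands = [(x >> 4) << 4 for x in operands]
        return sum(operands)
```

For the operands `[100, 37]` the high precision path returns 137, while the low precision path adds 96 and 32 and returns 128.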

64.

Granted invention patent
Convolutional neural network optimization mechanism (status: active)

Publication number: US11727246B2

Publication date: 2023-08-15

Application number: US16283021

Filing date: 2019-02-22

Applicant: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu

IPC classification: G06N3/04, G06N3/082, G06N3/063, G06T1/20, G06N3/044, G06N3/045

CPC classification: G06N3/04, G06N3/063, G06N3/082, G06T1/20, G06N3/044, G06N3/045

Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.
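
Table-based non-uniform quantization, as named in this abstract, can be illustrated with a small sketch. How the patent builds its table is not stated here, so the quantile-based construction below is an assumption, and the function names are hypothetical. Each float weight is replaced by the integer index of its nearest table entry.

```python
def build_quantization_table(weights, levels):
    """Pick `levels` representative values at evenly spaced quantiles.

    Dense regions of the weight distribution receive more table entries,
    which is what makes the quantization non-uniform.
    """
    ordered = sorted(weights)
    n = len(ordered)
    return [ordered[min(n - 1, (2 * i + 1) * n // (2 * levels))] for i in range(levels)]


def quantize(weights, table):
    """Map each float weight to the integer index of its nearest table entry."""
    return [min(range(len(table)), key=lambda i: abs(table[i] - w)) for w in weights]


def dequantize(indices, table):
    """Recover approximate float weights from integer indices."""
    return [table[i] for i in indices]
```

With four levels every weight fits in two bits, and `dequantize` looks the stored indices up in the same table at inference time.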

65.

Granted invention patent
Instructions and logic to perform floating point and integer operations for machine learning (status: active)

Publication number: US11360767B2

Publication date: 2022-06-14

Application number: US17305355

Filing date: 2021-07-06

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC classification: G06F9/30, G09G5/393, G06F9/38, G06F7/483, G06F7/544, G06N3/04, G06N3/063, G06N3/08, G06T15/00, G06N20/00, G06F17/16

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.
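
A fused multiply-add computes `a * b + c` with a single rounding at the end, instead of rounding the product first. The sketch below emulates that on Python doubles with exact rational arithmetic and applies it lane by lane, one lane standing in for one thread. It does not model the mixed precision formats of the patent, and `fused_multiply_add` and `simd_fma` are hypothetical names.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a * b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def simd_fma(a_lanes, b_lanes, c_lanes):
    """Apply the same fused multiply-add to every lane (one lane per thread)."""
    return [fused_multiply_add(a, b, c) for a, b, c in zip(a_lanes, b_lanes, c_lanes)]
```

The single rounding matters when the product nearly cancels against the addend: for `a = 1 + 2**-30`, `b = 1 - 2**-30`, `c = -1.0`, the unfused expression `a * b + c` evaluates to `0.0`, while the fused result is `-2**-60`.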

66.

Invention application
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION (status: active)

Publication number: US20220114430A1

Publication date: 2022-04-14

Application number: US17558285

Filing date: 2021-12-21

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang

IPC classification: G06N3/063, G06N3/04, G06F9/30, G06F9/38, G06T1/20, G06N3/08, G06F16/17

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.
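
The decode-and-route step in this abstract can be pictured as a lookup from opcode to functional unit. The opcode names `MAC` and `MATMUL` are hypothetical; the abstract only distinguishes a "first" and a "second" instruction. The instruction cache and scheduler are not modelled.

```python
def multiply_accumulate(acc, a, b):
    """Compute block operation: element-wise acc + a * b."""
    return [x + y * z for x, y, z in zip(acc, a, b)]


def matrix_multiply(a, b):
    """Matrix multiplication logic: plain row-by-column product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical opcodes, chosen only for this illustration.
DISPATCH = {"MAC": multiply_accumulate, "MATMUL": matrix_multiply}


def decode_and_execute(instruction):
    """Decode one (opcode, operands) pair and route it to the matching unit."""
    opcode, operands = instruction
    return DISPATCH[opcode](*operands)
```

For example, `decode_and_execute(("MAC", ([1, 1], [2, 3], [4, 5])))` is routed to the multiply-accumulate function and returns `[9, 16]`.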

67.

Granted invention patent
Compute optimization mechanism (status: active)

Publication number: US11270405B2

Publication date: 2022-03-08

Application number: US16983078

Filing date: 2020-08-03

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman

IPC classification: G06T1/20, G06F3/14, G06F9/30, G06F9/38, G06N3/04, G06N3/063, G06N3/08, G06T15/00, G09G5/36, G06T15/04

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core to perform a mixed precision multi-dimensional matrix multiply and accumulate operation on 8-bit and/or 32 bit signed or unsigned integer elements.
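
One combination covered by this abstract is signed 8-bit operands accumulated into signed 32-bit sums. The sketch below emulates that combination in Python; the wraparound behaviour on overflow and the names `wrap_int32` and `int8_matmul_accumulate` are assumptions, since the abstract does not describe overflow handling.

```python
def wrap_int32(value):
    """Wrap an integer into the signed 32-bit range, as a 32-bit register would."""
    return (value + 2**31) % 2**32 - 2**31


def int8_matmul_accumulate(a, b, c):
    """Return a @ b + c for int8 matrices a and b and an int32 matrix c."""
    if any(not -128 <= x <= 127 for row in a + b for x in row):
        raise ValueError("operands must fit in a signed 8-bit integer")
    out = []
    for i, row in enumerate(a):
        out_row = []
        for j, col in enumerate(zip(*b)):
            acc = c[i][j]
            for x, y in zip(row, col):
                # Each int8 * int8 product fits in 16 bits; the sum is kept in 32.
                acc = wrap_int32(acc + x * y)
            out_row.append(acc)
        out.append(out_row)
    return out
```

The wider accumulator leaves headroom for long dot products: a single product is at most 16384 in magnitude, so many thousands of them can be summed before a 32-bit accumulator wraps.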

68.

Invention application
INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING (status: active)

Publication number: US20220019431A1

Publication date: 2022-01-20

Application number: US17305355

Filing date: 2021-07-06

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

IPC classification: G06F9/30, G06N3/04, G06F9/38, G06F7/544, G06N3/08, G06N3/063, G06F7/483, G09G5/393, G06T15/00, G06F17/16, G06N20/00

Abstract: A processing apparatus is provided comprising a multiprocessor having a multithreaded architecture. The multiprocessor can execute at least one single instruction to perform parallel mixed precision matrix operations. In one embodiment the apparatus includes a memory interface and an array of multiprocessors coupled to the memory interface. At least one multiprocessor in the array of multiprocessors is configured to execute a fused multiply-add instruction in parallel across multiple threads.

69.

Invention application
CONVOLUTIONAL NEURAL NETWORK OPTIMIZATION MECHANISM (status: active)

Publication number: US20210397925A1

Publication date: 2021-12-23

Application number: US17446101

Filing date: 2021-08-26

Applicant: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu

IPC classification: G06N3/04, G06N3/08, G06N3/063, G06T1/20

Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.
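
Two of the three steps named in this abstract, pruning and quantization, can be sketched on a flat list of weights. The abstract does not say which pruning or quantization rule is used, so magnitude pruning and symmetric uniform quantization below are assumptions, and the function names are hypothetical. Convolution window optimization is not modelled.

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize_uniform(weights, bits=8):
    """Map floats to signed integers, scaling the largest magnitude to the limit."""
    limit = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    return [round(w * limit / max_abs) for w in weights]


def optimize(weights, threshold, bits=8):
    """Run pruning followed by quantization on one list of weights."""
    return quantize_uniform(prune(weights, threshold), bits)
```

Pruning first means the small weights become exact zeros before quantization, which keeps them at zero in the integer model as well.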

70.

Invention application
DYNAMIC PRECISION FOR NEURAL NETWORK COMPUTE OPERATIONS (status: active)

Publication number: US20210334637A1

Publication date: 2021-10-28

Application number: US17317857

Filing date: 2021-05-11

Applicant: INTEL CORPORATION

Inventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev S. Jahagirdar

IPC classification: G06N3/063, G06N3/08, G06N3/04, G06T1/20, G06F9/30, G06T15/00, G06F15/78, G06F15/76, G06F1/3287, G06F1/3293

Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
