Patent search ap:("Intel Corporation") AND inv:"Michael Behar" Page 4

31.

发明授权
System and method of encoding and decoding feature maps and weights for a convolutional neural network 有权

公开(公告)号：US10726583B2

公开(公告)日：2020-07-28

申请号：US15395495

申请日：2016-12-30

Applicant: INTEL CORPORATION

Inventor： Ajit Singh , Bharat Daga , Oren Agam , Michael Behar , Dmitri Vainbrand

IPC: G06T9/00 , G06N3/08 , G06N3/04 , G06F13/10 , G06T15/00 , G06T1/20

Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate output feature map data for a convolutional neural network (CNN) and write the feature map data to a memory buffer; a direct memory access (DMA) controller including a feature map encoder, the DMA controller to read the feature map data from the memory buffer, encode the feature map data using one of multiple encode algorithms, and write encoded feature map data to memory coupled with the processing apparatus; and wherein the compute logic is to read the encoded feature map data from the memory in an encoded format and decode the encoded feature map data while reading the encoded feature map data.

32.

发明授权
Multiply-accumulate “0” data gating 有权

公开(公告)号：US10606559B2

公开(公告)日：2020-03-31

申请号：US16439174

申请日：2019-06-12

Applicant: INTEL CORPORATION

Inventor： Yaniv Fais , Tomer Bar-On , Jacob Subag , Jeremie Dreyfuss , Lev Faivishevsky , Michael Behar , Amit Bleiweiss , Guy Jacob , Gal Leibovich , Itamar Ben-Ari , Galina Ryvchin , Eyal Yaacoby

IPC: G06F7/533 , G06N20/00 , G06T1/20

Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.

33.

发明申请
COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE 审中-公开

公开(公告)号：US20190197420A1

公开(公告)日：2019-06-27

申请号：US15853457

申请日：2017-12-22

Applicant: Intel Corporation

Inventor： Ajit Singh , Bharat Daga , Michael Behar

IPC: G06N5/04 , G06T15/20 , G06N99/00 , G06F13/28 , G06F17/16

CPC classification number: G06N5/046 , G06F13/28 , G06F17/16 , G06N20/00 , G06T15/205

Abstract: Embodiments described herein provide a processing apparatus comprising compute logic to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute logic additionally includes a direct memory access (DMA) controller including a hardware codec having an encode unit and a decode unit, the DMA controller to read the neural network data from the memory buffer, encode the neural network data via the encode unit, write encoded neural network data to a memory device coupled with the processing apparatus, write metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decode encoded neural network data via the decode unit in response to a request from the compute logic.

34.

发明授权
Binary multiplier for binary vector factorization 有权

公开(公告)号：US10210137B2

公开(公告)日：2019-02-19

申请号：US15635716

申请日：2017-06-28

Applicant: Intel Corporation

Inventor： Ehud Cohen , Daniel David Ben-Dayan Rubin , Michael Behar , Dmitri Vainbrand

IPC: G06F17/16 , G06F17/17

Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w^T x≅(B·s)^T x=s^T(B^T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B^T x).

35.

发明申请
Binary Multiplier for Binary Vector Factorization 审中-公开

公开(公告)号：US20190004997A1

公开(公告)日：2019-01-03

申请号：US15635716

申请日：2017-06-28

Applicant: Intel Corporation

Inventor： Ehud Cohen , Daniel David Ben-Dayan Rubin , Michael Behar , Dmitri Vainbrand

IPC: G06F17/16 , G06F17/17

Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w∧T x≅(B·s)∧T x=s∧T (B)∧T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B∧T x).

36.

发明申请
METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED HISTOGRAM FUNCTIONALITY 有权
Title translation: 方法，装置，说明和逻辑提供矢量包装组织功能

公开(公告)号：US20160378716A1

公开(公告)日：2016-12-29

申请号：US14752054

申请日：2015-06-26

Applicant: Intel Corporation

Inventor： Edward T. Grochowski , Galina Ryvchin , Michael Behar

IPC: G06F15/80 , G06F9/30

CPC classification number: G06F15/8076 , G06F9/3001 , G06F9/30021 , G06F9/30036 , G06F9/30101 , G06F9/30145 , G06F15/8007

Abstract: Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units, compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type, from the second register portion, are added into one of a plurality data fields of a destination register lane portion, selected according to the value of its corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.

Abstract translation: 指令和逻辑提供SIMD矢量压缩直方图功能。一些处理器实施例包括分别在寄存器通道部分的多个数据字段的每一个中分别存储第一和第二数据类型的对应元件的第一和第二寄存器。解码级对SIMD矢量压缩直方图的指令进行解码。一个或多个执行单元将第一注册通道部分中的第一数据类型的每个元素与指令指定的范围进行比较。对于所述范围中的第一寄存器部分的任何元件，来自第二寄存器部分的第二数据类型的对应元件被添加到目的地寄存器通道部分的多个数据字段中的一个，根据其相应的值元素，以产生每个目的地寄存器通道部分的压缩的直方图。

37.

发明公开
COMPRESSION FOR DEEP LEARNING IN CASE OF SPARSE VALUES MAPPED TO NON-ZERO VALUE 审中-公开

公开(公告)号：US20240078453A1

公开(公告)日：2024-03-07

申请号：US18466981

申请日：2023-09-14

Applicant: Intel Corporation

Inventor： Ajit Singh , Bharat Daga , Michael Behar

IPC: G06N5/046 , G06F13/10 , G06F13/28 , G06F17/16 , G06N3/04 , G06N3/08 , G06N20/00 , G06T9/00 , G06T15/20

CPC classification number: G06N5/046 , G06F13/10 , G06F13/28 , G06F17/16 , G06N3/04 , G06N3/08 , G06N20/00 , G06T9/002 , G06T15/205

Abstract: Embodiments described herein provide a processing apparatus comprising compute circuitry to generate neural network data for a convolutional neural network (CNN) and write the neural network data to a memory buffer. The compute circuitry additionally includes a direct memory access (DMA) controller including a hardware codec having encode circuitry and a decode circuitry. The DMA controller reads the neural network data from the memory buffer, encode the neural network data via the encode circuit, writes encoded neural network data to a memory device coupled with the processing apparatus, writes metadata for the encoded neural network data to the memory device coupled with the processing apparatus, and decodes encoded neural network data via the decode circuit in response to a request from the compute circuitry.

38.

发明授权
Methods and apparatus to enable out-of-order pipelined execution of static mapping of a workload 有权

公开(公告)号：US11847497B2

公开(公告)日：2023-12-19

申请号：US17561500

申请日：2021-12-23

Applicant: Intel Corporation

Inventor： Michael Behar , Moshe Maor , Ronen Gabbai , Roni Rosner , Zigi Walter , Oren Agam

IPC: G06F9/46 , G06F9/50 , G06F3/06

CPC classification number: G06F9/5016 , G06F3/0613 , G06F3/0659 , G06F3/0673 , G06F9/505

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.

39.

发明授权
Real time context dependent deep learning 有权

公开(公告)号：US11704564B2

公开(公告)日：2023-07-18

申请号：US17404153

申请日：2021-08-17

Applicant: Intel Corporation

Inventor： Lev Faivishevsky , Tomer Bar-On , Yaniv Fais , Jacob Subag , Jeremie Dreyfuss , Amit Bleiweiss , Tomer Schwartz , Raanan Yonatan Yehezkel Rohekar , Michael Behar , Amitai Armon , Uzi Sarel

IPC: G06N3/08 , G06N20/00 , G06N20/10

CPC classification number: G06N3/08 , G06N20/00 , G06N20/10

Abstract: In an example, an apparatus comprises a plurality of execution units comprising and logic, at least partially including hardware logic, to receive a plurality of data inputs for training a neural network, wherein the data inputs comprise training data and weights inputs; represent the data inputs in a first form; and represent the weight inputs in a second form. Other embodiments are also disclosed and claimed.

40.

发明申请
METHODS AND APPARATUS TO ENABLE OUT-OF-ORDER PIPELINED EXECUTION OF STATIC MAPPING OF A WORKLOAD 有权

公开(公告)号：US20220197703A1

公开(公告)日：2022-06-23

申请号：US17561500

申请日：2021-12-23

Applicant: Intel Corporation

Inventor： Michael Behar , Moshe Maor , Ronen Gabbai , Roni Rosner , Zigi Walter , Oren Agam

IPC: G06F9/50 , G06F3/06

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification