-
Publication No.: US10853035B2
Publication date: 2020-12-01
Application No.: US16833128
Filing date: 2020-03-27
Applicant: Intel Corporation
Inventor: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
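Illustration (not part of the patent record): the zero-gating idea in this abstract can be modeled in software roughly as below. This is a minimal sketch, assuming a scalar multiply-accumulate loop; the function name `gated_dot` and the `gated` counter are hypothetical, and the hardware described in the patent may gate the units quite differently.

```python
def gated_dot(a, b):
    """Software model of a multiply-accumulate loop with zero gating.

    Hypothetical illustration: when either operand is zero, the product
    cannot change the accumulator, so the multiply and the accumulate
    are both skipped ("gated") for that element.
    """
    acc = 0
    gated = 0  # number of element pairs for which the units were skipped
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            gated += 1
            continue
        acc += x * y
    return acc, gated
```

For sparse inputs, such as activations after a ReLU, the `gated` count indicates how much multiply and accumulate work a gating scheme of this kind could avoid.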
-
Publication No.: US10372416B2
Publication date: 2019-08-06
Application No.: US15499893
Filing date: 2017-04-28
Applicant: Intel Corporation
Inventor: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
-
Publication No.: US20190102671A1
Publication date: 2019-04-04
Application No.: US15720982
Filing date: 2017-09-29
Applicant: Intel Corporation
Inventor: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.
-
Publication No.: US09875213B2
Publication date: 2018-01-23
Application No.: US14752054
Filing date: 2015-06-26
Applicant: Intel Corporation
Inventor: Edward T. Grochowski, Galina Ryvchin, Michael Behar
CPC classification number: G06F15/8076, G06F9/3001, G06F9/30021, G06F9/30036, G06F9/30101, G06F9/30145, G06F15/8007
Abstract: Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type, from the second register portion, are added into one of a plurality of data fields of a destination register lane portion, selected according to the value of its corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.
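Illustration (not part of the patent record): the per-lane behavior described in this abstract can be approximated by the scalar model below. This is a sketch under stated assumptions: the range is taken to be the half-open interval `[base, base + num_bins)` and the destination field is selected as `key - base`. The names `packed_weighted_histogram`, `base`, and `num_bins` are hypothetical; the abstract does not specify how the instruction encodes the range.

```python
def packed_weighted_histogram(keys, weights, base, num_bins):
    """Scalar model of one register lane of a weighted histogram.

    keys    -- elements of the first data type (select the bin)
    weights -- elements of the second data type (values that are added)
    base, num_bins -- assumed range encoding: [base, base + num_bins)
    """
    bins = [0] * num_bins  # stands in for the destination data fields
    for key, weight in zip(keys, weights):
        if base <= key < base + num_bins:  # range comparison
            bins[key - base] += weight     # add into the selected field
    return bins
```

With `keys = [0, 1, 1, 5]`, `weights = [10, 20, 30, 40]`, `base = 0`, and `num_bins = 4`, the key 5 falls outside the range and is ignored, giving `[10, 50, 0, 0]`.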
-
Publication No.: US09189398B2
Publication date: 2015-11-17
Application No.: US13730030
Filing date: 2012-12-28
Applicant: Intel Corporation
Inventor: Ilan Pardo, Michael Behar, Oren Ben-Kiki, Dror Markovich
CPC classification number: G06F12/0802, G06F12/0875, G06F12/0897, Y02D10/13
Abstract: A processor is described comprising: an architectural register file implemented as a combination of a register file cache and an architectural register region within a level 1 (L1) data cache, and a data location table (DLT) to store data indicating a location of each architectural register within the register file cache and/or the architectural register region within the L1 data cache.
-
Publication No.: US11656846B2
Publication date: 2023-05-23
Application No.: US17103179
Filing date: 2020-11-24
Applicant: Intel Corporation
Inventor: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
CPC classification number: G06F7/5332, G06N20/00, G06T1/20
Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
-
Publication No.: US11422939B2
Publication date: 2022-08-23
Application No.: US16727657
Filing date: 2019-12-26
Applicant: Intel Corporation
Inventor: Israel Diamand, Ravi K. Venkatesan, Shlomi Shua, Oz Shitrit, Michael Behar, Roni Rosner
IPC: G06F12/00, G06F12/084, G06F12/126
Abstract: Disclosed embodiments relate to a shared read request (SRR) using a common request tracker (CRT) as a temporary cache. In one example, a multi-core system includes a memory and a memory controller to receive a SRR from a core when a Leader core is not yet identified, allocate a CRT entry and store the SRR therein, mark it as a Leader, send a read request to a memory address indicated by the SRR, and when read data returns from the memory, store the read data in the CRT entry, send the read data to the Leader core, and await receipt, unless already received, of another SRR from a Follower core, the other SRR having a same address as the SRR, then, send the read data to the Follower core, and deallocate the CRT entry.
-
Publication No.: US20220066923A1
Publication date: 2022-03-03
Application No.: US17523384
Filing date: 2021-11-10
Applicant: Intel Corporation
Inventor: Zigi Walter, Roni Rosner, Michael Behar
Abstract: Systems, apparatuses and methods may provide for technology that determines runtime memory requirements of an artificial intelligence (AI) application, defines a remote address range for a plurality of memories based on the runtime memory requirements, wherein each memory in the plurality of memories corresponds to a processor in a plurality of processors, and defines a shared address range for the plurality of memories based on the runtime memory requirements, wherein the shared address range is aliased. In one example, the technology configures memory mapping hardware to access the remote address range in a linear sequence and access the shared address range in a hashed sequence.
-
Publication No.: US11151074B2
Publication date: 2021-10-19
Application No.: US16542085
Filing date: 2019-08-15
Applicant: Intel Corporation
Inventor: Israel Diamand, Roni Rosner, Ravi Venkatesan, Shlomi Shua, Oz Shitrit, Henrietta Bezbroz, Alexander Gendler, Ohad Falik, Zigi Walter, Michael Behar, Shlomi Alkalay
IPC: G06F13/42, G06N3/04, G06F13/20, G06F12/0893
Abstract: Methods and apparatus to implement multiple inference compute engines are disclosed herein. A disclosed example apparatus includes a first inference compute engine, a second inference compute engine, and an accelerator on coherent fabric to couple the first inference compute engine and the second inference compute engine to a converged coherency fabric of a system-on-chip, the accelerator on coherent fabric to arbitrate requests from the first inference compute engine and the second inference compute engine to utilize a single in-die interconnect port.
-
Publication No.: US20210141604A1
Publication date: 2021-05-13
Application No.: US17103179
Filing date: 2020-11-24
Applicant: Intel Corporation
Inventor: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.