-
Publication No.: US11494608B2
Publication Date: 2022-11-08
Application No.: US16540581
Filing Date: 2019-08-14
Applicant: Intel Corporation
Inventor: Yaniv Fais, Moshe Maor
Abstract: An example apparatus to perform a convolution on an input tensor includes a parameters generator to: generate a horizontal hardware execution parameter for a horizontal dimension of the input tensor based on a kernel parameter and a layer parameter; and generate a vertical hardware execution parameter for a vertical dimension of the input tensor based on the kernel parameter and the layer parameter; an accelerator interface to configure a hardware accelerator circuitry based on the horizontal and vertical hardware execution parameters; a horizontal iterator controller to determine when the hardware accelerator circuitry completes the first horizontal iteration of the convolution; and a vertical iterator controller to determine when the hardware accelerator circuitry completes the first vertical iteration of the convolution.
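Illustrative note: one way to picture the per-dimension parameters in this abstract is as an output size and an iteration count derived from the kernel and layer settings. The sketch below is an assumption-laden illustration, not the patented method; `AxisParams`, `axis_params`, the `tile` argument (how many outputs the hardware is assumed to produce per pass), and the standard output-size formula are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class AxisParams:
    out_size: int    # output elements along this axis
    iterations: int  # hardware passes needed along this axis


def axis_params(in_size: int, kernel: int, stride: int, pad: int, tile: int) -> AxisParams:
    """Derive hypothetical execution parameters for one tensor dimension."""
    # Standard convolution output size for the given kernel, stride and padding.
    out_size = (in_size + 2 * pad - kernel) // stride + 1
    # Assumed: the accelerator produces `tile` outputs per pass along this axis.
    return AxisParams(out_size, math.ceil(out_size / tile))


def iteration_order(horizontal: AxisParams, vertical: AxisParams) -> list:
    """List (vertical, horizontal) pass indices in the order they would complete."""
    return [(v, h)
            for v in range(vertical.iterations)
            for h in range(horizontal.iterations)]
```

The same function is called once for the horizontal dimension and once for the vertical one, which mirrors the abstract's separation of the two sets of parameters.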
-
12.
Publication No.: US20220197703A1
Publication Date: 2022-06-23
Application No.: US17561500
Filing Date: 2021-12-23
Applicant: Intel Corporation
Inventor: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
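Illustrative note: the credit comparison in this abstract can be pictured as a readiness check per workload node. The following is a minimal sketch under stated assumptions, not the patent's implementation; `WorkloadNode`, `ready_nodes`, and the dictionary of credit counts are hypothetical names introduced for the example, and "meets the threshold" is assumed to mean greater than or equal.

```python
from dataclasses import dataclass


@dataclass
class WorkloadNode:
    name: str
    threshold: int  # credits (assumed: free buffer slots) required before dispatch


def ready_nodes(nodes: list, credits: dict) -> list:
    """Return the names of nodes whose loaded credits meet their threshold.

    Nodes are checked independently, so a later node may become ready
    before an earlier one, which is what permits out-of-order execution.
    """
    ready = []
    for node in nodes:
        # Comparator step: loaded credits versus the node's threshold.
        if credits.get(node.name, 0) >= node.threshold:
            ready.append(node.name)
    return ready
```

For example, with nodes `conv` (threshold 2) and `pool` (threshold 4) and credits `{"conv": 1, "pool": 4}`, only `pool` is dispatched even though it appears later in the mapping.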
-
13.
Publication No.: US11231963B2
Publication Date: 2022-01-25
Application No.: US16542012
Filing Date: 2019-08-15
Applicant: Intel Corporation
Inventor: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed that enable out-of-order pipelined execution of static mapping of a workload to one or more computational building blocks of an accelerator. An example apparatus includes an interface to load a first number of credits into memory; a comparator to compare the first number of credits to a threshold number of credits associated with memory availability in a buffer; and a dispatcher to, when the first number of credits meets the threshold number of credits, select a workload node of the workload to be executed at a first one of the one or more computational building blocks.
-
Publication No.: US11093226B2
Publication Date: 2021-08-17
Application No.: US16541131
Filing Date: 2019-08-14
Applicant: Intel Corporation
Inventor: Moshe Maor
Abstract: Apparatus, systems, and methods for a generic firmware-based kernel library mechanism are disclosed. An example apparatus includes a compiler to compile kernels into an executable and linkable format, an image generator to generate library images from executable and linkable format locations, a reducer to retrieve a library image, the library image retrieved starting from a first section of an existing library, the retrieved library image to be used as a platform for developing a new kernel library, a selector to select kernels to include in the new kernel library, one or more libraries organized into a defined number of kernel banks, the kernels combined based on intended application development, and a linker to link a library start function pointer to the library start function, the library start function positioned within the library image, the pointer incorporated in a first section of the library image.
-
15.
Publication No.: US10990399B2
Publication Date: 2021-04-27
Application No.: US16539005
Filing Date: 2019-08-13
Applicant: Intel Corporation
Inventor: Moshe Maor, Yaniv Fais
Abstract: Methods and apparatus to implement efficient communications between components of computing systems are disclosed. An example apparatus includes a message generator to: add a first value associated with a first field of a message to a shift register based on a first push operation, the message including multiple fields, at least two of the fields having different bit widths; and add a second value associated with a second field of the message to the shift register based on a second push operation, the second value to be adjacent the first value in the shift register in accordance with a structure of the message. The example apparatus further includes a communications interface to transmit content stored in the shift register to a hardware device via a bus having a width corresponding to a width of the shift register, the content including the message.
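Illustrative note: the push operations in this abstract can be modeled as packing fields of different bit widths side by side in one register. This is a sketch only, assuming a least-significant-bit-first layout; the class name `ShiftRegister`, its attributes, and the error handling are hypothetical and are not taken from the patent.

```python
class ShiftRegister:
    """Software model of a fixed-width register filled by successive pushes."""

    def __init__(self, width: int):
        self.width = width  # register width in bits (assumed equal to the bus width)
        self.value = 0      # packed message content
        self.used = 0       # number of bits already occupied

    def push(self, field_value: int, bits: int) -> None:
        """Place a field directly adjacent to the previously pushed field."""
        if not 0 <= field_value < (1 << bits):
            raise ValueError("field value does not fit in the given bit width")
        if self.used + bits > self.width:
            raise OverflowError("message exceeds register width")
        # Assumed layout: the first field occupies the lowest bits.
        self.value |= field_value << self.used
        self.used += bits
```

Pushing a 3-bit field followed by an 8-bit field leaves the two values adjacent in the low 11 bits, so the whole register can be sent over the bus in a single transfer.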
-
Publication No.: US20250086445A1
Publication Date: 2025-03-13
Application No.: US18888744
Filing Date: 2024-09-18
Applicant: Intel Corporation
Inventor: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.
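Illustrative note: a Z-major layout is assumed here to mean that the channel (Z) index varies fastest in memory, so each pixel's channels form one contiguous run. Under that assumption, a direct convolution can take an inner product over each run without first lowering the input into a matrix (the im2col step). The function below is a plain-Python sketch with stride 1, no padding, and a single output feature; `conv_z_major` and its argument names are hypothetical, and as is conventional for CNNs the kernel is not flipped.

```python
def conv_z_major(ifm, height, width, channels, weights, k):
    """Direct convolution over a flat, Z-major input feature map.

    ifm index     = (y * width + x) * channels + z
    weights index = (ky * k + kx) * channels + z   (one output feature)
    Returns a flat list of (height - k + 1) * (width - k + 1) outputs.
    """
    out = []
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            acc = 0
            for ky in range(k):
                for kx in range(k):
                    i = ((y + ky) * width + (x + kx)) * channels
                    w = (ky * k + kx) * channels
                    # Inner product over the contiguous channel run of one pixel.
                    acc += sum(a * b for a, b in
                               zip(ifm[i:i + channels], weights[w:w + channels]))
            out.append(acc)
    return out
```

Because the input is read in place, no enlarged copy of the feature map is built, which is the practical benefit of skipping the lowering step.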
-
Publication No.: US12131250B2
Publication Date: 2024-10-29
Application No.: US15720982
Filing Date: 2017-09-29
Applicant: Intel Corporation
Inventor: Ehud Cohen, Moshe Maor, Ashutosh Parkhi, Michael Behar, Yaniv Fais
CPC classification number: G06N3/063, G06F16/17, G06F18/21, G06N3/045, G06N3/08, G06V10/454, G06V10/82, G06V10/955
Abstract: A convolutional neural network (CNN) accelerator, including: a CNN circuit for performing a multiple-layer CNN computation, wherein the multiple layers are to receive an input feature according to an input feature map (IFM) and a weight matrix per output feature, wherein an output of a first layer provides an input for a next layer; and a mapping circuit to access a three-dimensional input matrix stored as a Z-major matrix; wherein the CNN circuit is to perform an inner-product direct convolution on the Z-major matrix, wherein the direct convolution lacks a lowering operation.
-
Publication No.: US20230333913A1
Publication Date: 2023-10-19
Application No.: US18309650
Filing Date: 2023-04-28
Applicant: Intel Corporation
Inventor: Michael Behar, Moshe Maor, Ronen Gabbai, Roni Rosner, Zigi Walter, Oren Agam
IPC: G06F9/50, G06F16/901, G06N3/044, G06N3/045
CPC classification number: G06F9/5083, G06F16/9024, G06N3/044, G06N3/045
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to configure heterogenous components in an accelerator. An example apparatus includes a graph compiler to identify a workload node in a workload and generate a selector for the workload node, and the selector to identify an input condition and an output condition of a compute building block, wherein the graph compiler is to, in response to obtaining the identified input condition and output condition from the selector, map the workload node to the compute building block.
-
Publication No.: US20230067421A1
Publication Date: 2023-03-02
Application No.: US17954846
Filing Date: 2022-09-28
Applicant: Intel Corporation
Inventor: Yaniv Fais, Moshe Maor
Abstract: An example apparatus to perform a convolution on an input tensor includes a parameters generator to: generate a horizontal hardware execution parameter for a horizontal dimension of the input tensor based on a kernel parameter and a layer parameter; and generate a vertical hardware execution parameter for a vertical dimension of the input tensor based on the kernel parameter and the layer parameter; an accelerator interface to configure a hardware accelerator circuitry based on the horizontal and vertical hardware execution parameters; a horizontal iterator controller to determine when the hardware accelerator circuitry completes the first horizontal iteration of the convolution; and a vertical iterator controller to determine when the hardware accelerator circuitry completes the first vertical iteration of the convolution.
-
Publication No.: US10572404B2
Publication Date: 2020-02-25
Application No.: US15638429
Filing Date: 2017-06-30
Applicant: Intel Corporation
Inventor: Moshe Maor
Abstract: A processor device is provided with hardware-implemented logic to receive an instruction including a pointer identifier and a pointer change value, the pointer identifier including a pointer address field encoded with an address of a line of memory corresponding to a location of a pointer of a particular one of the one or more cyclic buffers, one or more cushion bits, and a buffer identifier field encoded with a buffer identifier assigned to the particular cyclic buffer. The logic further enables the processor to identify that the instruction is to apply to the particular cyclic buffer based on the buffer identifier, determine that the pointer change value causes a wraparound of the pointer in the particular cyclic buffer, and fix location of the pointer in the particular cyclic buffer based on the wraparound.
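Illustrative note: the wraparound handling described in this abstract can be pictured in software as modular arithmetic on a pointer confined to a buffer's address range. The sketch below is an assumption, not the patented hardware logic; `advance_pointer`, `base`, and `size` are hypothetical names, and the pointer-identifier encoding (address field, cushion bits, buffer identifier) is not modeled.

```python
def advance_pointer(pointer: int, change: int, base: int, size: int):
    """Move a pointer by `change` inside the cyclic buffer [base, base + size).

    Returns (new_pointer, wrapped), where `wrapped` reports whether the
    raw result left the buffer and had to be fixed up.
    """
    raw = pointer + change
    wrapped = raw < base or raw >= base + size
    # Fix the location by wrapping the offset back into the buffer.
    # Python's % always yields a non-negative result for a positive modulus,
    # so negative changes wrap to the end of the buffer.
    return base + (raw - base) % size, wrapped
```

With a buffer of 8 lines starting at address 8, moving a pointer at 14 forward by 4 yields address 10 with the wraparound flag set.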