-
Publication Number: US10210137B2
Publication Date: 2019-02-19
Application Number: US15635716
Filing Date: 2017-06-28
Applicant: Intel Corporation
Inventor: Ehud Cohen, Daniel David Ben-Dayan Rubin, Michael Behar, Dmitri Vainbrand
Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w^T x≅(B·s)^T x=s^T(B^T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B^T x.
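The identity in the abstract above, w^T x ≅ (B·s)^T x = s^T(B^T x), can be illustrated with a small Python sketch. The abstract does not say how the factorizer circuit computes B and s, nor which two values the entries of B take. The greedy sign-based factorization below, the choice of +1/-1 entries, and the names `factorize`, `reconstruct`, and `approx_dot` are all assumptions made for illustration, not the patented method.

```python
# Hypothetical sketch: greedy residual binarization, NOT the patent's factorizer.
# Assumption: B has entries in {+1, -1}; B is stored as a list of n columns.

def factorize(w, n):
    """Approximate w (length N) as B @ s, with B of size N x n and s of size n."""
    residual = list(w)
    b_cols, s = [], []
    for _ in range(n):
        # Binary column: the sign pattern of the current residual.
        col = [1 if r >= 0 else -1 for r in residual]
        # Dictionary value: mean magnitude of the current residual.
        scale = sum(abs(r) for r in residual) / len(residual)
        b_cols.append(col)
        s.append(scale)
        # Subtract what this column explains and continue on the remainder.
        residual = [r - scale * b for r, b in zip(residual, col)]
    return b_cols, s


def reconstruct(b_cols, s):
    """Return B @ s, the approximation of the original weight vector."""
    size = len(b_cols[0])
    return [sum(s[k] * b_cols[k][i] for k in range(len(s))) for i in range(size)]


def approx_dot(b_cols, s, x):
    """Compute s^T (B^T x) without ever forming B @ s."""
    # B^T x uses only additions and subtractions, since entries are +1 or -1.
    bt_x = [sum(xi if b == 1 else -xi for b, xi in zip(col, x)) for col in b_cols]
    # Only n multiplications remain: one per dictionary value.
    return sum(sk * v for sk, v in zip(s, bt_x))


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 2.0]
    x = [1.0, 2.0, 3.0, 4.0]
    exact = sum(wi * xi for wi, xi in zip(w, x))
    for n in (1, 2, 4):
        b_cols, s = factorize(w, n)
        print(n, approx_dot(b_cols, s, x), "exact:", exact)
```

In this sketch a larger n never increases the squared reconstruction error, which mirrors the role of n as a compression regulating parameter: fewer columns mean less work and a coarser approximation.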
-
Publication Number: US20190004997A1
Publication Date: 2019-01-03
Application Number: US15635716
Filing Date: 2017-06-28
Applicant: Intel Corporation
Inventor: Ehud Cohen, Daniel David Ben-Dayan Rubin, Michael Behar, Dmitri Vainbrand
Abstract: A processor, including: decode circuitry to decode instructions; a data cache unit including circuitry to cache data for the processor; and an approximate matrix multiplication (AMM) circuit including: a data receptor circuit to receive a weight vector w and an input vector x, both of size N, and a compression regulating parameter n; a factorizer circuit to factorize w into w≅B·s, by computing a binary factorized matrix B of size N×n, and a dictionary vector s of size n; and a binary multiplier circuit to compute w^T x≅(B·s)^T x=s^T(B^T x), the binary multiplier circuit comprising a hardware accelerator circuit to compute an array product B^T x.
-
Publication Number: US20170090924A1
Publication Date: 2017-03-30
Application Number: US14866921
Filing Date: 2015-09-26
Applicant: Intel Corporation
Inventor: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar
IPC: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
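The mask semantics described in the abstract above can be modeled in scalar Python: one mask element per data element of one source operand, set when that element equals any element of the other operand. This is only a behavioral sketch. The function name `any_equal_mask` is hypothetical, the 1/0 encoding of mask elements is an assumption, and real hardware would compare packed elements in parallel rather than in a loop.

```python
# Scalar model of an "equals any element of the other operand" mask.
# Assumptions: the function name is hypothetical; mask bits are encoded as 1/0.

def any_equal_mask(src1, src2):
    """Return one mask element per element of src1, in the same relative position."""
    members = set(src2)  # every value present in the other source operand
    return [1 if value in members else 0 for value in src1]


if __name__ == "__main__":
    # Each operand holds at least four data elements, as in the abstract.
    print(any_equal_mask([3, 7, 1, 9], [9, 4, 3, 3]))  # [1, 0, 0, 1]
```

The abstract allows more than one result mask operand; calling the function a second time with the arguments swapped would give the mask for the other source operand.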
-
Publication Number: US20250095217A1
Publication Date: 2025-03-20
Application Number: US18903291
Filing Date: 2024-10-01
Applicant: Intel Corporation
Inventor: Tomer Bar-On, Jacob Subag, Yaniv Fais, Jeremie Dreyfuss, Gal Novik, Gal Leibovich, Tomer Schwartz, Ehud Cohen, Lev Faivishevsky, Uzi Sarel, Amitai Armon, Yahav Shadmiy
IPC: G06T9/00, G06N3/044, G06N3/045, G06N3/047, G06N3/048, G06N3/084, G06N3/088, H04N19/42, H04N19/436
Abstract: In an example, an apparatus comprises logic, at least partially including hardware logic, to implement a lossy compression algorithm which utilizes a data transform and quantization process to compress data in a convolutional neural network (CNN) layer. Other embodiments are also disclosed and claimed.
-
Publication Number: US20230088947A1
Publication Date: 2023-03-23
Application Number: US17993591
Filing Date: 2022-11-23
Applicant: Intel Corporation
Inventor: Theodros Yigzaw, Geeyarpuram N. Santhanakrishnan, Ganapati N. Srinivasa, Jose A. Vargas, Hisham Shafi, Michael Mishaeli, Ehud Cohen, Zeev Sperber, Shlomo Raikin, Mohan J. Kumar, Julius Y. Mandelblat
Abstract: An apparatus and method are described for detecting and correcting data fetch errors within a processor core. For example, one embodiment of an instruction processing apparatus for detecting and recovering from data fetch errors comprises: at least one processor core having a plurality of instruction processing stages including a data fetch stage and a retirement stage; and error processing logic in communication with the processing stages to perform the operations of: detecting an error associated with data in response to a data fetch operation performed by the data fetch stage; and responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core within the retirement stage.
-
Publication Number: US20220237850A1
Publication Date: 2022-07-28
Application Number: US17669126
Filing Date: 2022-02-10
Applicant: Intel Corporation
Inventor: Uzi Sarel, Ehud Cohen, Tomer Schwartz, Amitai Armon, Yahav Shadmiy, Itamar Ben-Ari, Amit Bleiweiss, Lev Faivishevsky, Tomer Bar-On, Yaniv Fais, Jacob Subag, Michael Behar, Guy Jacob, Gal Leibovich, Jeremie Dreyfuss
Abstract: In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
-
Publication Number: US20210318932A1
Publication Date: 2021-10-14
Application Number: US17356157
Filing Date: 2021-06-23
Applicant: Intel Corporation
Inventor: Theodros Yigzaw, Geeyarpuram N. Santhanakrishnan, Ganapati N. Srinivasa, Jose A. Vargas, Hisham Shafi, Michael Mishaeli, Ehud Cohen, Zeev Sperber, Shlomo Raikin, Mohan J. Kumar, Julius Y. Mandelblat
Abstract: An apparatus and method are described for detecting and correcting data fetch errors within a processor core. For example, one embodiment of an instruction processing apparatus for detecting and recovering from data fetch errors comprises: at least one processor core having a plurality of instruction processing stages including a data fetch stage and a retirement stage; and error processing logic in communication with the processing stages to perform the operations of: detecting an error associated with data in response to a data fetch operation performed by the data fetch stage; and responsively performing one or more operations to ensure that the error does not corrupt an architectural state of the processor core within the retirement stage.
-
Publication Number: US11087206B2
Publication Date: 2021-08-10
Application Number: US15581045
Filing Date: 2017-04-28
Applicant: Intel Corporation
Inventor: Tomer Schwartz, Ehud Cohen, Uzi Sarel, Amitai Armon, Yaniv Fais, Lev Faivishevsky, Amit Bleiweiss, Yahav Shadmiy, Jacob Subag
Abstract: A mechanism is described for facilitating memory handling and data management in machine learning at autonomous machines. A method of embodiments, as described herein, includes detecting multiple tables associated with multiple neural networks at multiple autonomous machines, where each of the multiple tables include an index. The method may further include combining the multiple tables and multiple indexes associated with the multiple tables into a single table and a single index, respectively, where the single table is communicated to the multiple autonomous machines to allow simultaneous processing of one or more portions of the single table using one or more memory devices and one or more processors of one or more of the multiple autonomous machines.
-
Publication Number: US10817567B2
Publication Date: 2020-10-27
Application Number: US15941168
Filing Date: 2018-03-30
Applicant: INTEL CORPORATION
Inventor: Ehud Cohen, Adnan Agbaria
IPC: G06F16/00, G06F16/901, G06N5/02, G06F16/9038, G06N20/00
Abstract: Techniques and apparatus for providing graph compression structures for graph information are described. In one embodiment, for example, an apparatus may include at least one memory, at least one processing circuitry, and logic, coupled to the at least one processing circuitry, to access graph information comprising a plurality of nodes, define a unique index for each of the plurality of nodes, determine whether each of the plurality of nodes has at least one neighbor node, and generate a graph compression structure comprising an entry for each of the plurality of nodes having at least one neighbor node and an adjacency list comprising an array of neighbor nodes of each entry.
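The structure described in the abstract above (a unique index per node, an entry only for nodes that have at least one neighbor, and an adjacency list holding arrays of neighbor nodes) resembles a compressed adjacency layout. The Python sketch below is one possible reading, not the patented structure. The function name `build_compressed_graph`, the directed edge-list input, and the `(offset, count)` entry format are assumptions introduced for illustration.

```python
# Hypothetical sketch of a compressed adjacency structure.
# Assumptions: edges are directed (src, dst) pairs; each entry is (offset, count)
# pointing into one flat adjacency array.

def build_compressed_graph(edges):
    """Return (index, entries, adjacency) for an iterable of (src, dst) edges."""
    # Define a unique integer index for every node, in first-seen order.
    index = {}
    for src, dst in edges:
        for node in (src, dst):
            if node not in index:
                index[node] = len(index)

    # Collect neighbors only for nodes that actually have some.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(index[src], []).append(index[dst])

    # One entry per node with at least one neighbor, plus a flat neighbor array.
    entries, adjacency = {}, []
    for node in sorted(neighbors):
        entries[node] = (len(adjacency), len(neighbors[node]))
        adjacency.extend(neighbors[node])
    return index, entries, adjacency


if __name__ == "__main__":
    index, entries, adjacency = build_compressed_graph(
        [("a", "b"), ("a", "c"), ("b", "c")]
    )
    print(index)      # {'a': 0, 'b': 1, 'c': 2}
    print(entries)    # {0: (0, 2), 1: (2, 1)}
    print(adjacency)  # [1, 2, 2]
```

Node `c` has no outgoing neighbor in this example, so it receives an index but no entry, which is where the space saving in this reading comes from.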
-
Publication Number: US10423411B2
Publication Date: 2019-09-24
Application Number: US14866921
Filing Date: 2015-09-26
Applicant: Intel Corporation
Inventor: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar
IPC: G06F9/30
Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.