Patent search ap:("INTEL CORPORATION") AND inv:"Dan Baum" Page 5

41.

Granted invention patent
Systems for performing instructions for fast element unpacking into 2-dimensional registers (Active)

Publication No.: US11507376B2

Publication date: 2022-11-22

Application No.: US17152160

Filing date: 2021-01-19

Applicant: INTEL CORPORATION

Inventor: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall

IPC: G06F17/16, G06F12/02, G06F9/30, G06F12/06, G06F9/38, G06T1/20, G06F3/06, G06F12/0897, G06F12/0875, G06F9/345

Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
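
Illustration (not part of the patent record): the AOS-to-SOA unpacking described in the abstract can be mimicked in software. The sketch below is only an analogy for the data movement, not the claimed instruction. The function name `unpack_aos_to_soa`, the flat-list layout, and a stride equal to K are all assumptions made for the example.

```python
def unpack_aos_to_soa(aos, n, k):
    """Split a flat AOS buffer of n structures with k fields each into k SOA groups.

    Assumption: same-typed elements of consecutive structures sit k slots apart.
    """
    if len(aos) != n * k:
        raise ValueError("expected n * k elements")
    # Group j collects field j of every structure: aos[j], aos[j + k], ...
    return [aos[j::k] for j in range(k)]


# Three (x, y) structures stored back to back.
points = [1, 10, 2, 20, 3, 30]
xs, ys = unpack_aos_to_soa(points, n=3, k=2)
```

After the call, `xs` holds the first field of every structure and `ys` the second, which is the "K segregated groups, each containing N same-typed elements" layout the abstract names.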

42.

Granted invention patent
Apparatus and method for multicasting a cache line update using delayed refetch messages (Active)

Publication No.: US11422809B2

Publication date: 2022-08-23

Application No.: US15930887

Filing date: 2020-05-13

Applicant: Intel Corporation

Inventor: Christopher J. Hughes, Dan Baum

IPC: G06F9/30, G06F15/80, G06F12/0862

Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.
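
Illustration (not part of the patent record): a toy software model of the bookkeeping the abstract describes, with one entry per cache line listing the cores that sent delayed prefetch messages. The class and method names are hypothetical. Consuming the entry once responses are sent is an assumption, since the abstract does not say what happens to it afterwards.

```python
class DelayedPrefetchManager:
    """Toy model: tracks which cores asked to be sent a cache line once it is updated."""

    def __init__(self):
        self.entries = {}  # cache-line address -> set of requesting core ids

    def receive_delayed_prefetch(self, core_id, line_addr):
        # Update the existing entry for this line, or generate a new one.
        self.entries.setdefault(line_addr, set()).add(core_id)

    def notify_modified(self, line_addr):
        # Assumption: the entry is consumed once the responses are sent.
        waiting = self.entries.pop(line_addr, set())
        # One response message per core recorded in the entry.
        return [(core_id, line_addr) for core_id in sorted(waiting)]


mgr = DelayedPrefetchManager()
mgr.receive_delayed_prefetch(core_id=1, line_addr=0x40)
mgr.receive_delayed_prefetch(core_id=3, line_addr=0x40)
mgr.receive_delayed_prefetch(core_id=2, line_addr=0x80)
responses = mgr.notify_modified(0x40)
```

Here `responses` contains one message for each core that registered interest in line `0x40`, while the entry for `0x80` stays in place until that line is modified.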

43.

Granted invention patent
Systems and methods for performing duplicate detection instructions on 2D data (Active)

Publication No.: US11294671B2

Publication date: 2022-04-05

Application No.: US16232931

Filing date: 2018-12-26

Applicant: Intel Corporation

Inventor: Christopher J. Hughes, Michael Espig, Dan Baum, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall

IPC: G06F9/30, G06F17/16

Abstract: Disclosed embodiments relate to systems and methods for performing duplicate detection instructions on two-dimensional (2D) data. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode and locations of a source matrix comprising M×N elements and a destination, the opcode to indicate execution circuitry is to use a plurality of comparators to discover duplicates in the source matrix, and store indications of locations of discovered duplicates in the destination. The execution circuitry to execute the decoded instruction as per the opcode.
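
Illustration (not part of the patent record): a software analogy for duplicate detection over an M×N source. The abstract describes hardware comparators; this sketch swaps in a Python set. Representing the "indications of locations" as a boolean mask that flags every repeat of an earlier element in row-major order is an assumption, and `find_duplicates` is a hypothetical name.

```python
def find_duplicates(matrix):
    """Return a mask with True wherever an element repeats an earlier one (row-major order)."""
    seen = set()
    mask = []
    for row in matrix:
        mask_row = []
        for value in row:
            mask_row.append(value in seen)  # True only if the value appeared before
            seen.add(value)
        mask.append(mask_row)
    return mask


source = [[7, 3, 7],
          [4, 3, 9]]
dup_mask = find_duplicates(source)
```

The mask has the same shape as the source, so a destination of matching dimensions could hold it directly.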

44.

Granted invention patent
Systems for performing instructions to quickly convert and use tiles as 1D vectors (Active)

Publication No.: US10990396B2

Publication date: 2021-04-27

Application No.: US16145066

Filing date: 2018-09-27

Applicant: Intel Corporation

Inventor: Bret Toll, Christopher J. Hughes, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke

IPC: G06F9/30

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.
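
Illustration (not part of the patent record): a minimal sketch of moving a group of elements between a 2D tile and a 1D vector, covering only the single-row and single-column cases from the abstract. The function names and the `kind` selector are hypothetical, and nested Python lists stand in for tile registers.

```python
def move_to_vector(tile, kind, index):
    """Copy one row or one column of a 2D tile into a new 1D list."""
    if kind == "row":
        return list(tile[index])
    if kind == "column":
        return [row[index] for row in tile]
    raise ValueError("kind must be 'row' or 'column'")


def move_from_vector(tile, kind, index, vector):
    """Write a 1D list into one row or one column of the tile, in place."""
    if kind == "row":
        if len(vector) != len(tile[index]):
            raise ValueError("vector length must match the row length")
        tile[index] = list(vector)
    elif kind == "column":
        if len(vector) != len(tile):
            raise ValueError("vector length must match the column length")
        for row, value in zip(tile, vector):
            row[index] = value
    else:
        raise ValueError("kind must be 'row' or 'column'")


tile = [[1, 2, 3],
        [4, 5, 6]]
col = move_to_vector(tile, "column", 1)      # move to 1D
move_from_vector(tile, "row", 0, [9, 8, 7])  # move from 1D
```

Partial rows, multiple rows or columns, and rectangular sub-tiles would need extra start and length parameters, which are left out to keep the sketch short.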

45.

Granted invention patent
Delayed prefetch manager to multicast an updated cache line to processor cores requesting the updated data (Active)

Publication No.: US10664273B2

Publication date: 2020-05-26

Application No.: US15941958

Filing date: 2018-03-30

Applicant: Intel Corporation

Inventor: Christopher J. Hughes, Dan Baum

IPC: G06F9/30, G06F15/80, G06F12/0862

Abstract: An apparatus and method for processing efficient multicast operation. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions; a shared circuitry region to be shared by the plurality of cores; first cache management circuitry associated with the shared circuitry region to receive delayed prefetch messages from the cores, each delayed prefetch message comprising an address or portion thereof usable to identify a cache line; and a delayed prefetch manager comprising a plurality of entries, each entry associated with at least one of the delayed prefetch messages, the delayed prefetch manager to update one or more of the entries or generate a new entry in accordance with receipt of each new delayed prefetch message, wherein upon receiving a notification that a first cache line is being modified by a first core, the delayed prefetch manager is to transmit delayed prefetch response messages to one or more cores identified in a first entry associated with the first cache line.

46.

Granted invention patent
Accelerator for processing data (Active)

Publication No.: US10509846B2

Publication date: 2019-12-17

Application No.: US15840552

Filing date: 2017-12-13

Applicant: Intel Corporation

Inventor: Chen Koren, Dan Baum

IPC: G06F17/16, G06F7/544, G06F7/523, G06N3/04, G06N3/063, G06N3/08, G06F9/30, G06N3/02

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
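
Illustration (not part of the patent record): the two modes in the abstract can be approximated in software as a dense dot product and an index-matched sparse dot product. Storing the compressed data sets as index-sorted `(index, value)` pairs is an assumption. The hardware step of holding unmatched data for a later cycle is modeled only loosely, by advancing one pointer while the other entry waits.

```python
def dense_dot(row, col):
    """Dense mode: multiply every row element with its column partner."""
    return sum(r * c for r, c in zip(row, col))


def sparse_dot(row, col):
    """Sparse mode: row and col are lists of (index, value) sorted by index.

    Only entries whose indexes match are multiplied.
    """
    i = j = 0
    total = 0
    while i < len(row) and j < len(col):
        ri, rv = row[i]
        ci, cv = col[j]
        if ri == ci:
            total += rv * cv  # indexes match: send the pair to the multiplier
            i += 1
            j += 1
        elif ri < ci:
            i += 1  # no partner for this row entry; the column entry waits
        else:
            j += 1  # no partner for this column entry; the row entry waits
    return total


dense = dense_dot([2, 0, 0, 5], [3, 4, 0, 1])
sparse = sparse_dot([(0, 2), (3, 5)], [(0, 3), (1, 4), (3, 1)])
```

Both calls work on the same underlying vectors, so they produce the same result. The sparse version skips the multiplications that would involve a zero.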

47.

Invention patent application
ACCELERATOR FOR PROCESSING DATA (Pending, published)

Publication No.: US20190042538A1

Publication date: 2019-02-07

Application No.: US15840552

Filing date: 2017-12-13

Applicant: Intel Corporation

Inventor: Chen Koren, Dan Baum

IPC: G06F17/16, G06F7/523, G06N3/02, G06F9/30

Abstract: An accelerator for increasing the processing speed of a processor. The accelerator operates in two distinct modes. In a first mode for dense layer processing, row data sets and column data sets are sent to a multiplier for multiplication. In a second mode for sparse layer processing compressed row data sets are received by a row multiplexer and compressed column data sets are received by a column multiplexer. Each multiplexer is configured to compare the indexes of data sets with one another to determine matching indexes. When indexes match, the matching data sets are selected and sent to the multiplier for multiplication. When indexes do not match, data sets are stored in memory devices for subsequent cycles.
