Patent search ap:("INTEL CORPORATION") AND inv:"Edward T. Grochowski" Page 1

1.

发明授权
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems 有权

公开(公告)号：US12050912B2

公开(公告)日：2024-07-30

申请号：US18220225

申请日：2023-07-10

Applicant: Intel Corporation

Inventor： Edward T. Grochowski , Asit K. Mishra , Robert Valentine , Mark J. Charney , Simon C. Steely, Jr.

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/3001 , G06F9/30036 , G06F9/30145 , G06F9/3861 , G06F9/3865

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

2.

发明授权
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems 有权

公开(公告)号：US11048508B2

公开(公告)日：2021-06-29

申请号：US16398200

申请日：2019-04-29

Applicant: Intel Corporation

Inventor： Edward T. Grochowski , Asit K. Mishra , Robert Valentine , Mark J. Charney , Simon C. Steely, Jr.

IPC: G06F9/30 , G06F9/38

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

3.

发明授权
Heterogeneous multiprocessor including scalar and SIMD processors in a ratio defined by execution time and consumed die area 有权

公开(公告)号：US10891255B2

公开(公告)日：2021-01-12

申请号：US14662089

申请日：2015-03-18

Applicant: Intel Corporation

Inventor： Edward T. Grochowski , Michael E. Kounavis , Ron Shalev

IPC: G06F9/38 , G06F1/3293 , G06F1/3287 , G06T1/20 , G06F9/50 , G06T7/00 , G06F15/78

Abstract: In one embodiment, a heterogeneous multicore processor is described that is optimized to execute multi-stage computer vision algorithms such as cascade classifier workloads. In such embodiment the heterogeneous processor includes at least one SIMD core, such as a vector processor core, coupled with one or more scalar cores. In one embodiment the heterogeneous multiprocessor executes multi-stage compute operations, where the SIMD core computes a first set of stages and the one or more scalar cores compute the second set of stages. In one embodiment, a process for designing a heterogeneous multicore processor is disclosed which optimizes the ratio of scalar to SIMD cores based on execution time of the multi-stage compute operation in relation to processor die area consumed by a processor configuration having the ratio.

4.

发明授权
Mechanism for instruction set based thread execution on a plurality of instruction sequencers 有权

公开(公告)号：US10452403B2

公开(公告)日：2019-10-22

申请号：US14866875

申请日：2015-09-26

Applicant: Intel Corporation

Inventor： Hong Wang , John P. Shen , Edward T. Grochowski , Richard A. Hankins , Gautham N. Chinya , Bryant E. Bigbee , Shivnandan D. Kaushik , Xiang Chris Zou , Per Hammarlund , Scott Dion Rodgers , Xinmin Tian , Anil Aggawal , Prashant Sethi , Baiju V. Patel , James P Held

IPC: G06F9/455 , G06F9/38 , G06F9/30 , G06F9/48

Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.

5.

发明授权
Zeroing a cache line 有权

公开(公告)号：US10282296B2

公开(公告)日：2019-05-07

申请号：US15376647

申请日：2016-12-12

Applicant: Intel Corporation

Inventor： Jason W. Brandt , Robert S. Chappell , Jesus Corbal , Edward T. Grochowski , Stephen H. Gunther , Buford M. Guy , Thomas R. Huff , Elmoustapha Ould-Ahmed-Vall , Bret L. Toll , David Papworth , James D. Allen

IPC: G06F12/0831 , G06F12/1009 , G06F12/1027

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

6.

发明授权
Packed finite impulse response (FIR) filter processors, methods, systems, and instructions 有权

公开(公告)号：US09898286B2

公开(公告)日：2018-02-20

申请号：US14704633

申请日：2015-05-05

Applicant: INTEL CORPORATION

Inventor： Edwin Jan Van Dalen , Martinus C. Wezelenburg , Steven Roos , Edward T. Grochowski , Moshe Maor

IPC: G06F7/38 , G06F9/30 , G06F9/38 , G06F9/455 , H03H17/02 , H03H17/06

CPC classification number: G06F9/30036 , G06F9/3001 , G06F9/3893 , G06F9/455 , H03H17/0202 , H03H17/06 , H03H2017/0298

Abstract: A processor includes a decode unit to decode a packed finite impulse response (FIR) filter instruction that indicates one or more source packed data operands, a plurality of FIR filter coefficients, and a destination storage location. The source operand(s) include a first number of data elements and a second number of additional data elements. The second number is one less than a number of FIR filter taps. An execution unit, in response to the packed FIR filter instruction being decoded, is to store a result packed data operand. The result packed data operand includes the first number of FIR filtered data elements that each is to be based on a combination of products of the plurality of FIR filter coefficients and a different corresponding set of data elements from the one or more source packed data operands, which is equal in number to the number of FIR filter taps.

7.

发明申请
DATA ELEMENT COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS 有权

公开(公告)号：US20170090924A1

公开(公告)日：2017-03-30

申请号：US14866921

申请日：2015-09-26

Applicant: Intel Corporation

Inventor： Asit K. Mishra , Edward T. Grochowski , Jonathan D. Pearce , Deborah T. Marr , Ehud Cohen , Elmoustapha OuId-Ahmed-Vall , Jesus Corbal San Adrian , Robert Valentine , Mark J. Charney , Christopher J. Hughes , Milind B. Girkar

IPC: G06F9/30

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.

8.

发明申请
METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED HISTOGRAM FUNCTIONALITY 有权
Title translation: 方法，装置，说明和逻辑提供矢量包装组织功能

公开(公告)号：US20160378716A1

公开(公告)日：2016-12-29

申请号：US14752054

申请日：2015-06-26

Applicant: Intel Corporation

Inventor： Edward T. Grochowski , Galina Ryvchin , Michael Behar

IPC: G06F15/80 , G06F9/30

CPC classification number: G06F15/8076 , G06F9/3001 , G06F9/30021 , G06F9/30036 , G06F9/30101 , G06F9/30145 , G06F15/8007

Abstract: Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units, compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type, from the second register portion, are added into one of a plurality data fields of a destination register lane portion, selected according to the value of its corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.

Abstract translation: 指令和逻辑提供SIMD矢量压缩直方图功能。一些处理器实施例包括分别在寄存器通道部分的多个数据字段的每一个中分别存储第一和第二数据类型的对应元件的第一和第二寄存器。解码级对SIMD矢量压缩直方图的指令进行解码。一个或多个执行单元将第一注册通道部分中的第一数据类型的每个元素与指令指定的范围进行比较。对于所述范围中的第一寄存器部分的任何元件，来自第二寄存器部分的第二数据类型的对应元件被添加到目的地寄存器通道部分的多个数据字段中的一个，根据其相应的值元素，以产生每个目的地寄存器通道部分的压缩的直方图。

9.

发明授权
Providing extended cache replacement state information 有权
Title translation: 提供扩展缓存替换状态信息

公开(公告)号：US09170955B2

公开(公告)日：2015-10-27

申请号：US13685991

申请日：2012-11-27

Applicant: Intel Corporation

Inventor： Andrew T. Forsyth , Ramacharan Sundararaman , Eric Sprangle , John C. Mejia , Douglas M. Carmean , Edward T. Grochowski , Robert D. Cavin

IPC: G06F12/08 , G06F12/12

CPC classification number: G06F12/126 , G06F12/123 , Y02D10/13

Abstract: In an embodiment, a processor includes a decode logic to receive and decode a first memory access instruction to store data in a cache memory with a replacement state indicator of a first level, and to send the decoded first memory access instruction to a control logic. In turn, the control logic is to store the data in a first way of a first set of the cache memory and to store the replacement state indicator of the first level in a metadata field of the first way responsive to the decoded first memory access instruction. Other embodiments are described and claimed.

Abstract translation: 在一个实施例中，处理器包括解码逻辑，用于接收和解码第一存储器访问指令以将数据存储在具有第一级的替换状态指示符的高速缓冲存储器中，并将解码的第一存储器访问指令发送到控制逻辑。反过来，控制逻辑是以第一组高速缓冲存储器的第一种方式存储数据，并且响应于解码的第一存储器访问指令将第一级的替换状态指示符存储在第一方式的元数据字段中。描述和要求保护其他实施例。

10.

发明授权
Apparatuses and methods for a processor architecture 有权

公开(公告)号：US12130740B2

公开(公告)日：2024-10-29

申请号：US17712632

申请日：2022-04-04

Applicant: Intel Corporation

Inventor： Jason W. Brandt , Robert S. Chappell , Jesus Corbal , Edward T. Grochowski , Stephen H. Gunther , Buford M. Guy , Thomas R. Huff , Christopher J. Hughes , Elmoustapha Ould-Ahmed-Vall , Ronak Singhal , Seyed Yahya Sotoudeh , Bret L. Toll , Lihu Rappoport , David B. Papworth , James D. Allen

IPC: G06F12/0831 , G06F9/30 , G06F9/38 , G06F12/1009 , G06F12/1027

CPC classification number: G06F12/0831 , G06F9/30043 , G06F9/384 , G06F12/1009 , G06F12/1027 , G06F2212/1016 , G06F2212/621 , G06F2212/68

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Search Results

Country/Region

Patent validity

Application date

Publication (announcement) day

applicant

The country/region where the applicant is located

Inventor

IPC

IPC Department

IPC class

IPC subclass

IPC group

IPC team

Appearance classification