Patent search ap:("INTEL CORPORATION") AND inv:"Edward T. Grochowski" Page 2

11.

Granted invention patent
Systems, apparatuses, and methods for chained fused multiply add [In force]

Publication (announcement) number: US12073214B2

Publication (announcement) date: 2024-08-27

Application number: US17952001

Application date: 2022-09-23

Applicant: Intel Corporation

Inventor: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson

IPC: G06F9/30, G06F7/483, G06F7/544, G06F9/38

CPC classification number: G06F9/3001, G06F7/483, G06F7/5443, G06F9/30036, G06F9/30109, G06F9/30112, G06F9/3893

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.
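
The iteration described in the abstract can be sketched in software. This is only an illustrative model, not the patented circuitry: the function name `chained_fma` and its parameters are hypothetical, each source vector is paired with one sub-element of the scalar, and plain Python floats round after the multiply and again after the add, so the arithmetic is not truly fused.

```python
def chained_fma(initial, sources, scalar_subelements):
    """Model a chain of packed multiply-accumulate steps (hypothetical helper).

    initial            -- packed starting values (the accumulator)
    sources            -- list of packed source vectors, one per iteration
    scalar_subelements -- one sub-element of the scalar value per iteration
    """
    if len(sources) != len(scalar_subelements):
        raise ValueError("need one scalar sub-element per source vector")
    result = list(initial)
    for vector, sub in zip(sources, scalar_subelements):
        # Each iteration adds vector * sub to the previous iteration's result;
        # the first iteration adds to the initial value.
        result = [acc + x * sub for acc, x in zip(result, vector)]
    return result


print(chained_fma([1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]], [10.0, 100.0]))
```

With the inputs shown, the first lane computes 1 + 1*10 + 3*100 and the second lane computes 2 + 2*10 + 4*100.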

12.

Granted invention patent
Systems, apparatuses, and methods for chained fused multiply add [In force]

Publication (announcement) number: US11487541B2

Publication (announcement) date: 2022-11-01

Application number: US17107134

Application date: 2020-11-30

Applicant: Intel Corporation

Inventor: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson

IPC: G06F9/30, G06F7/544, G06F7/483, G06F9/38

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

13.

Granted invention patent
Data element comparison processors, methods, systems, and instructions [In force]

Publication (announcement) number: US10423411B2

Publication (announcement) date: 2019-09-24

Application number: US14866921

Application date: 2015-09-26

Applicant: Intel Corporation

Inventor: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar

IPC: G06F9/30

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
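
The mask semantics in the abstract amount to an "equals any" test per element. The following is a minimal software sketch under that reading; `any_equal_mask` is a hypothetical name, and the use of 1 and 0 as mask values is an assumption made for illustration.

```python
def any_equal_mask(first, second):
    """Return one mask element per element of `first` (hypothetical helper).

    A mask element is 1 when the element at the same position in `first`
    equals any element of `second`, and 0 otherwise.
    """
    candidates = set(second)
    return [1 if element in candidates else 0 for element in first]


print(any_equal_mask([1, 2, 3, 4], [4, 4, 9, 1]))
```

Calling the function with the arguments swapped yields the mask for the other source operand.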

14.

Granted invention patent
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems [In force]

Publication (announcement) number: US10275243B2

Publication (announcement) date: 2019-04-30

Application number: US15201442

Application date: 2016-07-02

Applicant: Intel Corporation

Inventor: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.

IPC: G06F9/30, G06F9/38

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
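
The interrupt-and-restart behavior can be illustrated with a small software model. Everything here is an assumption made for the sketch: the name `matmul_resumable`, the choice of "completed result rows" as the progress indicator, and the `budget` parameter that simulates an interruption after a given number of rows.

```python
def matmul_resumable(a, b, result, progress=0, budget=None):
    """Compute result = a x b row by row, starting at row `progress`.

    Returns the number of result rows completed so far, which serves as the
    progress indicator in this sketch. `budget` limits how many rows are
    computed before a simulated interruption (None means run to completion).
    """
    inner, cols = len(b), len(b[0])
    row = progress
    while row < len(a):
        if budget is not None and row - progress >= budget:
            break  # simulated interruption: stop and report progress
        result[row] = [
            sum(a[row][k] * b[k][j] for k in range(inner)) for j in range(cols)
        ]
        row += 1
    return row


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[0, 0], [0, 0]]
done = matmul_resumable(a, b, c, budget=1)   # interrupted after one row
done = matmul_resumable(a, b, c, progress=done)  # restart from saved progress
print(done, c)
```

Because rows stored before the interruption are never recomputed, restarting from the saved indicator produces the same result as an uninterrupted run.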

15.

Invention application
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE [Under examination, published]

Publication (announcement) number: US20190012266A1

Publication (announcement) date: 2019-01-10

Application number: US16115067

Application date: 2018-08-28

Applicant: Intel Corporation

Inventor: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen

IPC: G06F12/0831, G06F12/1027, G06F12/1009

Abstract: Embodiments of an invention for a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

16.

Granted invention patent
Systems, apparatuses, and methods for chained fused multiply add [In force]

Publication (announcement) number: US10146535B2

Publication (announcement) date: 2018-12-04

Application number: US15299420

Application date: 2016-10-20

Applicant: Intel Corporation

Inventor: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson

IPC: G06F9/30, G06F7/544

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add are described. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand. Execution circuitry executes the decoded single instruction to perform iterations of packed fused multiply accumulate operations by multiplying packed data elements of the sources of the first type by sub-elements of the scalar value, and adding results of these multiplications to an initial value in a first iteration and a result from a previous iteration in subsequent iterations.

17.

Invention application
MEMORY-TO-MEMORY INSTRUCTIONS TO ACCELERATE SPARSE-MATRIX BY DENSE-VECTOR AND SPARSE-VECTOR BY DENSE-VECTOR MULTIPLICATION [Under examination, published]

Publication (announcement) number: US20180173437A1

Publication (announcement) date: 2018-06-21

Application number: US15384178

Application date: 2016-12-19

Applicant: Intel Corporation

Inventor: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski

IPC: G06F3/06

CPC classification number: G06F3/0614, G06F3/0646, G06F3/0683, G06F9/3001, G06F9/30036, G06F9/3877, G06F2212/1016

Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
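
The incremental dot product can be sketched as follows. The storage layout is an assumption: one sparse row is given as parallel lists of nonzero values and column indices (a CSR-style layout that the abstract does not specify). The name `sparse_row_dot`, the `chunk` size, and the choice to record the last completed column index as the sequence number are likewise illustrative.

```python
def sparse_row_dot(values, col_indices, dense, chunk=2):
    """Dot product of one sparse row with a dense vector, in increments.

    values, col_indices -- nonzeros of the sparse row and their columns
    dense               -- the dense vector
    Returns (dot_product, sequence_number); the sequence number is the column
    index of the last nonzero processed, or None for an empty row.
    """
    # Gather only the dense elements that will actually be multiplied.
    gathered = [dense[j] for j in col_indices]
    total = 0
    sequence_number = None
    for start in range(0, len(values), chunk):
        end = min(start + chunk, len(values))
        total += sum(v * g for v, g in zip(values[start:end], gathered[start:end]))
        # Updated after each increment so that work could resume from here.
        sequence_number = col_indices[end - 1]
    return total, sequence_number


print(sparse_row_dot([2, 3, 4], [0, 2, 5], [1, 0, 10, 0, 0, 100]))
```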

18.

Invention application
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE [Under examination, published]

Publication (announcement) number: US20180165199A1

Publication (announcement) date: 2018-06-14

Application number: US15376647

Application date: 2016-12-12

Applicant: Intel Corporation

Inventor: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen

IPC: G06F12/0831, G06F12/1027, G06F12/1009

CPC classification number: G06F12/0831, G06F12/1009, G06F12/1027, G06F2212/1016, G06F2212/621, G06F2212/68

Abstract: Embodiments of an invention for a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

19.

Invention application
Generational Thread Scheduler [Under examination, published]

Publication (announcement) number: US20170031729A1

Publication (announcement) date: 2017-02-02

Application number: US15290375

Application date: 2016-10-11

Applicant: Intel Corporation

Inventor: Edward T. Grochowski, Michael D. Upton, George Z. Chrysos, Chunhui Zhang, Mohammed L. Al-Aqrabawi

IPC: G06F9/52

CPC classification number: G06F9/52, G06F2209/5014

Abstract: Disclosed herein is a generational thread scheduler. One embodiment may be used with processor multithreading logic to execute threads of executable instructions, and a shared resource to be allocated fairly among the threads of executable instructions contending for access to the shared resource. Generational thread scheduling logic may allocate the shared resource efficiently and fairly by granting a first requesting thread access to the shared resource, allocating a reservation for the shared resource to each other requesting thread of the executing threads, and then blocking the first thread from re-requesting the shared resource until every other thread that has been allocated a reservation has been granted access to the shared resource. Generation tracking state may be cleared when each requesting thread of the generation that was allocated a reservation has had its request satisfied.
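
The generational policy can be modeled in a few lines. This is a sketch of one possible reading of the abstract, not the patented logic: the class `GenerationalScheduler` and its method names are hypothetical, and reservations are served in arrival order, which the abstract does not state.

```python
from collections import deque


class GenerationalScheduler:
    """Toy model of generation-based fair access to one shared resource."""

    def __init__(self):
        self.served = set()      # threads already granted in this generation
        self.reserved = deque()  # threads holding an unsatisfied reservation

    def start_generation(self, requesters):
        """Grant the first requester and reserve the resource for the rest."""
        if self.reserved:
            raise RuntimeError("previous generation is not finished")
        first, *others = requesters
        self.served = {first}
        self.reserved = deque(others)
        if not self.reserved:
            self.served.clear()  # nobody else waiting: nothing to track
        return first

    def is_blocked(self, thread_id):
        """A served thread may not re-request while reservations remain."""
        return thread_id in self.served and bool(self.reserved)

    def grant_next(self):
        """Satisfy the oldest reservation; clear state when all are done."""
        thread_id = self.reserved.popleft()
        self.served.add(thread_id)
        if not self.reserved:
            self.served.clear()  # generation complete: clear tracking state
        return thread_id


scheduler = GenerationalScheduler()
print(scheduler.start_generation(["A", "B", "C"]))  # A is granted first
print(scheduler.is_blocked("A"))                    # A must wait for B and C
print(scheduler.grant_next(), scheduler.grant_next())
print(scheduler.is_blocked("A"))                    # generation cleared
```

Blocking served threads until the generation drains is what prevents a fast thread from repeatedly winning the resource ahead of slower requesters.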

20.

Invention application
MECHANISM FOR INSTRUCTION SET BASED THREAD EXECUTION ON A PLURALITY OF INSTRUCTION SEQUENCERS [Under examination, published]

Publication (announcement) number: US20170010895A1

Publication (announcement) date: 2017-01-12

Application number: US15276290

Application date: 2016-09-26

Applicant: Intel Corporation

Inventor: Hong Wang, John P. Shen, Edward T. Grochowski, Richard A. Hankins, Gautham N. Chinya, Bryant E. Bigbee, Shivnandan D. Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggarwal, Prashant Sethi, Baiju V. Patel, James P. Held

IPC: G06F9/38, G06F9/30, G06F9/48

CPC classification number: G06F9/3867, G06F9/30003, G06F9/30043, G06F9/3005, G06F9/3009, G06F9/30145, G06F9/3017, G06F9/30174, G06F9/3851, G06F9/4843, G06F9/4881

Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.