Abstract:
Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix and, for each data element value of the vector, to compare that data element value with data element values in a respective row of the matrix to obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
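To make the row mapping and per-column outputs concrete, the following is a minimal Python sketch of the described semantics. The function name is hypothetical, and combining per-row matches into a single per-column flag is an assumed reading; the abstract specifies only the row mapping, the per-element compares, and column-indexed output results.

    def vector_matrix_compare(vector, matrix):
        # Element i of the vector is compared against row i of the matrix.
        assert len(vector) == len(matrix)
        num_cols = len(matrix[0])
        output = [0] * num_cols
        for i, value in enumerate(vector):
            for j in range(num_cols):
                if matrix[i][j] == value:   # data element match result
                    output[j] = 1           # column j reports a match
        return output

    # Row 0 matches 3 at column 1; row 1 matches 5 at columns 0 and 2.
    print(vector_matrix_compare([3, 5], [[1, 3, 4], [5, 2, 5]]))   # [1, 1, 1]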
Abstract:
Methods and apparatus relating to techniques for fusing SIMD processing units are disclosed. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an instruction set for execution on at least two graphics processing execution units, determine whether the instruction set requires data dependent addressing, and select between a synchronized execution environment and an unsynchronized execution environment for the at least two graphics processing execution units based at least in part on the determination of whether the instruction set requires data dependent addressing. Other embodiments are also disclosed and claimed.
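The selection described above reduces to a predicate over the instruction set. Below is a hedged Python sketch; the instruction encoding and the policy that data-dependent addressing forces the unsynchronized environment are assumptions, since the abstract states only that the choice is based at least in part on that determination.

    def requires_data_dependent_addressing(insn):
        # Hypothetical check: addresses computed from register contents
        # (gather/scatter-style) rather than known statically.
        return insn.get("addressing") == "data_dependent"

    def select_execution_environment(instruction_set):
        # Fused execution units can run in lockstep unless some instruction
        # needs per-lane, data-dependent addresses (assumed policy).
        if any(requires_data_dependent_addressing(i) for i in instruction_set):
            return "unsynchronized"
        return "synchronized"

    program = [{"op": "mad"}, {"op": "load", "addressing": "data_dependent"}]
    print(select_execution_environment(program))   # unsynchronized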
Abstract:
Dynamic heterogeneous hashing function technology for balancing memory requests between multiple memory channels is described. A processor includes functional units and multiple memory channels, and a memory controller unit (MCU) coupled between them. The MCU includes a general-purpose hashing function block that defines a default interleaving sequence for memory requests to alternately access the multiple memory channels and multiple specific-purpose hashing function blocks that define different interleaving sequences for the memory requests to alternately access the multiple memory channels. The MCU also includes a hashing-function selection block. The hashing-function selection block is operable to select one of the specific-purpose hashing function blocks or the general-purpose hashing function block for a current memory request in view of a requesting functional unit originating the current memory request.
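A small Python model of the selection block is sketched below. The channel count, hash formulas, and the unit-to-hash mapping are illustrative assumptions; the abstract requires only a default general-purpose hash plus specific-purpose hashes chosen in view of the requesting functional unit.

    NUM_CHANNELS = 4

    def general_purpose_hash(addr):
        # Default interleaving sequence: alternate channels per 64B line.
        return (addr >> 6) % NUM_CHANNELS

    def display_unit_hash(addr):
        # Hypothetical specific-purpose hash: page-granular interleaving.
        return (addr >> 12) % NUM_CHANNELS

    SPECIFIC_HASHES = {"display": display_unit_hash}   # assumed mapping

    def select_channel(requesting_unit, addr):
        # Hashing-function selection in view of the request's originator.
        hash_fn = SPECIFIC_HASHES.get(requesting_unit, general_purpose_hash)
        return hash_fn(addr)

    print(select_channel("display", 0x5000))   # 1 (page-granular)
    print(select_channel("core", 0x5000))      # 0 (line-granular default)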
Abstract:
Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
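As a worked example, the sketch below models the index-directed reduction in Python, assuming addition as the reduction operation and assuming results are stored at the positions named by the indices; the abstract permits other reduction operations and does not fix the output layout.

    def indexed_reduce(values, indices, op=lambda a, b: a + b):
        # Elements whose indices match are associated and reduced together.
        output = {}
        for value, idx in zip(values, indices):
            output[idx] = op(output[idx], value) if idx in output else value
        return output

    # Positions 0 and 2 share index 0; positions 1 and 3 share index 1.
    print(indexed_reduce([1, 2, 3, 4], [0, 1, 0, 1]))   # {0: 4, 1: 6}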
Abstract:
Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to handle dependency tracking and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware. The compiler can then insert a software scoreboard sync immediate instruction into compiled program code to manage instruction dependencies and prevent data hazards from occurring.
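One way to picture the division of labor is the toy Python model below: the compiler attaches set/wait tokens to instructions, and issue logic only checks those tokens. The token format is an assumption for illustration; the abstract specifies only that the compiler encodes scoreboard information per instruction and emits a synchronization instruction to enforce dependencies.

    class SoftwareScoreboard:
        def __init__(self, slots=16):
            self.pending = [False] * slots   # in-flight results per slot

        def can_issue(self, insn):
            # A sync waits on compiler-named slots; no dynamic dependency
            # tracking hardware is consulted.
            return not any(self.pending[s] for s in insn.get("wait", []))

        def issue(self, insn):
            for s in insn.get("set", []):
                self.pending[s] = True       # result will land in slot s

        def retire(self, insn):
            for s in insn.get("set", []):
                self.pending[s] = False      # dependents may now issue

    sb = SoftwareScoreboard()
    load = {"op": "load", "set": [3]}
    sync = {"op": "sync", "wait": [3]}       # compiler-inserted sync
    sb.issue(load)
    print(sb.can_issue(sync))                # False until the load retires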
Abstract:
Embodiments described herein provide for an instruction and associated logic to enable a vector multiply-add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
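Behaviorally, the zero skipping amounts to eliding multiply/add work for masked lanes and zero-valued inputs. The Python sketch below assumes a particular operand layout (one destination row, per-repeat source rows); the predicate mask and repeat count come from the abstract, the rest is illustrative.

    def sparse_multiply_add(dst, src_a, src_b, predicate_mask, repeat_count):
        for r in range(repeat_count):
            for lane in range(len(dst)):
                if not (predicate_mask >> lane) & 1:
                    continue                 # lane disabled by predicate mask
                a = src_a[r][lane]
                if a == 0:
                    continue                 # automatic zero skipping
                dst[lane] += a * src_b[r][lane]
        return dst

    # Lane 1 is skipped because its input is zero; no multiply is issued.
    print(sparse_multiply_add([0, 0], [[2, 0]], [[3, 4]], 0b11, 1))   # [6, 0]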
Abstract:
A processor is disclosed. The processor includes an execution unit having a register file with one or more banks of registers to store operand values, an accumulator comprising a pool of registers to store operand values determined to cause a conflict at register banks within the register file, and cache circuitry to control storage of the operand values determined to cause a conflict at the register banks from the register file to the pool of registers.
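A toy Python model of that caching behavior is sketched below. The bank mapping and the pool policy (cache an operand after it collides once) are assumptions; the abstract states only that conflicting operand values are moved from the register file into the pool.

    NUM_BANKS = 4

    class OperandCache:
        def __init__(self):
            self.pool = {}   # accumulator: pool of registers for hot operands

        def read_operands(self, regs, register_file):
            banks_in_use, values = set(), []
            for reg in regs:
                if reg in self.pool:
                    values.append(self.pool[reg])     # no bank port consumed
                    continue
                bank = reg % NUM_BANKS                # assumed bank mapping
                if bank in banks_in_use:
                    self.pool[reg] = register_file[reg]   # cache the offender
                banks_in_use.add(bank)
                values.append(register_file[reg])
            return values

    rf = list(range(16))
    cache = OperandCache()
    cache.read_operands([0, 4, 8], rf)   # r4, r8 conflict on bank 0, get pooled
    print(sorted(cache.pool))            # [4, 8]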
Abstract:
Embodiments described herein provide for an instruction and associated logic to enable GPGPU program code to access special-purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, where, to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, and each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.
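Functionally, each systolic layer forms a partial dot product and forwards its accumulation onward. The Python sketch below assumes the layers chain additively over successive row pairs; the abstract states only that a dot product computed at the first layer is output to the second.

    def systolic_layer(a_row, b_row, accumulator):
        # One set of interconnected multipliers and adders.
        partial = sum(x * y for x, y in zip(a_row, b_row))
        return accumulator + partial      # forwarded to the next layer

    def systolic_dot_product(a_rows, b_rows):
        acc = 0
        for a_row, b_row in zip(a_rows, b_rows):   # one row pair per layer
            acc = systolic_layer(a_row, b_row, acc)
        return acc

    print(systolic_dot_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # 70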