Abstract:
Disclosed embodiments relate to instructions for dual-destination type conversion, accumulation, and atomic memory operations. In one example, a system includes a memory and a processor, the processor including: a fetch circuit to fetch an instruction from a code storage, the instruction including an opcode, a first destination identifier, and a source identifier to specify a source vector register, the source vector register including a plurality of single precision floating point data elements; a decode circuit to decode the fetched instruction; and an execution circuit to execute the decoded instruction to: convert the elements of the source vector register into double precision floating point values, store a first half of the double precision floating point values to a first location identified by the first destination identifier, and store a second half of the double precision floating point values to a second location.
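As an illustrative sketch only (not the claimed encoding or microarchitecture), the dual-destination conversion described above can be modeled in C++ as follows; the lane count, function name, and use of two register-sized destinations are assumptions:

// Minimal sketch of the dual-destination single-to-double conversion.
// kLanes and the two destination arrays are illustrative assumptions.
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // e.g. a source register holding 8 floats

// src: source vector register of single precision elements.
// dst_lo / dst_hi: the first and second destination locations.
void convert_dual_dest(const std::array<float, kLanes>& src,
                       std::array<double, kLanes / 2>& dst_lo,
                       std::array<double, kLanes / 2>& dst_hi) {
    for (std::size_t i = 0; i < kLanes / 2; ++i) {
        dst_lo[i] = static_cast<double>(src[i]);               // first half of converted values
        dst_hi[i] = static_cast<double>(src[i + kLanes / 2]);  // second half of converted values
    }
}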
Abstract:
An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. For example, one embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode, first and second source operands, and at least one destination operand, the broadcast instruction having a split value associated therewith; a first source register associated with the first source operand to store a first plurality of packed data elements; a second source register associated with the second source operand to store a second plurality of packed data elements; execution circuitry to execute the operations of the decoded broadcast instruction, the execution circuitry to copy a first number of contiguous data elements from the first source register to a first set of contiguous data element locations in a destination register specified by the destination operand, the execution circuitry to further copy a second number of contiguous data elements from the second source register to a second set of contiguous data element locations in the destination register, wherein the execution circuitry is to determine the first number and the second number in accordance with the split value associated with the broadcast instruction.
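A hedged C++ sketch of the split-controlled copy described above follows; the register width and the interpretation of the split value as "number of elements taken from the first source" are assumptions, not the claimed semantics:

// Sketch of the split move: 'split' contiguous elements come from src1,
// the remaining lanes come from src2. Width and names are assumptions.
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 16;

void split_move(const std::array<int, kLanes>& src1,
                const std::array<int, kLanes>& src2,
                std::size_t split,
                std::array<int, kLanes>& dst) {
    for (std::size_t i = 0; i < split; ++i)
        dst[i] = src1[i];          // first number of contiguous data elements
    for (std::size_t i = split; i < kLanes; ++i)
        dst[i] = src2[i - split];  // second number of contiguous data elements
}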
Abstract:
An apparatus and method are described for floating point operation (FLOP) accounting. For example, one embodiment of a processor comprises: an instruction fetch unit to fetch instructions from system memory, the instructions including at least one masked vector floating point instruction to perform operations on a plurality of floating point data elements; a mask register to store a mask value associated with the masked vector floating point instruction; a decoder to decode the masked vector floating point instruction; and floating point operations (FLOP) accounting circuitry to read the mask register to determine a number of floating point operations to be performed during execution of the masked vector floating point instruction.
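One way to read the mask-based accounting described above is that the FLOP count credited to a masked vector instruction equals the number of active (unmasked) lanes times the operations per lane. The sketch below assumes that reading and uses illustrative names; it is not the claimed hardware interface:

// Hedged sketch of mask-based FLOP accounting.
#include <bitset>
#include <cstdint>

// mask: per-lane write mask read from the mask register.
// flops_per_lane: e.g. 2 for a fused multiply-add.
uint64_t flops_for_masked_op(uint16_t mask, unsigned flops_per_lane) {
    // Only unmasked lanes contribute floating point operations.
    return static_cast<uint64_t>(std::bitset<16>(mask).count()) * flops_per_lane;
}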
Abstract:
A processor includes a front end, a cache, and a cache controller. The front end includes logic to receive an instruction defining a priority dataset. The priority dataset includes ranges of memory addresses each corresponding to a respective priority level. The cache controller includes logic to detect a miss in the cache for a requested cache value, determine a candidate cache victim from the cache, determine a priority of the requested cache value and the candidate cache victim according to the priority dataset, and evict the candidate cache victim based on a determination that the priority of the candidate cache victim is less than or equal to the priority of the requested cache value.
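A minimal C++ sketch of this eviction decision, assuming a simple list-of-ranges representation for the priority dataset (the data structures and default priority are illustrative assumptions):

// Sketch of priority-based eviction: evict the victim only if its priority
// does not exceed that of the requested value.
#include <cstdint>
#include <vector>

struct PriorityRange { uint64_t lo, hi; int priority; };

int priority_of(uint64_t addr, const std::vector<PriorityRange>& dataset) {
    for (const auto& r : dataset)
        if (addr >= r.lo && addr < r.hi) return r.priority;
    return 0;  // assumed default priority for addresses outside all ranges
}

// Returns true if the candidate victim may be evicted for the requested value.
bool may_evict(uint64_t requested_addr, uint64_t victim_addr,
               const std::vector<PriorityRange>& dataset) {
    return priority_of(victim_addr, dataset) <= priority_of(requested_addr, dataset);
}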
Abstract:
An embodiment of a semiconductor package apparatus may include technology to identify a nested loop in a set of executable instructions, and determine at runtime if the nested loop is a candidate for cache blocking. Other embodiments are disclosed and claimed.
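For context, the kind of transformation such a nested-loop candidate would receive is cache blocking (loop tiling). The C++ sketch below is a generic illustration of that transformation, not the disclosed runtime mechanism; the loop body and tile size are assumptions:

// Illustrative cache-blocked nested loop: the tile size is an assumption
// chosen so each tile's working set fits in cache.
constexpr int N = 1024;
constexpr int TILE = 64;

void blocked_transpose(const float (&a)[N][N], float (&b)[N][N]) {
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; ++i)
                for (int j = jj; j < jj + TILE; ++j)
                    b[j][i] = a[i][j];  // tiled access pattern improves cache reuse
}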
Abstract:
An apparatus and method for loop flattening and reduction in a SIMD pipeline including broadcast, move, and reduction instructions. One embodiment of a processor comprises: a decoder to decode a broadcast instruction to generate a decoded broadcast instruction identifying a plurality of operations, the broadcast instruction including an opcode and first and second source operands, and having a split value associated therewith; and execution circuitry to execute the operations of the decoded broadcast instruction to copy a first data element specified by the first source operand to each of a first set of contiguous data element locations in a destination register and to copy a second data element specified by the second source operand to a second set of contiguous data element locations in the destination register, wherein the first and second sets of contiguous data element locations are determined in accordance with the split value.
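A hedged C++ sketch of this split broadcast follows: one scalar is replicated into the low lanes and another into the remaining lanes, with the boundary set by the split value. The register width, names, and split interpretation are assumptions:

// Sketch of the split broadcast: first_elem fills the first set of contiguous
// locations, second_elem fills the second set.
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 16;

void split_broadcast(int first_elem, int second_elem, std::size_t split,
                     std::array<int, kLanes>& dst) {
    for (std::size_t i = 0; i < split; ++i)
        dst[i] = first_elem;   // first set of contiguous data element locations
    for (std::size_t i = split; i < kLanes; ++i)
        dst[i] = second_elem;  // second set of contiguous data element locations
}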