Patent search: ap:("Intel Corporation") AND inv:"Changwon Rhee", page 1

1.

Published invention application
HARDWARE ENHANCEMENTS FOR DOUBLE PRECISION SYSTOLIC SUPPORT (status: pending, published)

Publication number: US20240111826A1

Publication date: 2024-04-04

Application number: US17937252

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang

IPC classification: G06F17/16, G06F7/544, G06T1/20

CPC classification: G06F17/16, G06F7/5443, G06T1/20

Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
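As a rough illustration of the data flow this abstract describes (not the patented circuit), the sketch below models the "high precision intermediate format" with Python's exact `Fraction` type and rounds to double precision only once, at the end of the second pass. The function name `dp_dot_two_pass`, the even split of the products across the two passes, and the use of exact rationals are all assumptions made for illustration.

```python
from fractions import Fraction


def dp_dot_two_pass(a, b, c):
    """Model c + sum(a[i] * b[i]) with a single rounding at the very end.

    Assumptions (not from the abstract): the products are split evenly
    across two passes, and the intermediate format is an exact rational.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    half = len(a) // 2

    def one_pass(acc, xs, ys):
        # Multiply operand pairs and accumulate without intermediate rounding.
        for x, y in zip(xs, ys):
            acc += Fraction(x) * Fraction(y)
        return acc

    # First pass: accumulate with the third source operand c.
    intermediate = one_pass(Fraction(c), a[:half], b[:half])
    # Second pass: accumulate with the intermediate result of the first pass.
    final = one_pass(intermediate, a[half:], b[half:])
    # Down conversion: one rounding step back to double precision.
    return float(final)


a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1.0, 1.0]
print(dp_dot_two_pass(a, b, 0.0))  # prints 2.0
```

For these inputs, accumulating left to right in plain double precision returns 1.0, because 1e16 + 1.0 rounds back to 1e16; deferring the rounding recovers the exact sum of 2.0.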

2.

Invention application
FUSED INSTRUCTION TO ACCELERATE PERFORMANCE OF SECURE HASH ALGORITHM 2 (SHA-2) WORKLOADS IN A GRAPHICS ENVIRONMENT (status: in force)

Publication number: US20220416999A1

Publication date: 2022-12-29

Application number: US17358897

Filing date: 2021-06-25

Applicant: Intel Corporation

Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky

IPC classification: H04L9/06, G06F9/38, G06T15/00

Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the functional control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an XOR operation.
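The abstract does not say which SHA-2 sub-functions the fused instruction covers. As one familiar example of a rotate, shift, and XOR merged into a single operation, the sketch below implements the two SHA-256 message schedule functions as defined in FIPS 180-4. The Python names (`rotr32`, `small_sigma0`, `small_sigma1`) are illustrative and do not come from the patent.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    """SHA-256 sigma0: ROTR 7, ROTR 18 and SHR 3, combined with XOR."""
    x &= MASK32
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)


def small_sigma1(x):
    """SHA-256 sigma1: ROTR 17, ROTR 19 and SHR 10, combined with XOR."""
    x &= MASK32
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)


print(hex(small_sigma0(1)))  # prints 0x2004000
print(hex(small_sigma1(1)))  # prints 0xa000
```

In software each of these functions costs five separate integer operations per word, which is the kind of sequence a fused instruction would collapse into one.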

3.

Granted invention patent
Methods and apparatuses for dynamically changing data priority in a cache (status: in force)

Publication number: US12066946B2

Publication date: 2024-08-20

Application number: US17704340

Filing date: 2022-03-25

Applicant: Intel Corporation

Inventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu

IPC classification: G06F12/084, G06F9/48

CPC classification: G06F12/084, G06F9/4818, G06F2212/604

Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request, to dynamically change a priority level of the requested data.
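A toy model can make the idea of a priority flag driven by an accumulated access amount concrete. Everything specific in the sketch below is an assumption: the class name, the 4096-byte block size, the 8192-byte threshold, and in particular the choice to mark heavily accessed blocks as high priority (the abstract does not state the direction of the policy).

```python
from collections import defaultdict


class PriorityController:
    """Toy model: flag requests by accumulated access amount per memory block."""

    def __init__(self, block_size=4096, threshold_bytes=8192):
        self.block_size = block_size            # hypothetical block granularity
        self.threshold_bytes = threshold_bytes  # hypothetical cutoff
        self.accessed = defaultdict(int)        # block index -> bytes accessed so far

    def handle_request(self, address, size):
        """Record the access, then return the priority flag for the request."""
        block = address // self.block_size
        self.accessed[block] += size
        # Assumption: blocks with a large accumulated amount become high priority.
        if self.accessed[block] >= self.threshold_bytes:
            return "high"
        return "low"


controller = PriorityController()
print(controller.handle_request(0, 4096))     # prints low
print(controller.handle_request(100, 4096))   # prints high
print(controller.handle_request(4096, 64))    # prints low
```

The second request targets the same block as the first, so its accumulated amount reaches the threshold and the flag changes; the third request falls in a different block and starts from zero.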

4.

Published invention application
SUPPORTING VECTOR MULTIPLY ADD WITH DOUBLE ACCUMULATOR ACCESS IN A GRAPHICS ENVIRONMENT (status: pending, published)

Publication number: US20240103810A1

Publication date: 2024-03-28

Application number: US17935787

Filing date: 2022-09-27

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu

IPC classification: G06F7/544, G06F7/57, G06F17/16

CPC classification: G06F7/5443, G06F7/57, G06F17/16

Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprise two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction is to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.

5.

Published invention application
BROADCAST ASYNCHRONOUS LOADS TO SHARED LOCAL MEMORY (status: pending, published)

Publication number: US20240232088A9

Publication date: 2024-07-11

Application number: US17973203

Filing date: 2022-10-25

Applicant: Intel Corporation

Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee

IPC classification: G06F12/0855

CPC classification: G06F12/0857, G06F2212/1016

Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.

6.

Published invention application
BROADCAST ASYNCHRONOUS LOADS TO SHARED LOCAL MEMORY (status: pending, published)

Publication number: US20240134797A1

Publication date: 2024-04-25

Application number: US17973203

Filing date: 2022-10-24

Applicant: Intel Corporation

Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee

IPC classification: G06F12/0855

CPC classification: G06F12/0857, G06F2212/1016

Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.

7.

Published invention application
SINGLE PRECISION SUPPORT FOR SYSTOLIC PIPELINE IN A GRAPHICS ENVIRONMENT (status: pending, published)

Publication number: US20240111825A1

Publication date: 2024-04-04

Application number: US17937229

Filing date: 2022-09-30

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Changwon Rhee, Kevin Hurd, Gregory Henry, Peter Caday, Kristopher Wong

IPC classification: G06F17/16, G06F7/483

CPC classification: G06F17/16, G06F7/483

Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
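The split-term idea in this abstract can be sketched in software. The abstract does not name the formats involved; purely for illustration, the sketch below treats binary64 as the first precision format and binary32 as the lower second format, because both are available in the Python standard library. The helper names (`to_f32`, `split`, `emulated_dot`) and the assignment of terms to the two accumulators are assumptions, and the "passes" here are ordinary loops rather than passes through a systolic array.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split(x):
    """Split x into a high part and a low-order correction, both binary32."""
    hi = to_f32(x)
    lo = to_f32(x - hi)
    return hi, lo


def emulated_dot(a, b):
    """Dot product built from split terms; the tiny lo * lo term is dropped."""
    acc_hi = 0.0  # stands in for one pass: high-order products
    acc_lo = 0.0  # stands in for the other pass: cross terms
    for x, y in zip(a, b):
        x_hi, x_lo = split(x)
        y_hi, y_lo = split(y)
        acc_hi += x_hi * y_hi
        acc_lo += x_hi * y_lo + x_lo * y_hi
    # Local reduction: combine the two partial results.
    return acc_hi + acc_lo


a = [1 / 3, 0.1]
exact = sum(x * y for x, y in zip(a, a))
hi_only = sum(to_f32(x) * to_f32(y) for x, y in zip(a, a))
print(abs(hi_only - exact) > abs(emulated_dot(a, a) - exact))  # prints True
```

Using only the rounded high parts loses roughly half of the significant digits, while adding the cross terms brings the result back close to the full-precision value.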

8.

Invention application
EMULATION OF FLOATING POINT CALCULATION (status: in force)

Publication number: US20230086275A1

Publication date: 2023-03-23

Application number: US17482166

Filing date: 2021-09-22

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu

IPC classification: G06F7/487, G06F7/485, G06F7/544, G06F17/16, G06F15/80

Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.

9.

Invention application
METHODS AND APPARATUSES FOR DYNAMICALLY CHANGING DATA PRIORITY IN A CACHE (status: in force)

Publication number: US20220414010A1

Publication date: 2022-12-29

Application number: US17704340

Filing date: 2022-03-25

Applicant: Intel Corporation

Inventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu

IPC classification: G06F12/084, G06F9/48

Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request, to dynamically change a priority level of the requested data.

10.

Invention application
64-BIT TWO-DIMENSIONAL BLOCK LOAD WITH TRANSPOSE (status: in force)

Publication number: US20220413854A1

Publication date: 2022-12-29

Application number: US17358859

Filing date: 2021-06-25

Applicant: Intel Corporation

Inventors: Joydeep Ray, Supratim Pal, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Changwon Rhee, Hong Jiang, FangWen Fu

IPC classification: G06F9/30, G06F9/38, G06T15/00

Abstract: An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprises a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages; and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.
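The two steps named in this abstract, mapping block rows to ordinary loads and returning sequentially numbered registers, can be mimicked in a few lines. This is only a software analogy: the function name, the flat-list memory model, the `pitch` parameter, and the choice to place one block column in each numbered register are assumptions, not details taken from the patent.

```python
def block_load_transpose(memory, base, pitch, width, height):
    """Load a height x width block of elements and return it transposed.

    memory: flat list of integers standing in for 64-bit elements.
    base:   index of the block's top-left element.
    pitch:  number of elements between the starts of consecutive rows.
    Returns a dict mapping a sequential register number to one block column.
    """
    # Sequencer step: map each row of the block to one ordinary row load.
    rows = []
    for r in range(height):
        start = base + r * pitch
        rows.append(memory[start:start + width])
    # Return step: register i receives column i of the block, in order.
    return {i: [row[i] for row in rows] for i in range(width)}


memory = list(range(16))  # a 4 x 4 surface with pitch 4
print(block_load_transpose(memory, base=1, pitch=4, width=2, height=3))
# prints {0: [1, 5, 9], 1: [2, 6, 10]}
```

Each returned register holds values that were a full pitch apart in memory, which is the effect a transpose on load gives a consumer that wants column-major data.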
