-
Publication No.: US20240111826A1
Publication date: 2024-04-04
Application No.: US17937252
Filing date: 2022-09-30
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
CPC classification: G06F17/16, G06F7/5443, G06T1/20
Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit is to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as a final result in the DP floating-point format.
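Illustration (not part of the patent record): a minimal Python sketch of the idea in the abstract above, in which products are accumulated in a wider intermediate format and rounded to double precision only once, after the second pass. Everything here is an assumption made for illustration: `fractions.Fraction` stands in for the unspecified high precision intermediate format, the function name `dp_dot_two_pass` is hypothetical, and splitting the products evenly across the two passes is a guess at how the passes divide the work.

```python
from fractions import Fraction

def accumulate_pass(acc, xs, ys):
    """One pass: multiply operand pairs and add them to the running accumulator exactly."""
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)
    return acc

def dp_dot_two_pass(a, b, c):
    """Dot product of a and b plus c, rounded to double only at the very end."""
    half = len(a) // 2
    # First pass: the accumulator starts from the third source operand c.
    intermediate = accumulate_pass(Fraction(c), a[:half], b[:half])
    # Second pass: the accumulator starts from the first pass's intermediate result.
    final = accumulate_pass(intermediate, a[half:], b[half:])
    # Down conversion and rounding to the double-precision result.
    return float(final)

result = dp_dot_two_pass([1e16, 1.0, -1e16, 1.0], [1.0, 1.0, 1.0, 1.0], 0.0)
```

The sample inputs mix very large and very small terms, the case where rounding after every addition can drop low-order contributions that a wider intermediate format keeps.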
-
Publication No.: US20220416999A1
Publication date: 2022-12-29
Application No.: US17358897
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventors: Supratim Pal, Wajdi Feghali, Changwon Rhee, Wei-Yu Chen, Timothy R. Bauer, Alexander Lyashevsky
Abstract: An apparatus to facilitate a fused instruction to accelerate performance of secure hash algorithm 2 (SHA-2) in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising execution circuitry to receive a fused SHA instruction identifying a length corresponding to a data size of the fused SHA instruction and a functional control identifying an operation type of the fused SHA instruction; based on decoding the fused SHA instruction, cause a sub-function identified by the length and the functional control to be scheduled to an integer pipeline of the execution resource; and execute the sub-function of the fused SHA instruction in an integer pipeline of the execution circuitry, the sub-function to perform merged operations on a source operand of the fused SHA instruction, the merged operations comprising a rotate operation, a shift operation, and an xor operation.
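Illustration (not part of the patent record): the abstract above does not say which SHA-2 sub-functions the fused instruction covers. As one plausible example of a merged rotate, shift, and xor operation, the sketch below implements the two 32-bit message-schedule functions of SHA-256 as defined in FIPS 180-4. The helper names are illustrative, and treating these functions as the fused sub-functions is an assumption.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, n):
    """Rotate a 32-bit word right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def small_sigma0(x):
    """SHA-256 sigma0: ROTR7(x) xor ROTR18(x) xor SHR3(x)."""
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3)

def small_sigma1(x):
    """SHA-256 sigma1: ROTR17(x) xor ROTR19(x) xor SHR10(x)."""
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10)
```

In software each function costs two rotates, one shift, and two xors per word, which is the kind of sequence a single fused instruction could collapse.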
-
Publication No.: US12066946B2
Publication date: 2024-08-20
Application No.: US17704340
Filing date: 2022-03-25
Applicant: Intel Corporation
Inventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu
IPC classification: G06F12/084, G06F9/48
CPC classification: G06F12/084, G06F9/4818, G06F2212/604
Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.
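Illustration (not part of the patent record): a minimal Python sketch of a priority controller that tracks how much data has been accessed in each memory block and sets a flag on each request accordingly. The abstract gives no policy details, so the class name, the block size, the threshold, and the rule "flag as high priority once the accumulated amount reaches the threshold" are all assumptions made for illustration.

```python
from collections import defaultdict

class PriorityController:
    """Hypothetical model: flags requests by accumulated bytes accessed per block."""

    def __init__(self, block_size=4096, threshold=8192):
        self.block_size = block_size      # assumed block granularity in bytes
        self.threshold = threshold        # assumed cutoff for the priority flag
        self.accessed = defaultdict(int)  # block index -> accumulated bytes accessed

    def handle_request(self, address, size):
        """Record the access and return the request with its priority flag set."""
        block = address // self.block_size
        self.accessed[block] += size
        high = self.accessed[block] >= self.threshold
        return {"address": address, "size": size, "high_priority": high}

controller = PriorityController(block_size=4096, threshold=128)
first = controller.handle_request(0, 64)
second = controller.handle_request(64, 64)
other_block = controller.handle_request(4096, 64)
```

Because the flag depends on a running total, the same address can receive different priority levels over time, which is the dynamic behavior the abstract describes.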
-
Publication No.: US20240103810A1
Publication date: 2024-03-28
Application No.: US17935787
Filing date: 2022-09-27
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu
CPC classification: G06F7/5443, G06F7/57, G06F17/16
Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprise two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction is to multiply two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
-
Publication No.: US20240232088A9
Publication date: 2024-07-11
Application No.: US17973203
Filing date: 2022-10-25
Applicant: Intel Corporation
Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
IPC classification: G06F12/0855
CPC classification: G06F12/0857, G06F2212/1016
Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
-
Publication No.: US20240134797A1
Publication date: 2024-04-25
Application No.: US17973203
Filing date: 2022-10-24
Applicant: Intel Corporation
Inventors: John A. Wiegert, Joydeep Ray, Vasanth Ranganathan, Biju George, Fangwen Fu, Abhishek R. Appu, Chunhui Mei, Changwon Rhee
IPC classification: G06F12/0855
CPC classification: G06F12/0857, G06F2212/1016
Abstract: Embodiments described herein provide a technique to facilitate the broadcast or multicast of asynchronous loads to shared local memory of a plurality of graphics cores within a graphics core cluster. One embodiment provides a graphics processor including a cache memory and a graphics core cluster coupled with the cache memory. The graphics core cluster includes a plurality of graphics cores. The plurality of graphics cores includes a graphics core configured to receive a designation as a producer graphics core for a multicast load, read data from the cache memory, and transmit the data read from the cache memory to a consumer graphics core of the plurality of graphics cores.
-
Publication No.: US20240111825A1
Publication date: 2024-04-04
Application No.: US17937229
Filing date: 2022-09-30
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Changwon Rhee, Kevin Hurd, Gregory Henry, Peter Caday, Kristopher Wong
Abstract: An apparatus to facilitate single precision support for systolic pipeline in a graphics environment is disclosed. The apparatus includes a processor comprising systolic array hardware including a plurality of data processing units, wherein the systolic array hardware is to: receive data for performance of a matrix multiplication operation in a first precision format; convert an original value of the data into two split values with a second precision format having a lower precision than the first precision format; perform the matrix multiplication operation using the two split values in the second precision format, the matrix multiplication operation comprising a split-term operation that utilizes two passes through the systolic array hardware with feedback wiring and local reduction; and generate an emulated result for the matrix multiplication operation in the first precision format.
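Illustration (not part of the patent record): a minimal Python sketch of the split-value arithmetic named in the abstract above, where each value is converted into two lower-precision values and the product is rebuilt from split terms. The formats are assumptions chosen because the Python standard library can round to them: binary64 plays the first precision format and binary32 the second. The function names are hypothetical, and the sketch does not model the two hardware passes, feedback wiring, or local reduction.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split(x):
    """Split x into a high part and a low correction, both representable in binary32."""
    hi = to_f32(x)
    lo = to_f32(x - hi)
    return hi, lo

def emulated_dot(xs, ys):
    """Dot product built only from products of binary32 split values."""
    total = 0.0
    for x, y in zip(xs, ys):
        x_hi, x_lo = split(x)
        y_hi, y_lo = split(y)
        # Three split-term products; the lo*lo term is dropped as negligible.
        total += x_hi * y_hi + x_hi * y_lo + x_lo * y_hi
    return total

def low_precision_dot(xs, ys):
    """Baseline: round each input to binary32 once and ignore the low parts."""
    return sum(to_f32(x) * to_f32(y) for x, y in zip(xs, ys))

xs, ys = [0.1, 0.7], [0.3, 0.9]
exact = 0.1 * 0.3 + 0.7 * 0.9
emulated_error = abs(emulated_dot(xs, ys) - exact)
baseline_error = abs(low_precision_dot(xs, ys) - exact)
```

Comparing `emulated_error` with `baseline_error` shows how much of the first format's accuracy the low parts recover.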
-
Publication No.: US20230086275A1
Publication date: 2023-03-23
Application No.: US17482166
Filing date: 2021-09-22
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Changwon Rhee, Sabareesh Ganapathy, Gregory Henry, Fangwen Fu
Abstract: Emulating floating point calculation using lower precision format calculations is described. An example of a processor includes a floating point unit (FPU) to provide a native floating point operation in a first precision format; and systolic array hardware including multiple data processing units, wherein the processor is to receive data for performance of a matrix multiplication operation in the first precision format; enable an emulated floating point multiplication operation using one or more values with a second precision format, the second precision format having a lower precision than the first precision format, the emulated floating point multiplication including operation of the systolic array hardware; and generate an emulated result for the matrix multiplication operation.
-
Publication No.: US20220414010A1
Publication date: 2022-12-29
Application No.: US17704340
Filing date: 2022-03-25
Applicant: Intel Corporation
Inventors: Xiaodong Qiu, Yong Jiang, Changwon Rhee, Cui Tang, Shuangpeng Zhou, Lei Chen, Danyu Bi, Peiqing Jiang, Chengxi Wu
IPC classification: G06F12/084, G06F9/48
Abstract: Embodiments are generally directed to methods and apparatuses for dynamically changing data priority in a cache. An embodiment of an apparatus comprises a priority controller to: receive a memory access request to request data; and set a priority flag for the memory access request based on an accumulated access amount of data stored in a memory block to be accessed by the memory access request to dynamically change a priority level of the requested data.
-
Publication No.: US20220413854A1
Publication date: 2022-12-29
Application No.: US17358859
Filing date: 2021-06-25
Applicant: Intel Corporation
Inventors: Joydeep Ray, Supratim Pal, Prathamesh Raghunath Shinde, Ben J. Ashbaugh, Changwon Rhee, Hong Jiang, FangWen Fu
Abstract: An apparatus to facilitate 64-bit two-dimensional (2D) block load with transpose is disclosed. The apparatus includes a processor comprising processing resources; and load store pipeline hardware circuitry coupled to the processing resources, the load store pipeline hardware circuitry to receive a 64-bit two-dimensional (2D) block load message with transpose from the processing resources. The load store pipeline hardware circuitry comprises a load store pipeline sequencer to map rows of a block of memory corresponding to the 64-bit 2D block load message with transpose to 64-bit standard load messages; and load store pipeline return circuitry to: sequentially number general register files (GRFs) used for returning elements of the block of memory accessed by the 64-bit standard load messages to the processing resources; and return, to the processing resources, the sequentially numbered GRFs in response to the 64-bit 2D block load message with transpose.
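Illustration (not part of the patent record): a minimal Python sketch of a 2D block load with transpose, modeled as one standard row load per block row followed by a return step that writes the transposed data into sequentially numbered registers. The memory model (a flat list of 64-bit elements with a row pitch), the function names, and the choice to place one block column in each numbered register are all assumptions made for illustration.

```python
def load_row(memory, pitch, x, y, width):
    """Standard load: read `width` consecutive elements from row y, starting at column x."""
    start = y * pitch + x
    return memory[start:start + width]

def block_load_transpose(memory, pitch, x, y, width, height, first_grf=0):
    """Load a width-by-height block and return it transposed, keyed by register number."""
    # Sequencer step: map each row of the block to one standard load.
    rows = [load_row(memory, pitch, x, y + r, width) for r in range(height)]
    # Return step: register first_grf + c receives column c of the block.
    grfs = {}
    for c in range(width):
        grfs[first_grf + c] = [rows[r][c] for r in range(height)]
    return grfs

memory = list(range(16))  # a 4x4 surface stored row by row, pitch 4
registers = block_load_transpose(memory, 4, x=1, y=1, width=2, height=3, first_grf=10)
```

Here `registers` maps register 10 to the block's first column and register 11 to its second, so consecutive register numbers walk across the columns of the original block.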