-
Publication number: EP4064040A1
Publication date: 2022-09-28
Application number: EP22153430.8
Application date: 2022-01-26
Applicant: Intel Corporation
Inventor: Mellempudi, Naveen , Maiyuran, Subramaniam , George, Varghese , Fu, Fangwen , Mu, Shuai , Pal, Supratim , Xiong, Wei
Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.
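The dot product over 8-bit floating point operands described in this abstract can be illustrated with a minimal software sketch. The abstract does not name a specific 8-bit layout, so the sketch assumes the E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits); the function names `e5m2_to_float` and `fp8_dot` are hypothetical, and the accumulation here uses an ordinary Python float rather than modelling the systolic multiplier, shifter, and adder hardware.

```python
import struct

def e5m2_to_float(byte):
    """Decode one 8-bit value, ASSUMING the E5M2 layout (1 sign, 5 exponent, 2 mantissa bits)."""
    # An E5M2 value has the same bit layout as the upper byte of an IEEE 754
    # half-precision number, so padding the low byte with zeros gives an
    # exact half-precision encoding that struct can decode.
    half_bits = (byte & 0xFF) << 8
    return struct.unpack("<e", struct.pack("<H", half_bits))[0]

def fp8_dot(a_bytes, b_bytes):
    """Dot product of two equal-length sequences of E5M2-encoded bytes."""
    if len(a_bytes) != len(b_bytes):
        raise ValueError("operand vectors must have the same length")
    acc = 0.0  # wider accumulator (a Python float), an illustrative choice
    for a, b in zip(a_bytes, b_bytes):
        acc += e5m2_to_float(a) * e5m2_to_float(b)
    return acc

# Example operands: a = [1.0, 2.0, 0.5], b = [2.0, -2.0, 1.5]
a = [0x3C, 0x40, 0x38]
b = [0x40, 0xC0, 0x3E]
print(fp8_dot(a, b))  # 1*2 + 2*(-2) + 0.5*1.5 = -1.25
```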
-
Publication number: EP3974968A1
Publication date: 2022-03-30
Application number: EP21192702.5
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Maiyuran, Subramaniam , Marwaha, Shubra , Garg, Ashutosh , Pal, Supratim , Parra, Jorge , Gurram, Chandra , George, Varghese , Starkey, Darin , Lueh, Guei-Yuan
Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics processing unit comprises a single instruction, multiple thread (SIMT) multiprocessor. The SIMT multiprocessor comprises: an instruction cache; a shared memory coupled with the instruction cache; and circuitry coupled with the shared memory and the instruction cache. The circuitry includes: multiple texture units; a first core including hardware to accelerate matrix operations; and a second core. The second core is configured to: receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent; and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add the first source operand to a result of the multiply.
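The multiply-add on BF16 operands described in this abstract can be sketched in software as follows. BF16 shares its sign bit and eight-bit exponent with IEEE 754 single precision, so a BF16 value corresponds to the upper 16 bits of a float32. The function names are hypothetical, the float-to-BF16 helper truncates rather than rounds (a simplification; the abstract does not state a rounding mode), and the result is returned as a Python float because the abstract does not state the destination format.

```python
import struct

def float_to_bf16(x):
    """Encode a float as BF16 bits by keeping the upper 16 bits of its float32 form.

    Truncation is an ASSUMED simplification; real hardware may round instead.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16

def bf16_to_float(bits16):
    """Decode BF16 bits by placing them in the upper half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

def bf16_mad(src0, src1, src2):
    """Multiply the second source by the third, then add the first source."""
    return bf16_to_float(src1) * bf16_to_float(src2) + bf16_to_float(src0)

# Example: 2.0 * 3.0 + 1.0
src0 = float_to_bf16(1.0)  # 0x3F80
src1 = float_to_bf16(2.0)  # 0x4000
src2 = float_to_bf16(3.0)  # 0x4040
print(bf16_mad(src0, src1, src2))  # 7.0
```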
-
Publication number: EP3964969A1
Publication date: 2022-03-09
Application number: EP21204670.0
Application date: 2020-01-23
Applicant: Intel Corporation
Inventor: Matam, Naveen , Cheney, Lance , Finley, Eric , George, Varghese , Jahagirdar, Sanjeev , Koker, Altug , Mastronarde, Josh , Rajwani, Iqbal , Striramassarma, Lakshminarayanan , Teshome, Melaku , Vemulapalli, Vikranth , Xavier, Binoj
IPC: G06F13/40 , H01L25/11 , H01L25/065 , H01L25/18
Abstract: The present disclosure provides an apparatus comprising a package assembly comprising a plurality of chiplets and a plurality of interconnect structures. The plurality of chiplets includes: a first chiplet comprising a first base chiplet coupled to a bridge interconnect and an interconnect structure, the first base chiplet including an interconnect fabric and a first plurality of level 3 cache banks to cache data read from and transmitted to a memory; a second chiplet comprising a second base chiplet, the second chiplet coupled to the first chiplet over the bridge interconnect; and a third chiplet including a second plurality of level 3 cache banks, the third chiplet stacked on the first base chiplet in a 3D arrangement and coupled to the first base chiplet over the interconnect structure.
-
Publication number: EP4485181A2
Publication date: 2025-01-01
Application number: EP24205439.3
Application date: 2022-01-26
Applicant: Intel Corporation
Inventor: Mellempudi, Naveen , Maiyuran, Subramaniam , George, Varghese , Fu, Fangwen , Mu, Shuai , Pal, Supratim , Xiong, Wei
IPC: G06F9/38
Abstract: An apparatus comprises decode circuitry to decode a single matrix instruction having fields to indicate an opcode and locations of a first source matrix including a first plurality of 8-bit floating point data elements encoded in a first 8-bit floating point format, a second source matrix including a second plurality of 8-bit floating point data elements encoded in a second 8-bit floating point format, and a third source matrix including a plurality of 32-bit floating point data elements. The apparatus further comprises execution circuitry, responsive to the single matrix instruction, to generate a plurality of products based on the first plurality of 8-bit floating point data elements of the first source matrix and the second plurality of 8-bit floating point data elements of the second source matrix, and accumulate each product of the plurality of products with a corresponding 32-bit floating point data element of the third source matrix to generate a corresponding 32-bit floating point result data element of a result matrix.
-
Publication number: EP4328971A3
Publication date: 2024-05-15
Application number: EP24150728.4
Application date: 2020-01-23
Applicant: Intel Corporation
Inventor: Matam, Naveen , Cheney, Lance , Finley, Eric , George, Varghese , Jahagirdar, Sanjeev , Koker, Altug , Mastronarde, Josh , Rajwani, Iqbal , Striramassarma, Lakshminarayanan , Teshome, Melaku , Vemulapalli, Vikranth , Xavier, Binoj
IPC: G06F13/40 , H01L25/11 , H01L25/065 , H01L25/18
CPC classification number: G06F13/4068 , G06F13/409 , H01L2224/16227 (2013.01) , H01L2924/15311 (2013.01) , H01L25/0655 , H01L25/18 , H01L2924/15192 (2013.01) , Y02D10/00
Abstract: The present disclosure provides an apparatus comprising a package assembly that includes a first base chiplet, a first logic chiplet stacked on the first base chiplet, a first interconnect structure to couple the cluster of compute units to the first interconnect fabric, a second base chiplet coupled to the first base chiplet by a second interconnect structure, a second logic chiplet stacked on the second base chiplet, and a third interconnect structure to couple the second logic chiplet to the second interconnect fabric. In the provided apparatus, the first logic chiplet is manufactured using a different process technology than that used to manufacture the first and second base chiplets.
-
Publication number: EP4328971A2
Publication date: 2024-02-28
Application number: EP24150728.4
Application date: 2020-01-23
Applicant: Intel Corporation
Inventor: Matam, Naveen , Cheney, Lance , Finley, Eric , George, Varghese , Jahagirdar, Sanjeev , Koker, Altug , Mastronarde, Josh , Rajwani, Iqbal , Striramassarma, Lakshminarayanan , Teshome, Melaku , Vemulapalli, Vikranth , Xavier, Binoj
IPC: H01L25/11
Abstract: The present disclosure provides an apparatus comprising a package assembly that includes a first base chiplet, a first logic chiplet stacked on the first base chiplet, a first interconnect structure to couple the cluster of compute units to the first interconnect fabric, a second base chiplet coupled to the first base chiplet by a second interconnect structure, a second logic chiplet stacked on the second base chiplet, and a third interconnect structure to couple the second logic chiplet to the second interconnect fabric. In the provided apparatus, the first logic chiplet is manufactured using a different process technology than that used to manufacture the first and second base chiplets.
-
Publication number: EP4130988A1
Publication date: 2023-02-08
Application number: EP22198615.1
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Koker, Altug , Ray, Joydeep , Ould-Ahmed-Vall, ElMoustapha , Appu, Abhishek , Anantaraman, Aravindh , Andrei, Valentin , Bilagi, Durgaprasad , George, Varghese , Insko, Brent , Jahagirdar, Sanjeev , Janus, Scott , K, Pattabhiraman , Kim, SungYe , Maiyuran, Subramaniam , Ranganathan, Vasanth , Striramassarma, Lakshminarayanan , Tian, Xinmin
IPC: G06F9/38 , G06F12/0862 , G06F9/30 , G06F12/0891
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor comprises processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to, in response to an instruction executed by one of the processing resources, modify an aging policy by modifying, based on the instruction, a level of importance from a first level of importance to preserve data longer in the cache for a first time period to a second level of importance for data to be evicted from the cache within a second time period, which is less than the first time period.
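The instruction-driven change to the aging policy described in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class `ToyCache`, its methods, and the numeric importance levels are hypothetical, and the eviction rule (lowest importance first, then least recently used) is one plausible reading, not the mechanism claimed in the patent.

```python
class ToyCache:
    """Toy cache whose eviction order depends on a per-line importance level."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = {}  # address -> [importance, last_use_tick]
        self.tick = 0

    def access(self, address, importance=1):
        """Touch a line; returns the evicted address, or None if nothing was evicted."""
        self.tick += 1
        if address in self.lines:
            self.lines[address][1] = self.tick  # refresh recency on a hit
            return None
        evicted = None
        if len(self.lines) >= self.capacity:
            # Victim: lowest importance first, then least recently used.
            evicted = min(self.lines, key=lambda a: tuple(self.lines[a]))
            del self.lines[evicted]
        self.lines[address] = [importance, self.tick]
        return evicted

    def set_importance(self, address, importance):
        """Models an instruction that changes how long a line should be preserved."""
        if address in self.lines:
            self.lines[address][0] = importance

cache = ToyCache(capacity=2)
cache.access(0xA, importance=2)
cache.access(0xB, importance=2)
cache.access(0xA)               # 0xA is now the most recently used line
cache.set_importance(0xA, 0)    # demote 0xA so it is evicted sooner
print(hex(cache.access(0xC)))   # 0xa: the demoted line goes first despite its recent use
```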
-
Publication number: EP4024223A1
Publication date: 2022-07-06
Application number: EP22157673.9
Application date: 2020-03-14
Applicant: Intel Corporation
Inventor: Koker, Altug , Ray, Joydeep , Ould-Ahmed-Vall, ElMoustapha , Appu, Abhishek , Anantaraman, Aravindh , Andrei, Valentin , Bilagi, Durgaprasad , George, Varghese , Insko, Brent , Jahagirdar, Sanjeev , Janus, Scott , K, Pattabhiraman , Kim, SungYe , Maiyuran, Subramaniam , Ranganathan, Vasanth , Striramassarma, Lakshminarayanan , Tian, Xinmin
IPC: G06F12/123 , G06F12/126
Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics multiprocessor comprises a plurality of processing resources including a first set of processing cores and a second set of processing cores, wherein the first set of processing cores includes circuitry to execute instructions to perform matrix operations and the second set of processing cores includes circuitry to execute instructions to perform integer and floating-point operations; and a cache memory configured to be partitioned into multiple cache regions, wherein the multiple cache regions include a first cache region having a cache eviction policy with a configurable level of data persistence.