-
Publication number: US20240248718A1
Publication date: 2024-07-25
Application number: US18625903
Filing date: 2024-04-03
Applicant: NVIDIA Corporation
Inventor: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
CPC classification number: G06F9/30043, G06F9/3001, G06F9/30021, G06F9/30094, G06F9/30145, G06F9/30098, G06N20/00
Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.
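As a rough software analogy for the inline inspection this abstract describes, the following Python sketch returns a flag alongside each loaded value and keeps the flags for later use. The names `load_with_inspection` and `inspect_loads`, the list standing in for memory, and the strict `>` comparison are assumptions made for illustration; the abstract does not specify the instruction encoding, the exact comparison, or where the indication is stored.

```python
from typing import List, Sequence, Tuple


def load_with_inspection(
    memory: Sequence[float], address: int, threshold: float
) -> Tuple[float, bool]:
    """Load one value and report whether it exceeds the threshold.

    Hypothetical software model: in the abstract, this check is done by a
    circuit in the processor as part of the load instruction itself.
    """
    value = memory[address]
    return value, value > threshold


def inspect_loads(
    memory: Sequence[float], addresses: Sequence[int], threshold: float
) -> Tuple[List[float], List[bool]]:
    """Perform several inspected loads and store the indications."""
    values: List[float] = []
    flags: List[bool] = []  # stored indications, one per load (assumed layout)
    for address in addresses:
        value, exceeds = load_with_inspection(memory, address, threshold)
        values.append(value)
        flags.append(exceeds)
    return values, flags
```

A caller could use the stored flags to skip later work on values that never exceed the threshold, without issuing a separate comparison pass over the data.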
-
Publication number: US11797302B2
Publication date: 2023-10-24
Application number: US17351161
Filing date: 2021-06-17
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
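The three dot-product steps named in this abstract (generate partial products, align them by exponent, accumulate with an adder) can be illustrated with a small Python sketch. This is a software model under stated assumptions, not the patented datapath: the 24-bit significand width `MANT_BITS`, the helper names `to_fixed` and `aligned_dot`, truncating right shifts for alignment, and a single integer accumulator in place of the "result queue" are all choices made here for illustration.

```python
import math
from typing import List, Sequence, Tuple

MANT_BITS = 24  # assumed significand width; not taken from the abstract


def to_fixed(x: float) -> Tuple[int, int]:
    """Split x into an integer significand and exponent, x ~= mant * 2**exp."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    return int(round(m * (1 << MANT_BITS))), e - MANT_BITS


def aligned_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product using exponent-aligned integer partial products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Step 1: partial products as (integer product, summed exponent).
    products: List[Tuple[int, int]] = []
    for x, y in zip(a, b):
        mx, ex = to_fixed(x)
        my, ey = to_fixed(y)
        if mx != 0 and my != 0:  # zero products contribute nothing
            products.append((mx * my, ex + ey))
    if not products:
        return 0.0

    # Step 2: align every product to the largest exponent by shifting right.
    max_exp = max(exp for _, exp in products)

    # Step 3: accumulate the aligned products with plain integer addition.
    acc = 0
    for prod, exp in products:
        acc += prod >> (max_exp - exp)  # low bits shifted out are dropped

    return math.ldexp(float(acc), max_exp)
```

For inputs that are exactly representable in the assumed width, such as `aligned_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])`, the aligned sum matches the ordinary floating-point dot product. In general the alignment shift discards low-order bits, so results can differ slightly from a sequence of rounded floating-point additions.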
-
Publication number: US11379420B2
Publication date: 2022-07-05
Application number: US16359787
Filing date: 2019-03-20
Applicant: NVIDIA Corporation
Inventor: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
IPC: G06F16/174, G06F16/901, G06N3/08, H03M7/30, G06F16/14
Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse, or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU and GPU.
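To make the idea of decompressing sparse data concrete, here is a minimal Python sketch that expands a list of non-zero values and their positions back into a dense list. The value-plus-index layout and the names `decompress_sparse`, `values`, `indices`, and `length` are assumptions for illustration only; the abstract does not state the compressed format or the instruction semantics.

```python
from typing import List, Sequence


def decompress_sparse(
    values: Sequence[float], indices: Sequence[int], length: int
) -> List[float]:
    """Scatter the stored non-zero values into a zero-filled dense list.

    Hypothetical format: values[k] belongs at position indices[k] of the
    dense output; every other position is zero.
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")
    dense = [0.0] * length
    for value, index in zip(values, indices):
        if not 0 <= index < length:
            raise IndexError(f"index {index} outside dense length {length}")
        dense[index] = value
    return dense
```

For example, `decompress_sparse([3.0, 7.0], [1, 4], 6)` rebuilds a six-element list in which only positions 1 and 4 are non-zero.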
-
Publication number: US11150721B2
Publication date: 2021-10-19
Application number: US13671485
Filing date: 2012-11-07
Applicant: NVIDIA Corporation
Inventor: David Conrad Tannenbaum, Ming Y. Siu, Stuart F Oberman, Colin Sprinkle, Srinivasan Iyer, Ian Chi Yan Kwong
IPC: G06F1/3287, G06F9/38, G06F8/41
Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.
-
Publication number: US20210311733A1
Publication date: 2021-10-07
Application number: US17351161
Filing date: 2021-06-17
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication number: US20210303302A1
Publication date: 2021-09-30
Application number: US17141082
Filing date: 2021-01-04
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication number: US20190324747A1
Publication date: 2019-10-24
Application number: US16459191
Filing date: 2019-07-01
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication number: US11816481B2
Publication date: 2023-11-14
Application number: US17890540
Filing date: 2022-08-18
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication number: US11797301B2
Publication date: 2023-10-24
Application number: US17141082
Filing date: 2021-01-04
Applicant: NVIDIA Corporation
Inventor: Brent Ralph Boswell, Ming Y. Siu, Jack H. Choquette, Jonah M. Alben, Stuart Oberman
CPC classification number: G06F9/30014, G06F9/3001, G06F9/3012, G06F9/30036, G06F9/3851, G06T1/20
Abstract: A method, computer readable medium, and processor are disclosed for performing matrix multiply and accumulate (MMA) operations. The processor includes a datapath configured to execute the MMA operation to generate a plurality of elements of a result matrix at an output of the datapath. Each element of the result matrix is generated by calculating at least one dot product of corresponding pairs of vectors associated with matrix operands specified in an instruction for the MMA operation. A dot product operation includes the steps of: generating a plurality of partial products by multiplying each element of a first vector with a corresponding element of a second vector; aligning the plurality of partial products based on the exponents associated with each element of the first vector and each element of the second vector; and accumulating the plurality of aligned partial products into a result queue utilizing at least one adder.
-
Publication number: US20230221957A1
Publication date: 2023-07-13
Application number: US18112923
Filing date: 2023-02-22
Applicant: NVIDIA Corporation
Inventor: Jeffrey Michael Pool, Andrew Kerr, John Tran, Ming Y. Siu, Stuart Oberman
IPC: G06F9/30
CPC classification number: G06F9/30043, G06F9/30021, G06F9/30145
Abstract: A method, computer readable medium, and processor are described herein for inline data inspection by using a decoder to decode a load instruction, including a signal to cause a circuit in a processor to indicate whether data loaded by a load instruction exceeds a threshold value. Moreover, an indication of whether data loaded by a load instruction exceeds a threshold value may be stored.