-
21.
Publication No.: US20240036954A1
Publication Date: 2024-02-01
Application No.: US17955106
Filing Date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/544 , G06F9/4881
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more attributes of one or more groups of blocks of one or more threads.
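The abstract does not name the API, but it reads like the thread block cluster launch attributes added to the CUDA runtime for Hopper-class GPUs. Below is a minimal sketch assuming, without confirmation from the filing, that the "attributes of groups of blocks" are cluster dimensions passed via cudaLaunchAttributeClusterDimension and cudaLaunchKernelEx; the kernel and sizes are illustrative only.

    #include <cuda_runtime.h>

    // Illustrative kernel; the grouping of its blocks into clusters is what
    // the launch attribute below describes.
    __global__ void clusterKernel(float *data) {
        data[blockIdx.x * blockDim.x + threadIdx.x] *= 2.0f;
    }

    int main() {
        float *dData;
        cudaMalloc(&dData, 1024 * sizeof(float));

        // One attribute of a "group of blocks": its cluster dimensions.
        cudaLaunchAttribute attrs[1];
        attrs[0].id = cudaLaunchAttributeClusterDimension;
        attrs[0].val.clusterDim.x = 2;   // two thread blocks per cluster
        attrs[0].val.clusterDim.y = 1;
        attrs[0].val.clusterDim.z = 1;

        cudaLaunchConfig_t config = {};
        config.gridDim  = dim3(8);
        config.blockDim = dim3(128);
        config.attrs    = attrs;
        config.numAttrs = 1;

        cudaLaunchKernelEx(&config, clusterKernel, dData);  // requires sm_90 or newer
        cudaDeviceSynchronize();
        cudaFree(dData);
        return 0;
    }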
-
22.
Publication No.: US20240036917A1
Publication Date: 2024-02-01
Application No.: US17955110
Filing Date: 2022-09-28
Applicant: NVIDIA Corporation
Inventor: Ze Long , Kyrylo Perelygin , Harold Carter Edwards , Gokul Ramaswamy Hirisave Chandra Shekhara , Jaydeep Marathe , Ronny Meir Krashinsky , Girish Bhaskarrao Bharambe
CPC classification number: G06F9/4881 , G06F9/5044 , G06F9/545
Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads to be scheduled in parallel.
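The filing's own interface is likewise not named; the closest long-standing runtime analogue is the occupancy query, which reports how many thread blocks of a kernel can be resident in parallel. A hedged sketch using that query (not necessarily the API this application claims):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void workKernel(float *data) {
        data[blockIdx.x * blockDim.x + threadIdx.x] += 1.0f;
    }

    int main() {
        // Maximum number of 256-thread blocks of workKernel one SM can hold at once.
        int blocksPerSm = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSm, workKernel,
                                                      /*blockSize=*/256,
                                                      /*dynamicSMemSize=*/0);

        int smCount = 0;
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, /*device=*/0);

        printf("up to %d blocks schedulable in parallel\n", blocksPerSm * smCount);
        return 0;
    }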
-
23.
Publication No.: US20200285618A1
Publication Date: 2020-09-10
Application No.: US16359787
Filing Date: 2019-03-20
Applicant: NVIDIA Corporation
Inventor: Jorge Albericio Latorre , Jack H. Choquette , Manan Maheshkumar Patel , Jeffrey Pool , Ming Y. Siu , Ronny Meir Krashinsky , Ganesh Venkatesh
IPC: G06F16/174 , G06F16/901 , G06F16/14 , H03M7/30 , G06N3/08
Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) where only non-zero values above a certain threshold have significance. When dealing with compressed data, the data oftentimes needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU or a GPU.
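The decompression instructions themselves are hardware features and are not reproduced here. As a conceptual stand-in, the sketch below expands a simple compressed representation of a sparse vector (non-zero values plus their positions, a hypothetical format chosen for illustration) back into dense form with a plain CUDA kernel.

    #include <cuda_runtime.h>

    // Scatter the stored non-zero values to their dense positions.
    __global__ void scatterNonZeros(const float *values, const int *indices,
                                    int numNonZeros, float *dense) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < numNonZeros) dense[indices[i]] = values[i];
    }

    // Host side: zero-fill the dense buffer, then scatter the non-zeros into it.
    void decompress(const float *dValues, const int *dIndices, int numNonZeros,
                    float *dDense, int denseLen) {
        cudaMemset(dDense, 0, denseLen * sizeof(float));
        if (numNonZeros > 0) {
            int threads = 256;
            int blocks  = (numNonZeros + threads - 1) / threads;
            scatterNonZeros<<<blocks, threads>>>(dValues, dIndices, numNonZeros, dDense);
        }
    }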
-
24.
Publication No.: US09971699B2
Publication Date: 2018-05-15
Application No.: US15146834
Filing Date: 2016-05-04
Applicant: NVIDIA Corporation
Inventor: Ronny Meir Krashinsky , Xiaogang Qiu
IPC: G06F12/08 , G06F12/0891 , G06F12/0862 , G06F12/0897 , G06F9/30 , G06F9/38
CPC classification number: G06F12/0891 , G06F9/30043 , G06F9/3838 , G06F12/0862 , G06F12/0897 , G06F12/126 , G06F2212/602
Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.
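The "no-update operator" is a hardware load modifier; CUDA does expose related per-load cache hints as intrinsics (for example __ldcs for streaming, evict-first loads), which likewise keep a demand load from being treated as a normal reuse by the cache. The sketch below uses that intrinsic as a loose analogy, not as the patented instruction.

    #include <cuda_runtime.h>

    // Sum a large array while hinting that these demand loads should not be
    // given normal reuse priority in the cache.
    __global__ void streamingSum(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float v = __ldcs(&in[i]);  // cache-streaming ("evict first") load hint
            atomicAdd(out, v);
        }
    }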
-
25.
Publication No.: US20170322887A1
Publication Date: 2017-11-09
Application No.: US15146834
Filing Date: 2016-05-04
Applicant: NVIDIA Corporation
Inventor: Ronny Meir Krashinsky , Xiaogang Qiu
IPC: G06F12/0891 , G06F9/38 , G06F9/30 , G06F12/0862 , G06F12/0897
CPC classification number: G06F12/0891 , G06F9/30043 , G06F9/3838 , G06F12/0862 , G06F12/0897 , G06F12/126 , G06F2212/602
Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.
-
26.
Publication No.: US20240169472A1
Publication Date: 2024-05-23
Application No.: US18086484
Filing Date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20
CPC classification number: G06T1/60 , G06F12/0811 , G06F12/0862 , G06T1/20 , G06F2212/62
Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more GPU caches.
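The tensor prefetch instruction is a GPU hardware feature and is not exposed in this listing. As a conceptual stand-in, the sketch below walks one tile of a row-major 2D tensor with L2-cached loads so that later kernels find the tile resident on-chip; the tile-descriptor arguments are illustrative.

    #include <cuda_runtime.h>

    // Conceptual stand-in for a tensor prefetch: touch one tile of a row-major
    // 2D tensor so its cache lines are brought into L2 ahead of later kernels.
    __global__ void prefetchTile(const float *tensor, int ld, int tileRow,
                                 int tileCol, int tileH, int tileW) {
        int r = threadIdx.y;
        int c = threadIdx.x;
        if (r < tileH && c < tileW) {
            // __ldcg caches the line at L2; the volatile local keeps the
            // otherwise-unused load from being optimized away.
            volatile float v = __ldcg(&tensor[(tileRow + r) * ld + (tileCol + c)]);
            (void)v;
        }
    }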
-
27.
Publication No.: US20240169467A1
Publication Date: 2024-05-23
Application No.: US18081559
Filing Date: 2022-12-14
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Olivier Giroux , Jack H. Choquette , Gokul Ramaswamy Hirisave Chandra Shekhara , Rui Guo , Chao Li , Vishalkumar Ketankumar Mehta , David Dastous St. Hilaire , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Subhasmita Chakraborty , Vikram Dhar
Abstract: Apparatuses, systems, and techniques to create one or more memory transaction software objects. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause one or more software objects to indicate whether one or more memory transactions have been performed.
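The abstract describes software objects that report whether memory transactions have been performed. The filing's own objects are not shown here; as a loose, well-established analogy, a CUDA event recorded after an asynchronous copy can be queried for the same kind of yes/no answer.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        const size_t bytes = 1 << 20;
        float *hBuf = nullptr, *dBuf = nullptr;
        cudaMallocHost(&hBuf, bytes);   // pinned host buffer so the copy is truly async
        cudaMalloc(&dBuf, bytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // The event plays the role of "has this memory transaction been performed?".
        cudaEvent_t copyDone;
        cudaEventCreate(&copyDone);

        cudaMemcpyAsync(dBuf, hBuf, bytes, cudaMemcpyHostToDevice, stream);
        cudaEventRecord(copyDone, stream);

        // Non-blocking query: returns cudaSuccess once the copy has finished.
        printf("%s\n", cudaEventQuery(copyDone) == cudaSuccess
                           ? "transfer complete" : "transfer still in flight");

        cudaEventSynchronize(copyDone);
        cudaEventDestroy(copyDone);
        cudaStreamDestroy(stream);
        cudaFree(dBuf);
        cudaFreeHost(hBuf);
        return 0;
    }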
-
28.
Publication No.: US20240168831A1
Publication Date: 2024-05-23
Application No.: US18086473
Filing Date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Aditya Avinash Atluri , Apoorv Parle , Chao Li , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06F9/54
CPC classification number: G06F9/544
Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
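Translating one tensor into another "according to a tensor map" suggests the tiled global-memory transfers performed by the tensor memory accelerator; the actual instruction is not reproduced here. The sketch below is a plain-CUDA stand-in: a hypothetical TileMap struct describes where a tile sits in the source and destination tensors, and one thread per element performs the copy.

    #include <cuda_runtime.h>

    // Hypothetical descriptor standing in for a "tensor map": source and destination
    // tile origins and the leading dimensions of both row-major tensors.
    struct TileMap {
        int srcRow, srcCol, srcLd;
        int dstRow, dstCol, dstLd;
        int tileH, tileW;
    };

    // Translate (copy) one tile of src into dst according to the map,
    // one thread per tile element.
    __global__ void translateTile(const float *src, float *dst, TileMap map) {
        int r = blockIdx.y * blockDim.y + threadIdx.y;
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < map.tileH && c < map.tileW) {
            dst[(map.dstRow + r) * map.dstLd + (map.dstCol + c)] =
                src[(map.srcRow + r) * map.srcLd + (map.srcCol + c)];
        }
    }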
-
29.
Publication No.: US20240168829A1
Publication Date: 2024-05-23
Application No.: US18086451
Filing Date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Vishalkumar Ketankumar Mehta , Aditya Avinash Atluri , Apoorv Parle , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
IPC: G06F9/54
CPC classification number: G06F9/544
Abstract: Apparatuses, systems, and techniques to generate a tensor mapping. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a mapping from a first tensor to a second tensor to be generated.
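"Generate a tensor mapping" is reminiscent of the tensor map object in the CUDA 12 driver API. The sketch below assumes cuTensorMapEncodeTiled is the kind of interface meant (the abstract does not confirm this) and encodes 64x64 tiles of a 1024x1024 row-major float tensor.

    #include <cuda.h>   // CUDA driver API: CUtensorMap, cuTensorMapEncodeTiled (CUDA 12+)

    // Encode a tensor map for 64x64 tiles of a 1024x1024 row-major float
    // tensor located at dTensor in GPU global memory.
    CUresult makeTensorMap(CUtensorMap *map, void *dTensor) {
        cuuint64_t globalDim[2]     = {1024, 1024};            // extent of each dimension
        cuuint64_t globalStrides[1] = {1024 * sizeof(float)};  // byte stride of the outer dimension
        cuuint32_t boxDim[2]        = {64, 64};                // tile ("box") extent
        cuuint32_t elemStrides[2]   = {1, 1};                  // dense traversal within the tile

        return cuTensorMapEncodeTiled(
            map, CU_TENSOR_MAP_DATA_TYPE_FLOAT32, /*tensorRank=*/2, dTensor,
            globalDim, globalStrides, boxDim, elemStrides,
            CU_TENSOR_MAP_INTERLEAVE_NONE, CU_TENSOR_MAP_SWIZZLE_NONE,
            CU_TENSOR_MAP_L2_PROMOTION_NONE, CU_TENSOR_MAP_FLOAT_OOB_FILL_NONE);
    }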
-
30.
Publication No.: US20240161224A1
Publication Date: 2024-05-16
Application No.: US18086469
Filing Date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards , Stephen Anthony Bernard Jones , Alexander Lev Minkin , Olivier Giroux , Gokul Ramaswamy Hirisave Chandra Shekhara , Vishalkumar Ketankumar Mehta , Aditya Avinash Atluri , Apoorv Parle , Chao Li , Ronny Meir Krashinsky , Alan Kaatz , Andrew Robert Kerr , Jack H. Choquette
Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about a memory transaction corresponding to the translation. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about one or more memory transactions corresponding to the translation.