-
Publication number: US20240168765A1
Publication date: 2024-05-23
Application number: US18086478
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
CPC classification number: G06F9/3802, G06F9/30047
Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more GPU caches.
-
Publication number: US20240169471A1
Publication date: 2024-05-23
Application number: US18086476
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
IPC: G06T1/60, G06F12/0811, G06F12/0862, G06T1/20
CPC classification number: G06T1/60, G06F12/0811, G06F12/0862, G06T1/20, G06F2212/62
Abstract: Apparatuses, systems, and techniques to perform a graphics processing unit (GPU) prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches. In at least one embodiment, one or more circuits of a GPU are to perform a GPU prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches.
-
Publication number: US20240168659A1
Publication date: 2024-05-23
Application number: US18086429
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
IPC: G06F3/06, G06F12/0862
CPC classification number: G06F3/0625, G06F3/0646, G06F3/0659, G06F3/0673, G06F12/0862, G06F2212/608
Abstract: Apparatuses, systems, and techniques to transform and store information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed and stored.
-
Publication number: US20230305845A1
Publication date: 2023-09-28
Application number: US17710699
Application date: 2022-03-31
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, David Anthony Fontaine, Sebastian Piotr Jodlowski, Aditya Avinash Atluri, Andrew Robert Kerr, Michael Andrew Clark, Gonzalo Brito Gadeschi, Olivier Giroux, Jaydeep Marathe, Thibaut Lutz, Hariharan Sandanagobalane, Gokul Ramaswamy Hirisave Chandra Shekhara, Girish Bhaskarrao Bharambe, Rishkul Kulkarni, Konstantinos Kyriakopoulos
CPC classification number: G06F9/3009, G06F9/30043, G06F9/544, G06F9/5016
Abstract: Apparatuses, systems, and techniques to cause data to be selectively stored in one or more memory locations. In at least one embodiment, a processor is to cause data to be selectively stored in one or more memory locations based, at least in part, on one or more threads to use the data.
-
Publication number: US20240169469A1
Publication date: 2024-05-23
Application number: US18086433
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
Abstract: Apparatuses, systems, and techniques to transform information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed.
-
Publication number: US20240161223A1
Publication date: 2024-05-16
Application number: US18086464
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
-
Publication number: US20220365750A1
Publication date: 2022-11-17
Application number: US17745512
Application date: 2022-05-16
Applicant: NVIDIA Corporation
Inventor: Girish Bhaskarrao Bharambe, Kyrylo Perelygin, Advait Soman, Andrew Robert Kerr, Farhana Schuchman, Jaydeep Marathe, Stephen Anthony Bernard Jones, Ronny Meir Krashinsky, Jaewook Shin
Abstract: Apparatuses, systems, and techniques to generate numbers. In at least one embodiment, one or more circuits are to cause one or more thirty-two bit floating point numbers to be truncated to generate one or more rounded numbers based, at least in part, on one or more rounding attributes.
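Illustrative sketch (not taken from the patent): one way to picture truncating a thirty-two bit floating point number under a rounding attribute is to drop low-order mantissa bits from its IEEE 754 bit pattern. The function name `round_f32_mantissa`, the 10-bit default mantissa width, and the mode names `"rz"` (toward zero) and `"rn"` (nearest, ties to even) are assumptions made for this example; the abstract does not specify them. Only finite inputs are considered.

```python
import struct


def _f32_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 pattern of x stored as a float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_f32(bits: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_f32_mantissa(x: float, keep_bits: int = 10, mode: str = "rz") -> float:
    """Keep only `keep_bits` of the 23 float32 mantissa bits (hypothetical helper).

    mode="rz": truncate toward zero.
    mode="rn": round to nearest, ties to even, then truncate.
    """
    if not 0 <= keep_bits <= 23:
        raise ValueError("keep_bits must be between 0 and 23")
    if mode not in ("rz", "rn"):
        raise ValueError("mode must be 'rz' or 'rn'")

    bits = _f32_to_bits(x)
    drop = 23 - keep_bits
    if drop == 0:
        return _bits_to_f32(bits)  # nothing to remove

    mask = (1 << drop) - 1
    if mode == "rn":
        # Add just under half of the dropped range, plus the lowest kept bit,
        # so exact ties round toward an even kept mantissa.
        lowest_kept = (bits >> drop) & 1
        bits += (mask >> 1) + lowest_kept
    return _bits_to_f32(bits & ~mask)


x = 1.0 + 2**-10 + 2**-11           # exactly representable in float32
print(round_f32_mantissa(x, mode="rz"))
print(round_f32_mantissa(x, mode="rn"))
```

Working on the bit pattern keeps the sign bit untouched, so `"rz"` reduces the magnitude for both positive and negative inputs.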
-
Publication number: US20240169470A1
Publication date: 2024-05-23
Application number: US18086442
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
Abstract: Apparatuses, systems, and techniques to store information in a plurality of storage locations allocated to a graphics processing unit (GPU). In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information to be stored in a plurality of storage locations allocated to a first GPU.
-
Publication number: US20240168830A1
Publication date: 2024-05-23
Application number: US18086461
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
IPC: G06F9/54
CPC classification number: G06F9/544
Abstract: Apparatuses, systems, and techniques to indicate storage locations of information to be mapped from a first tensor to a second tensor. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate one or more storage locations of information to be mapped from a first tensor to a second tensor.
-
Publication number: US09684581B2
Publication date: 2017-06-20
Application number: US14283700
Application date: 2014-05-21
Applicant: NVIDIA CORPORATION
Inventor: Andrew Robert Kerr, Matthew Grant Bolitho, Igor Sevastiyanov, Scott Ricketts, Michael Andersch
CPC classification number: G06F11/3428, G06F8/433, G06F9/455, G06F11/34, G06F11/3452, G06F11/3466, G06F11/3612
Abstract: One embodiment of the present invention includes a dependency extractor and a dependency investigator that, together, facilitate performance analysis of computer systems. In operation, the dependency extractor instruments a software application to generate run-time execution data for each work task. This execution data includes per-task performance data and dependency data reflecting linkages between tasks. After the instrumented software application finishes executing, the dependency investigator evaluates the captured execution data and identifies the critical path of tasks that establishes the overall run-time of the software application. Advantageously, since the execution data includes both task-level performance data and dependencies between tasks, the dependency investigator enables the developer to effectively optimize software and hardware in computer systems that are capable of concurrently executing tasks. By contrast, conventional performance analysis may not correctly identify critical paths in software applications that execute tasks in parallel across multiple processing units and, consequently, may misdirect optimization efforts.
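Illustrative sketch (not the patented implementation): the abstract describes combining per-task performance data with inter-task dependencies to find the critical path that sets overall run time. A generic longest-path pass over a dependency graph shows the idea. The function name `critical_path`, the dictionary-based input layout, and the sample task names are assumptions made for this example, and it assumes every task named in `deps` also appears in `durations`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (total_time, path) for the longest dependency chain.

    durations: task name -> measured run time of that task
    deps:      task name -> set of tasks that must finish first
    """
    graph = {task: deps.get(task, set()) for task in durations}
    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor that determines each task's start time

    # static_order() yields every task after all of its dependencies.
    for task in TopologicalSorter(graph).static_order():
        latest = max(graph[task], key=finish.__getitem__, default=None)
        start = finish[latest] if latest is not None else 0.0
        finish[task] = start + durations[task]
        prev[task] = latest

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.__getitem__)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return total, path[::-1]


# Hypothetical captured data: "a" and "b" can run concurrently after "load".
durations = {"load": 2.0, "a": 3.0, "b": 5.0, "merge": 1.0}
deps = {"a": {"load"}, "b": {"load"}, "merge": {"a", "b"}}
print(critical_path(durations, deps))
```

In this sample, speeding up task "a" would not shorten the run, because "b" lies on the critical path. This is the kind of misdirected optimization the abstract says dependency-aware analysis helps avoid.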