-
Publication No.: US20250036975A1
Publication date: 2025-01-30
Application No.: US18414057
Filing date: 2024-01-16
Applicant: NVIDIA Corporation
Inventor: Xiaowei Ren, Nitin Nitin, Michael Andersch
IPC: G06N5/04
Abstract: Apparatuses, systems, and techniques to perform neural networks. In at least one embodiment, a processor is to cause information to be distributed to processing cores. In at least one embodiment, a processor is to cause inferencing of two or more contiguous portions of information to be distributed between two or more respective processing cores based, at least in part, on locations of the two or more contiguous portions within the information relative to one or more terminating portions of the information.
-
Publication No.: US20240152725A1
Publication date: 2024-05-09
Application No.: US17982386
Filing date: 2022-11-07
Applicant: NVIDIA Corporation
Inventor: Gil Shomron, Rachit Garg, Sukru Burc Eryilmaz, Michael Andersch
IPC: G06N3/04
CPC classification number: G06N3/04
Abstract: Apparatuses, systems, and techniques to perform matrix computations associated with computing output of a neural network. In at least one embodiment, one or more circuits cause one or more feature maps of one or more neural networks to be spatially concatenated.
-
Publication No.: US20250036954A1
Publication date: 2025-01-30
Application No.: US18226143
Filing date: 2023-07-25
Applicant: NVIDIA Corporation
Inventor: Xiaowei Ren, Kevin Chong Man Siu, Nitin Nitin, Michael Andersch
Abstract: Apparatuses, systems, and techniques to perform neural networks. In at least one embodiment, a processor is to cause information to be distributed to processing cores. In at least one embodiment, a processor is to cause inferencing of two or more contiguous portions of information to be distributed only between two or more respective processing cores.
-
Publication No.: US09684581B2
Publication date: 2017-06-20
Application No.: US14283700
Filing date: 2014-05-21
Applicant: NVIDIA CORPORATION
Inventor: Andrew Robert Kerr, Matthew Grant Bolitho, Igor Sevastiyanov, Scott Ricketts, Michael Andersch
CPC classification number: G06F11/3428, G06F8/433, G06F9/455, G06F11/34, G06F11/3452, G06F11/3466, G06F11/3612
Abstract: One embodiment of the present invention includes a dependency extractor and a dependency investigator that, together, facilitate performance analysis of computer systems. In operation, the dependency extractor instruments a software application to generate run-time execution data for each work task. This execution data includes per-task performance data and dependency data reflecting linkages between tasks. After the instrumented software application finishes executing, the dependency investigator evaluates the captured execution data and identifies the critical path of tasks that establishes the overall run-time of the software application. Advantageously, since the execution data includes both task-level performance data and dependencies between tasks, the dependency investigator enables the developer to effectively optimize software and hardware in computer systems that are capable of concurrently executing tasks. By contrast, conventional performance analysis may not correctly identify critical paths in software applications that execute tasks in parallel across multiple processing units and, consequently, may misdirect optimization efforts.
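Illustration (not part of the patent record): the abstract describes combining per-task performance data with inter-task dependency data to find the critical path that sets overall run time. The sketch below shows one common way to do that, a longest-path pass over a task graph in topological order. It is a minimal sketch under stated assumptions, not the method claimed in the patent: the function name `critical_path`, the task names, the durations, and the two-dictionary input format are all hypothetical, and every dependency is assumed to appear in `durations`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (path, total_time) for the longest dependency chain.

    durations: task -> measured run time (hypothetical per-task data)
    deps:      task -> set of tasks that must finish first (hypothetical)
    """
    graph = {task: deps.get(task, ()) for task in durations}
    finish = {}  # earliest possible finish time of each task
    prev = {}    # the dependency that gates each task, or None
    for task in TopologicalSorter(graph).static_order():
        # A task can start only when its latest-finishing dependency is done.
        gate = max(graph[task], key=finish.__getitem__, default=None)
        start = finish[gate] if gate is not None else 0.0
        finish[task] = start + durations[task]
        prev[task] = gate
    # Walk backwards from the task that finishes last.
    task = max(finish, key=finish.__getitem__)
    total = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = prev[task]
    return path[::-1], total


# Hypothetical execution data: two kernels run concurrently after a load.
durations = {"load": 2.0, "kernel_a": 5.0, "kernel_b": 3.0, "reduce": 1.0}
deps = {
    "kernel_a": {"load"},
    "kernel_b": {"load"},
    "reduce": {"kernel_a", "kernel_b"},
}
print(critical_path(durations, deps))
# (['load', 'kernel_a', 'reduce'], 8.0)
```

In this made-up data, `kernel_b` is off the critical path, so speeding it up would not shorten the 8.0 total. That is the kind of misdirected optimization the abstract says per-task timings alone can lead to when tasks run in parallel.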
-