Patent search ap:("NVIDIA Corporation") AND inv:"Larry Robert Dennison" Page 1

1.

Invention grant
Sparse convolutional neural network accelerator (In force)

Publication (announcement) No.: US11847550B2

Publication (announcement) date: 2023-12-19

Application No.: US17111875

Application date: 2020-12-04

Applicant: NVIDIA Corporation

Inventor: William J. Dally , Angshuman Parashar , Joel Springer Emer , Stephen William Keckler , Larry Robert Dennison

IPC: G06N3/04 , G06N3/042 , G06F17/11 , G06F9/30 , G06F9/38 , G06N3/082 , G06N3/063 , G06N3/045 , G06N3/048 , G06F7/544 , G06F9/355 , G06F17/16 , G06F9/28

CPC classification number: G06N3/042 , G06F7/5443 , G06F9/3001 , G06F9/30018 , G06F9/30025 , G06F9/30036 , G06F9/3851 , G06F9/3887 , G06F17/11 , G06N3/045 , G06N3/048 , G06N3/063 , G06N3/082 , G06F9/28 , G06F9/3555 , G06F17/16 , G06F2207/4824

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
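The decode, sum, and linearize steps in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how an index vector is encoded, so the sketch assumes each entry counts the zeros skipped before the next non-zero element in row-major order, and it assumes every first coordinate set is summed with every second coordinate set. The function names and the `out_width` parameter are hypothetical.

```python
from itertools import product


def decode_index_vector(zero_runs, width):
    """Turn zero-run counts into (row, col) coordinates of non-zero elements.

    Assumed encoding: each entry is the number of zeros skipped before the
    next non-zero element, scanning the array in row-major order.
    """
    coords = []
    pos = -1
    for run in zero_runs:
        pos += run + 1  # skip `run` zeros and land on the non-zero element
        coords.append(divmod(pos, width))
    return coords


def output_linear_indices(first_runs, first_width, second_runs, second_width, out_width):
    """Decode both operands, sum the coordinate sets, and linearize the result."""
    first = decode_index_vector(first_runs, first_width)
    second = decode_index_vector(second_runs, second_width)
    # Assumption: all pairs of first and second coordinate sets are summed.
    summed = [(r1 + r2, c1 + c2) for (r1, c1), (r2, c2) in product(first, second)]
    # Row-major conversion of each output coordinate set into a linear index.
    return [r * out_width + c for r, c in summed]
```

For example, `decode_index_vector([1, 2], 3)` places non-zero elements at linear positions 1 and 4 of a three-column array, that is, at coordinates (0, 1) and (1, 1).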

2.

Invention application
DISTRIBUTED ADDRESS TRANSLATION IN A MULTI-NODE INTERCONNECT FABRIC (Under examination, published)

Publication (announcement) No.: US20200159669A1

Publication (announcement) date: 2020-05-21

Application No.: US16198649

Application date: 2018-11-21

Applicant: NVIDIA Corporation

Inventor: Samuel Hammond Duncan , Sanjeev Jain , Mark Douglas Hummel , Vyas Venkataraman , Olivier Giroux , Larry Robert Dennison , Alexander Toichi Ishii , Hemayet Hossain , Nir Haim Arad

IPC: G06F12/1027

Abstract: Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.
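A toy model may help picture the idea of each node managing its own local memory and serving only requests that carry proper credentials. The sketch below is not the disclosed mechanism: the `ProcessingNode` class, the credential strings, and the (base, limit) window scheme are all assumptions made for illustration.

```python
class ProcessingNode:
    """Hypothetical node that owns local memory and checks credentials itself."""

    def __init__(self, size):
        self.memory = [0] * size
        self.grants = {}  # credential -> (base, limit) window into local memory

    def grant(self, credential, base, limit):
        """Give a credential access to `limit` cells starting at `base`."""
        self.grants[credential] = (base, limit)

    def _translate(self, credential, offset):
        # Reject requests without a known credential or outside the window.
        if credential not in self.grants:
            raise PermissionError("unknown credential")
        base, limit = self.grants[credential]
        if not 0 <= offset < limit:
            raise PermissionError("offset outside granted window")
        return base + offset  # local address, resolved at the node

    def read(self, credential, offset):
        return self.memory[self._translate(credential, offset)]

    def write(self, credential, offset, value):
        self.memory[self._translate(credential, offset)] = value
```

Because translation happens at the node that owns the memory, two clients holding different credentials cannot reach each other's windows, whether through malice or through a programming error.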

3.

Invention application
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR (Under examination, published)

Publication (announcement) No.: US20180046916A1

Publication (announcement) date: 2018-02-15

Application No.: US15458837

Application date: 2017-03-14

Applicant: NVIDIA Corporation

Inventor: William J. Dally , Angshuman Parashar , Joel Springer Emer , Stephen William Keckler , Larry Robert Dennison

IPC: G06N3/08 , G06F7/523 , G06N3/04

CPC classification number: G06N3/063 , G06F7/523 , G06F7/5443 , G06F2207/4824 , G06N3/04 , G06N3/0454 , G06N3/082 , G06N3/084

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.
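One simple way to picture "compressed-sparse data that encodes non-zero elements and corresponding multi-dimensional positions" is a pair of parallel lists. The sketch below assumes a 2D array and explicit (row, col) positions; the actual encoding used by the accelerator is not given in the abstract, and both function names are hypothetical.

```python
def compress_sparse(dense):
    """Return (values, positions) for the non-zero entries of a 2D list."""
    values, positions = [], []
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                positions.append((r, c))
    return values, positions


def decompress_sparse(values, positions, rows, cols):
    """Rebuild the dense 2D list from the compressed-sparse form."""
    dense = [[0] * cols for _ in range(rows)]
    for v, (r, c) in zip(values, positions):
        dense[r][c] = v
    return dense
```

Only the non-zero elements are stored, so a processing element working from this form never spends a multiplication on a zero operand.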

4.

Invention publication
IN-NETWORK MESSAGE AGGREGATION FOR EFFICIENT SMALL MESSAGE TRANSPORT (Under examination, published)

Publication (announcement) No.: US20230327996A1

Publication (announcement) date: 2023-10-12

Application No.: US18149924

Application date: 2023-01-04

Applicant: NVIDIA Corporation

Inventor: Benjamin Klenk , Alan Lynn Davis , Larry Robert Dennison

IPC: H04L47/2441 , H04L43/026

CPC classification number: H04L47/2441 , H04L43/026

Abstract: Aggregation of small payloads from multiple packets may improve bandwidth efficiency of a network, particularly a high-performance compute cluster with thousands of network endpoints and distributed data. Aggregation is context-based and a packet header is reduced because the common components that are shared by the aggregated messages are included once within the header. Execution contexts are explicitly created and destroyed by application programs. Each participating endpoint stores context-specific properties until the context is destroyed, so that the properties are not included in the header. Aggregation may be performed at different hierarchical levels by switches and/or endpoints.
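Context-based aggregation can be sketched roughly as grouping payloads by context and emitting one reduced header per group. The packet layout, the `aggregate` function, and the shape of the `contexts` table below are assumptions for illustration, not the format described in the application.

```python
from collections import defaultdict


def aggregate(messages, contexts):
    """Group small messages by context id and emit one packet per context.

    `messages` is a list of (context_id, payload) pairs; `contexts` maps a
    context id to the properties every endpoint already stores for it.
    """
    by_context = defaultdict(list)
    for context_id, payload in messages:
        if context_id not in contexts:
            raise KeyError(f"context {context_id} was never created")
        by_context[context_id].append(payload)
    # The header carries only the context id and a count; the stored
    # context properties are deliberately left out of it.
    return [
        {"header": {"context": cid, "count": len(payloads)}, "payloads": payloads}
        for cid, payloads in by_context.items()
    ]
```

Three small messages spread over two contexts therefore travel as two packets, each with a single short header instead of one full header per message.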

5.

Invention grant
Distributed batch normalization using partial populations (In force)

Publication (announcement) No.: US11341369B2

Publication (announcement) date: 2022-05-24

Application No.: US16669925

Application date: 2019-10-31

Applicant: NVIDIA Corporation

Inventor: Larry Robert Dennison , Benjamin Klenk

IPC: G06K9/62 , G06N3/08 , G06F7/483 , G06F9/38 , G06N3/04

Abstract: A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.
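The flow in this abstract can be sketched with scalar activations and a plain loop standing in for the parallel processors. The choice of count, sum, and sum of squares as the "statistical measures", the `eps` constant, and all function names are assumptions; the patent may use different measures.

```python
import math


def local_stats(samples):
    """Statistical measures a processor sends out: count, sum, sum of squares."""
    return len(samples), sum(samples), sum(x * x for x in samples)


def reduce_stats(all_stats):
    """Combine the measures from every processor into a mean and a variance."""
    n = sum(s[0] for s in all_stats)
    total = sum(s[1] for s in all_stats)
    total_sq = sum(s[2] for s in all_stats)
    mean = total / n
    return mean, total_sq / n - mean * mean


def normalize(samples, mean, var, eps=1e-5):
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


def partial_population_batch_norm(sub_batches, first_len):
    """Normalize whole sub-batches using statistics from first portions only."""
    stats = [local_stats(sb[:first_len]) for sb in sub_batches]  # per processor
    mean, var = reduce_stats(stats)                              # additional processor
    return [normalize(sb, mean, var) for sb in sub_batches], (mean, var)
```

The point of the design is that the reduction can start once the first portion of each sub-batch is done, while the parameters it produces are still applied to both portions.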

6.

Invention application
SCALABLE IN-NETWORK COMPUTATION FOR MASSIVELY-PARALLEL SHARED-MEMORY PROCESSORS (In force)

Publication (announcement) No.: US20220029845A1

Publication (announcement) date: 2022-01-27

Application No.: US17495547

Application date: 2021-10-06

Applicant: NVIDIA Corporation

Inventor: Benjamin Klenk , Nan Jiang , Larry Robert Dennison , Gregory M. Thorson

IPC: H04L12/18 , G06F9/50 , H04L12/801 , H04L12/813 , H04L12/927 , H04L12/741 , H04L29/08

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
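A reduction offloaded to the network device might look roughly like the sketch below, where push requests addressed to a multicast region are held until every participating endpoint has contributed. The `ReductionSwitch` class, the use of a sum as the reduction, and the region table are hypothetical stand-ins rather than the described hardware.

```python
class ReductionSwitch:
    """Hypothetical network device that reduces pushes to a multicast region."""

    def __init__(self, regions):
        self.regions = regions  # region address -> set of participating endpoints
        self.pending = {}       # region address -> {endpoint: value}

    def push(self, endpoint, region, value):
        """Record one contribution; return responses once all have arrived."""
        participants = self.regions[region]
        if endpoint not in participants:
            raise ValueError("endpoint is not mapped to this region")
        contributions = self.pending.setdefault(region, {})
        contributions[endpoint] = value
        if set(contributions) != participants:
            return None  # still waiting for other endpoints
        del self.pending[region]
        total = sum(contributions.values())  # the in-network computation
        return {ep: total for ep in participants}
```

The endpoints only issue store-like pushes; the summation itself happens in the switch, which is what removes the reduction work from the processors.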

7.

Invention grant
Distributed batch normalization using estimates and rollback (In force)

Publication (announcement) No.: US11170263B2

Publication (announcement) date: 2021-11-09

Application No.: US16669979

Application date: 2019-10-31

Applicant: NVIDIA Corporation

Inventor: Larry Robert Dennison , Benjamin Klenk

IPC: G06K9/62 , G06N3/08 , G06F7/483 , G06F9/38 , G06N3/04

Abstract: A technique utilizing speculative execution and rollback for performing data parallel training of a neural network model is disclosed. Activations for a layer of the neural network model are normalized during a speculative normalization operation using estimated normalization parameters associated with a partial population of a set of training data allocated to a particular processor. Normalization parameters associated with the total population of the set of training data are generated by a distributed reduce operation in parallel with the speculative normalization operation. An optional rollback operation can revert the activations to a pre-normalization state if the estimated normalization parameters for the partial population are subsequently determined to be inaccurate compared to the normalization parameters for the population of the set of training data distributed across a plurality of processors.
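The speculate-then-verify pattern can be sketched sequentially as below. Several details are assumptions: the population parameters are passed in as arguments rather than arriving from a distributed reduce that runs in parallel, the `tolerance` test for "inaccurate" is hypothetical, and renormalizing with the population parameters after the rollback is one plausible follow-up that the abstract does not spell out.

```python
import math


def speculative_normalize(activations, est_mean, est_var, true_mean, true_var,
                          tolerance=0.05, eps=1e-5):
    """Normalize with estimated parameters and roll back if they were off.

    Returns (normalized activations, whether a rollback happened).
    """
    saved = list(activations)  # pre-normalization state kept for rollback
    result = [(x - est_mean) / math.sqrt(est_var + eps) for x in saved]
    inaccurate = (abs(est_mean - true_mean) > tolerance
                  or abs(est_var - true_var) > tolerance)
    if inaccurate:
        # Rollback: discard the speculative result and start again from the
        # saved state, this time using the population parameters (assumed).
        result = [(x - true_mean) / math.sqrt(true_var + eps) for x in saved]
    return result, inaccurate
```

When the estimate from the partial population is close enough, the speculative result is kept and the latency of the reduce is hidden behind useful work.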

8.

Invention application
INJECTION LIMITING AND WAVE SYNCHRONIZATION FOR SCALABLE IN-NETWORK COMPUTATION (In force)

Publication (announcement) No.: US20210036881A1

Publication (announcement) date: 2021-02-04

Application No.: US16938044

Application date: 2020-07-24

Applicant: NVIDIA Corporation

Inventor: Benjamin Klenk , Nan Jiang , Larry Robert Dennison

IPC: H04L12/18 , G06F9/50 , H04L12/927 , H04L12/813 , H04L12/801

Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. An injection policy comprising the issuing of credits enables each endpoint to limit the amount of collective communication primitives injected into the network simultaneously to reduce network congestion caused by increased network traffic due to the multicast capability of the network devices.
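A credit-based injection policy of the kind mentioned at the end of this abstract can be sketched from the endpoint's side as follows. The `CreditLimitedEndpoint` class and the rule that one credit is returned per response are assumptions; the application may issue and return credits differently.

```python
from collections import deque


class CreditLimitedEndpoint:
    """Hypothetical endpoint that injects a primitive only when it holds a credit."""

    def __init__(self, credits):
        self.credits = credits
        self.waiting = deque()  # primitives not yet injected
        self.in_flight = []     # primitives currently in the network

    def submit(self, primitive):
        self.waiting.append(primitive)
        self._inject()

    def on_response(self, primitive):
        """A response returns one credit and may release a waiting primitive."""
        self.in_flight.remove(primitive)
        self.credits += 1
        self._inject()

    def _inject(self):
        while self.credits > 0 and self.waiting:
            self.credits -= 1
            self.in_flight.append(self.waiting.popleft())
```

With two credits, a third collective primitive stays queued at the endpoint until one of the first two completes, which bounds the multicast traffic that endpoint can have outstanding.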

9.

Invention grant
Sparse convolutional neural network accelerator (In force)

Publication (announcement) No.: US10860922B2

Publication (announcement) date: 2020-12-08

Application No.: US16686931

Application date: 2019-11-18

Applicant: NVIDIA Corporation

Inventor: William J. Dally , Angshuman Parashar , Joel Springer Emer , Stephen William Keckler , Larry Robert Dennison

IPC: G06N3/063 , G06F7/544 , G06N3/04 , G06F7/523 , G06N3/08

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
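The multiply, combine, and scatter steps can be sketched in software as an all-pairs product followed by accumulation keyed on position. The abstract says only that the positions are "combined"; the rule used below, output position (k, x - r, y - s), is an assumption borrowed from ordinary convolution indexing, and the sketch performs no bounds checking. A dictionary stands in for the accumulator array.

```python
from itertools import product


def sparse_multiply_accumulate(weights, activations):
    """Multiply every non-zero weight by every non-zero activation and accumulate.

    weights: list of (value, (k, r, s)) with positions in a 3D space.
    activations: list of (value, (x, y)) with positions in a 2D space.
    Returns {output position: accumulated sum}.
    """
    accumulators = {}
    for (w, (k, r, s)), (a, (x, y)) in product(weights, activations):
        prod = w * a                  # one multiplier in the array
        position = (k, x - r, y - s)  # assumed position-combination rule
        # Route the product to the adder that owns this output position.
        accumulators[position] = accumulators.get(position, 0) + prod
    return accumulators
```

Two products that land on the same output position are summed by the same accumulator, which is why each product must travel together with its computed position.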

10.

Invention grant
Distributed address translation in a multi-node interconnect fabric (In force)

Publication (announcement) No.: US10769076B2

Publication (announcement) date: 2020-09-08

Application No.: US16198649

Application date: 2018-11-21

Applicant: NVIDIA Corporation

Inventor: Samuel Hammond Duncan , Sanjeev Jain , Mark Douglas Hummel , Vyas Venkataraman , Olivier Giroux , Larry Robert Dennison , Alexander Toichi Ishii , Hemayet Hossain , Nir Haim Arad

IPC: G06F9/455 , G06F12/1027

Abstract: Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.
