-
Publication No.: US11726757B2
Publication Date: 2023-08-15
Application No.: US16811068
Filing Date: 2020-03-06
Applicant: Nvidia Corporation
Inventor: William James Dally
CPC classification number: G06F8/451 , G06F9/5066 , G06T1/20
Abstract: The disclosure provides processors that are configured to perform dynamic programming according to an instruction, a method for configuring a processor for dynamic programming according to an instruction, and a method of computing a modified Smith-Waterman algorithm employing an instruction for configuring a parallel processing unit. In one example, the method for configuring includes: (1) receiving, by execution cores of the processor, an instruction that directs the execution cores to compute a set of recurrence equations employing a matrix, (2) configuring the execution cores, according to the set of recurrence equations, to compute states for elements of the matrix, and (3) storing the computed states for current elements of the matrix in registers of the execution cores, wherein the computed states are determined based on the set of recurrence equations and input data.
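The recurrence equations referenced above follow the affine-gap Smith-Waterman pattern. Below is a minimal Python sketch of such recurrences evaluated element by element; the scoring parameters and function name are illustrative assumptions, not the patented instruction itself.

```python
# Minimal sketch of affine-gap Smith-Waterman recurrence equations of the kind a
# dynamic-programming instruction would evaluate per matrix element. Scoring
# parameters and names are illustrative assumptions, not taken from the patent.
def smith_waterman(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    rows, cols = len(a) + 1, len(b) + 1
    # H: best local-alignment score; E/F: gap-extension states per element.
    H = [[0] * cols for _ in range(rows)]
    E = [[0] * cols for _ in range(rows)]
    F = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # best local alignment score
```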
-
Publication No.: US20230237308A1
Publication Date: 2023-07-27
Application No.: US17814957
Filing Date: 2022-07-26
Applicant: NVIDIA Corporation
Inventor: Charbel Sakr , Steve Haihang Dai , Brucek Kurdo Khailany , William James Dally , Rangharajan Venkatesan , Brian Matthew Zimmer
Abstract: Quantizing the tensors and vectors processed within a neural network reduces power consumption and may accelerate processing. Quantization reduces the number of bits used to represent a value, and decreasing the number of bits can decrease the accuracy of computations that use the value. Ideally, quantization is performed without reducing accuracy. Quantization-aware training (QAT) is performed by dynamically quantizing tensors (weights and activations) using optimal clipping scalars, "optimal" in the sense that the mean squared error (MSE) of the quantized operation is minimized; the clipping scalars define the degree or amount of quantization for the various tensors of the operation. Conventional techniques that quantize tensors during training suffer from high amounts of noise (error). Other techniques compute the clipping scalars offline through a brute-force search to provide high accuracy. In contrast, the optimal clipping scalars can be computed online and provide the same accuracy as clipping scalars computed offline.
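As a rough illustration of MSE-minimizing clipping, the Python sketch below searches candidate clipping scalars for one tensor and keeps the one with the lowest quantization MSE. The grid search, bit width, and names are assumptions standing in for the online computation the abstract describes.

```python
# Illustrative sketch: choose a clipping scalar that minimizes the mean squared
# error (MSE) of uniform quantization for one tensor. The simple grid search and
# 4-bit setting are assumptions, not the patented online method.
def quantize(x, clip, bits=4):
    # Symmetric uniform quantization of a list of floats onto 2**bits levels.
    step = 2 * clip / (2 ** bits - 1)
    return [max(-clip, min(clip, round(v / step) * step)) for v in x]

def optimal_clip(x, bits=4, candidates=64):
    max_abs = max(abs(v) for v in x)
    best_clip, best_mse = max_abs, float("inf")
    for k in range(1, candidates + 1):
        clip = max_abs * k / candidates
        q = quantize(x, clip, bits)
        mse = sum((v - w) ** 2 for v, w in zip(x, q)) / len(x)
        if mse < best_mse:
            best_clip, best_mse = clip, mse
    return best_clip

weights = [0.03, -1.2, 0.4, 2.5, -0.7, 0.05, 1.1, -0.2]
print(optimal_clip(weights))  # clipping scalar used to quantize this tensor
```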
-
Publication No.: US20230237011A1
Publication Date: 2023-07-27
Application No.: US17581734
Filing Date: 2022-01-21
Applicant: NVIDIA Corporation
Inventor: William James Dally
CPC classification number: G06F15/80 , G06F12/0646 , G06F2212/7201
Abstract: A mapping may be made between an array of physical processors and an array of functional logical processors. Also, a mapping may be made between logical memory channels (associated with the logical processors) and functional physical memory channels (associated with the physical processors). These mappings may be stored within one or more tables, which may then be used to bypass faulty processors and memory channels when implementing memory accesses, while optimizing locality (e.g., by minimizing the distance between memory channels and processors).
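A minimal sketch of such mapping tables, assuming a one-dimensional array of eight physical units and a nearest-surviving-index assignment as a stand-in for the locality optimization (all names and indices are illustrative):

```python
# Illustrative sketch: tables mapping logical processors and logical memory
# channels onto the surviving (functional) physical units so faulty ones are
# bypassed. The nearest-index assignment is an assumed stand-in for locality.
faulty_procs = {2}       # physical processor indices known to be faulty
faulty_chans = {5}       # physical memory channel indices known to be faulty
num_physical = 8

good_procs = [p for p in range(num_physical) if p not in faulty_procs]
good_chans = [c for c in range(num_physical) if c not in faulty_chans]

# Logical index i maps to the i-th surviving physical unit, keeping neighbors close.
proc_table = dict(enumerate(good_procs))
chan_table = dict(enumerate(good_chans))

def access(logical_proc, logical_chan):
    # A memory access from a logical processor is steered through both tables.
    return proc_table[logical_proc], chan_table[logical_chan]

print(access(3, 6))  # -> (4, 7): both lookups skip past the faulty units
```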
-
Publication No.: US20170154667A1
Publication Date: 2017-06-01
Application No.: US15430393
Filing Date: 2017-02-10
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G11C11/4091 , G11C11/4099 , G11C11/408
CPC classification number: G11C11/4091 , G11C11/4063 , G11C11/4085 , G11C11/4087 , G11C11/4099
Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. A segmented word line circuit for each row is controllable to cause selection of only a portion of the cells in that row.
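A behavioral sketch only, assuming 256-column segments: the segmented word line enables just one segment of the selected row, so only that segment's cells share charge with their bit lines.

```python
# Behavioral sketch: a segmented word line activates only one segment of a row.
# The segment width and column count are assumptions, not circuit detail.
COLS_PER_SEGMENT = 256

def selected_cells(row, segment, total_cols=1024):
    # Returns the row and the column range enabled by the segmented word line.
    start = segment * COLS_PER_SEGMENT
    return row, range(start, min(start + COLS_PER_SEGMENT, total_cols))

row, cols = selected_cells(row=7, segment=2)
print(row, cols.start, cols.stop)  # only columns 512..767 of row 7 are selected
```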
-
Publication No.: US09460776B2
Publication Date: 2016-10-04
Application No.: US13748499
Filing Date: 2013-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G11C11/24 , G11C11/412 , G11C11/419 , G11C11/404
CPC classification number: G11C11/4125 , G11C11/404 , G11C11/419
Abstract: The disclosure provides an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells and configured to provide an amount of voltage boost that is based on the address of the cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.
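A sketch of an address-dependent boost, under the assumption (for illustration only) that the boost grows with the addressed cell's distance along the SRAM line; the voltage values are not from the patent.

```python
# Illustrative sketch: a boost amount derived from the address of the accessed
# cell. Scaling with distance along the line and the millivolt values are assumptions.
MAX_BOOST_MV = 100
CELLS_PER_LINE = 512

def boost_for_address(row, col):
    # Cells farther along the line receive a larger capacitively coupled boost.
    return MAX_BOOST_MV * col / (CELLS_PER_LINE - 1)

print(boost_for_address(row=3, col=10))   # small boost near the driver
print(boost_for_address(row=3, col=500))  # larger boost far from the driver
```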
-
Publication No.: US20140097813A1
Publication Date: 2014-04-10
Application No.: US13647202
Filing Date: 2012-10-08
Applicant: NVIDIA CORPORATION
Inventor: William James Dally
IPC: G05F1/625
CPC classification number: H02M3/158 , H02M1/15 , H02M3/1582 , H02M3/1588 , H02M2003/1566 , Y02B70/1425
Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides an electric power conversion device comprising a first current control mechanism coupled to an electric power source and an upstream end of an inductor, where the first current control mechanism is operable to control inductor current. The electric power conversion device further comprises a second current control mechanism coupled between the downstream end of the inductor and a load, where the second current control mechanism is operable to control how much of the inductor current is delivered to the load.
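A heavily simplified discrete-time sketch of the two control mechanisms: the first switch ramps the inductor current from the source, and the second steers a chosen fraction of that current to the load. Component values, time step, and control policy are assumptions, not the disclosed controller.

```python
# Behavioral sketch only: first mechanism charges the inductor from the source,
# second mechanism delivers a fraction of the inductor current to the load.
# All values and the control policy are illustrative assumptions.
V_SOURCE = 12.0   # volts
L = 10e-6         # inductance, henries
DT = 1e-6         # simulation step, seconds

def step(i_l, charge_inductor, fraction_to_load):
    if charge_inductor:               # first current control mechanism
        i_l += (V_SOURCE / L) * DT    # ramp the inductor current
    i_load = fraction_to_load * i_l   # second mechanism: portion sent to the load
    return i_l, i_load

i_l = 0.0
for n in range(5):
    i_l, i_load = step(i_l, charge_inductor=True, fraction_to_load=0.5)
    print(f"step {n}: inductor {i_l:.1f} A, delivered {i_load:.1f} A")
```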
-
Publication No.: US12141225B2
Publication Date: 2024-11-12
Application No.: US16750823
Filing Date: 2020-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
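A sketch of the addition scheme described above, assuming each logarithmic value encodes 2**(e/F) with a fixed-point exponent e and F fractional steps (F = 4 here, with exponents assumed non-negative). Binning by remainder stands in for the sorting step; the per-bin quotient sums accumulate as integer shifts before a single scale by each remainder factor.

```python
# Sketch of log-format addition by quotient/remainder decomposition. F and the
# non-negative exponents are assumptions for illustration.
F = 4  # assumed number of remainder (fractional) steps per power of two

def add_log_values(exponents):
    partial = [0] * F                  # one integer partial sum per remainder bin
    for e in exponents:
        q, r = divmod(e, F)            # decompose exponent into quotient, remainder
        partial[r] += 1 << q           # 2**q accumulated as a cheap shift
    # Scale each partial sum once by its remainder factor 2**(r/F) and combine.
    return sum(p * 2 ** (r / F) for r, p in enumerate(partial))

exponents = [6, 3, 9, 1]               # each encodes the value 2**(e/4)
print(add_log_values(exponents))
print(sum(2 ** (e / F) for e in exponents))  # direct sum agrees
```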
-
Publication No.: US12099453B2
Publication Date: 2024-09-24
Application No.: US17709031
Filing Date: 2022-03-30
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F13/16 , G11C8/12 , H03K19/1776
CPC classification number: G06F13/161 , G06F13/1673 , G06F13/1689 , G11C8/12 , H03K19/1776
Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
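A minimal sketch of the partitioning described above, assuming a 2x2 grid of processing tiles: the dense matrix is split into sub-arrays, and each sub-array is stored in the local memory block of the processing tile that will execute its program tile.

```python
# Illustrative sketch: partition a dense matrix into sub-arrays, one per program
# tile, each held in its processing tile's local memory block. The 2x2 tile grid
# and names are assumptions.
TILE_ROWS, TILE_COLS = 2, 2   # assumed grid of processing tiles on the die

def partition(matrix):
    rows, cols = len(matrix), len(matrix[0])
    rh, cw = rows // TILE_ROWS, cols // TILE_COLS
    local_blocks = {}
    for tr in range(TILE_ROWS):
        for tc in range(TILE_COLS):
            sub = [row[tc * cw:(tc + 1) * cw] for row in matrix[tr * rh:(tr + 1) * rh]]
            local_blocks[(tr, tc)] = sub   # stored in tile (tr, tc)'s local block
    return local_blocks

matrix = [[r * 4 + c for c in range(4)] for r in range(4)]
print(partition(matrix)[(1, 0)])  # sub-array processed by processing tile (1, 0)
```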
-
Publication No.: US12033060B2
Publication Date: 2024-07-09
Application No.: US16750917
Filing Date: 2020-01-23
Applicant: NVIDIA Corporation
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication No.: US20240112007A1
Publication Date: 2024-04-04
Application No.: US18537570
Filing Date: 2023-12-12
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-