-
Publication Number: US11886980B2
Publication Date: 2024-01-30
Application Number: US16549683
Application Date: 2019-08-23
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
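The addition scheme described in this abstract can be pictured with a small numeric sketch. The snippet below assumes a base-2 logarithmic format with a hypothetical scale factor N = 4 (a 2-bit fractional exponent); the encode/decode helpers and all names are illustrative assumptions, not the patent's implementation.

```python
import math

N = 4  # hypothetical scale: a value x is stored as the integer e with x ~= 2 ** (e / N)

def encode(value):
    """Quantize a positive value into the assumed log format."""
    return round(math.log2(value) * N)

def decode(exponent):
    return 2.0 ** (exponent / N)

def log_add(exponents):
    """Sum log-format values by splitting each exponent e into a quotient
    q = e // N and remainder r = e % N, bucketing ("sorting") the quotient
    terms by remainder, forming one partial sum of 2**q per remainder, and
    scaling each partial sum by 2**(r / N) before converting back to the
    log format."""
    partial_sums = [0.0] * N          # one partial sum per possible remainder
    for e in exponents:
        q, r = divmod(e, N)
        partial_sums[r] += 2.0 ** q
    total = sum(ps * 2.0 ** (r / N) for r, ps in enumerate(partial_sums))
    return encode(total)

values = [3.0, 5.5, 0.75]
approx = decode(log_add([encode(v) for v in values]))
print(approx)  # close to sum(values), within the log-format quantization error
```

Because every value sharing a remainder r carries the same factor 2**(r/N), that factor can be applied once per bucket instead of once per operand, which is the point of sorting the quotients by remainder.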
-
Publication Number: US20220004864A1
Publication Date: 2022-01-06
Application Number: US16919375
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
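A behavioral sketch of that idea is shown below. It only models the timing relationship in software: the raw signal may toggle while the logic settles, and a storage element updates its output only when a "ready" strobe, delayed to match the logic's settling time, asserts. The trace lengths and class names are assumptions for illustration, not the patent's circuit.

```python
class GlitchFilterLatch:
    """Behavioral model of the glitch-elimination idea: the output is
    updated only while the matched-delay 'ready' strobe is asserted, so
    downstream logic sees one clean transition per evaluation."""

    def __init__(self):
        self.output = 0

    def sample(self, raw_signal, ready):
        # Transparent only after the raw signal has finished changing.
        if ready:
            self.output = raw_signal
        return self.output


# Hypothetical settling trace: the raw signal glitches 0->1->0->1 while the
# logic evaluates; 'ready' asserts once, after the matched 4-step delay.
raw_trace   = [0, 1, 0, 1, 1, 1]
ready_trace = [0, 0, 0, 0, 1, 0]

latch = GlitchFilterLatch()
outputs = [latch.sample(raw, rdy) for raw, rdy in zip(raw_trace, ready_trace)]
print(outputs)  # [0, 0, 0, 0, 1, 1] -> a single 0-to-1 transition downstream
```

Downstream logic driven from the filtered output charges and discharges its nodes once per evaluation instead of once per glitch, which is where the power saving comes from.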
-
Publication Number: US20210056446A1
Publication Date: 2021-02-25
Application Number: US16750823
Application Date: 2020-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
IPC: G06N5/04
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US12223201B2
Publication Date: 2025-02-11
Application Number: US18438139
Application Date: 2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. A processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve by 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
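The four access paths named in the abstract can be pictured as a routing decision over (device, stack, tile) coordinates. The sketch below is a minimal software model of that hierarchy; the coordinate fields, enum labels, and the route function are illustrative assumptions, not the patent's interconnect.

```python
from dataclasses import dataclass
from enum import Enum


class AccessPath(Enum):
    """Hypothetical labels for the four access levels named in the abstract."""
    LOCAL_BLOCK = "memory tiles stacked directly on this processing tile"
    SAME_DIE = "local block of another processing tile on the same die"
    OTHER_STACK = "memory tiles in a different die stack"
    OTHER_DEVICE = "memory tiles in a different device"


@dataclass(frozen=True)
class TileAddress:
    device: int
    stack: int
    tile: int  # processing-tile index within the stack


def route(src: TileAddress, dst: TileAddress) -> AccessPath:
    """Pick the hierarchical network level a request from src must traverse
    to reach the memory tiles associated with dst."""
    if src.device != dst.device:
        return AccessPath.OTHER_DEVICE
    if src.stack != dst.stack:
        return AccessPath.OTHER_STACK
    if src.tile != dst.tile:
        return AccessPath.SAME_DIE
    return AccessPath.LOCAL_BLOCK


# Example: a request that stays on the same tile uses the short local path.
print(route(TileAddress(0, 0, 3), TileAddress(0, 0, 3)))  # AccessPath.LOCAL_BLOCK
```

The bandwidth and energy advantages cited in the abstract come from keeping as many requests as possible on the LOCAL_BLOCK path, where the memory tiles sit directly above the processing tile.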
-
Publication Number: US12118454B2
Publication Date: 2024-10-15
Application Number: US18537570
Application Date: 2023-12-12
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US20240211166A1
Publication Date: 2024-06-27
Application Number: US18438139
Application Date: 2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
CPC classification number: G06F3/0655 , G06F3/0604 , G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. A processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve by 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
Publication Number: US11809989B2
Publication Date: 2023-11-07
Application Number: US16919375
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G06N3/045 , G06N3/084 , G06N3/08 , H03K3/037 , H03K5/01 , G06N3/063 , G06N3/04 , G06F1/12 , H03K19/003 , H03K5/00
CPC classification number: G06N3/08 , G06F1/12 , G06N3/04 , G06N3/063 , H03K3/037 , H03K5/01 , H03K19/003 , H03K2005/00013
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
-
Publication Number: US11070205B1
Publication Date: 2021-07-20
Application Number: US16919324
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: H03K19/003 , G06N3/063
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
-
Publication Number: US20210056399A1
Publication Date: 2021-02-25
Application Number: US16750917
Application Date: 2020-01-23
Applicant: NVIDIA Corporation
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US10026468B2
Publication Date: 2018-07-17
Application Number: US15430393
Application Date: 2017-02-10
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G11C11/4091 , G11C11/408 , G11C11/4099
Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.
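As a rough software model of the segmented word line, the sketch below maps a target column to the single row segment that must be driven, so that only that portion of the row's cells shares charge with its bit lines. The segment count and column count are made-up parameters for illustration, not values from the patent.

```python
SEGMENTS_PER_ROW = 4     # hypothetical segmentation of each word line
COLUMNS_PER_ROW = 1024   # hypothetical array width


def segment_for_column(column: int) -> int:
    """Return the word-line segment that must be driven so that only the
    cells in that portion of the row are selected."""
    columns_per_segment = COLUMNS_PER_ROW // SEGMENTS_PER_ROW
    return column // columns_per_segment


def activated_columns(column: int) -> range:
    """Columns whose cells share charge with their bit lines when the
    segment containing 'column' is driven; the rest of the row is untouched."""
    columns_per_segment = COLUMNS_PER_ROW // SEGMENTS_PER_ROW
    start = segment_for_column(column) * columns_per_segment
    return range(start, start + columns_per_segment)


# Example: accessing column 700 drives only segment 2 (columns 512-767),
# rather than all 1024 cells in the row.
print(segment_for_column(700), activated_columns(700))
```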