-
Publication Number: US11886980B2
Publication Date: 2024-01-30
Application Number: US16549683
Application Date: 2019-08-23
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
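The addition scheme described in this abstract can be pictured with a small numeric sketch. The snippet below assumes a base-2 logarithmic format with a hypothetical scale factor N = 4 (a 2-bit fractional exponent); the encode/decode helpers and all names are illustrative assumptions, not the patent's implementation.

```python
import math

N = 4  # hypothetical scale: a value x is stored as the integer e with x ~= 2 ** (e / N)

def encode(value):
    """Quantize a positive value into the assumed log format."""
    return round(math.log2(value) * N)

def decode(exponent):
    return 2.0 ** (exponent / N)

def log_add(exponents):
    """Sum log-format values by splitting each exponent e into a quotient
    q = e // N and remainder r = e % N, bucketing ("sorting") the quotient
    terms by remainder, forming one partial sum of 2**q per remainder, and
    scaling each partial sum by 2**(r / N) before converting back to the
    log format."""
    partial_sums = [0.0] * N          # one partial sum per possible remainder
    for e in exponents:
        q, r = divmod(e, N)
        partial_sums[r] += 2.0 ** q
    total = sum(ps * 2.0 ** (r / N) for r, ps in enumerate(partial_sums))
    return encode(total)

values = [3.0, 5.5, 0.75]
approx = decode(log_add([encode(v) for v in values]))
print(approx)  # close to sum(values), within the log-format quantization error
```

Because every value sharing a remainder r carries the same factor 2**(r/N), that factor can be applied once per bucket instead of once per operand, which is the point of sorting the quotients by remainder.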
-
Publication Number: US20220004864A1
Publication Date: 2022-01-06
Application Number: US16919375
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
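A behavioral sketch of that idea is shown below. It only models the timing relationship in software: the raw signal may toggle while the logic settles, and a storage element updates its output only when a "ready" strobe, delayed to match the logic's settling time, asserts. The trace lengths and class names are assumptions for illustration, not the patent's circuit.

```python
class GlitchFilterLatch:
    """Behavioral model of the glitch-elimination idea: the output is
    updated only while the matched-delay 'ready' strobe is asserted, so
    downstream logic sees one clean transition per evaluation."""

    def __init__(self):
        self.output = 0

    def sample(self, raw_signal, ready):
        # Transparent only after the raw signal has finished changing.
        if ready:
            self.output = raw_signal
        return self.output


# Hypothetical settling trace: the raw signal glitches 0->1->0->1 while the
# logic evaluates; 'ready' asserts once, after the matched 4-step delay.
raw_trace   = [0, 1, 0, 1, 1, 1]
ready_trace = [0, 0, 0, 0, 1, 0]

latch = GlitchFilterLatch()
outputs = [latch.sample(raw, rdy) for raw, rdy in zip(raw_trace, ready_trace)]
print(outputs)  # [0, 0, 0, 0, 1, 1] -> a single 0-to-1 transition downstream
```

Downstream logic driven from the filtered output charges and discharges its nodes once per evaluation instead of once per glitch, which is where the power saving comes from.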
-
Publication Number: US20210056446A1
Publication Date: 2021-02-25
Application Number: US16750823
Application Date: 2020-01-23
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
IPC: G06N5/04
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US12223201B2
Publication Date: 2025-02-11
Application Number: US18438139
Application Date: 2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. A processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve by 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
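The four access paths named in the abstract can be pictured as a routing decision over (device, stack, tile) coordinates. The sketch below is a minimal software model of that hierarchy; the coordinate fields, enum labels, and the route function are illustrative assumptions, not the patent's interconnect.

```python
from dataclasses import dataclass
from enum import Enum


class AccessPath(Enum):
    """Hypothetical labels for the four access levels named in the abstract."""
    LOCAL_BLOCK = "memory tiles stacked directly on this processing tile"
    SAME_DIE = "local block of another processing tile on the same die"
    OTHER_STACK = "memory tiles in a different die stack"
    OTHER_DEVICE = "memory tiles in a different device"


@dataclass(frozen=True)
class TileAddress:
    device: int
    stack: int
    tile: int  # processing-tile index within the stack


def route(src: TileAddress, dst: TileAddress) -> AccessPath:
    """Pick the hierarchical network level a request from src must traverse
    to reach the memory tiles associated with dst."""
    if src.device != dst.device:
        return AccessPath.OTHER_DEVICE
    if src.stack != dst.stack:
        return AccessPath.OTHER_STACK
    if src.tile != dst.tile:
        return AccessPath.SAME_DIE
    return AccessPath.LOCAL_BLOCK


# Example: a request that stays on the same tile uses the short local path.
print(route(TileAddress(0, 0, 3), TileAddress(0, 0, 3)))  # AccessPath.LOCAL_BLOCK
```

The bandwidth and energy advantages cited in the abstract come from keeping as many requests as possible on the LOCAL_BLOCK path, where the memory tiles sit directly above the processing tile.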
-
Publication Number: US12118454B2
Publication Date: 2024-10-15
Application Number: US18537570
Application Date: 2023-12-12
Applicant: NVIDIA Corporation
Inventor: William James Dally , Rangharajan Venkatesan , Brucek Kurdo Khailany
CPC classification number: G06N3/063 , G06F7/4833 , G06F17/16
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US20240211166A1
Publication Date: 2024-06-27
Application Number: US18438139
Application Date: 2024-02-09
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
CPC classification number: G06F3/0655 , G06F3/0604 , G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. A processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve by 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
Publication Number: US11809989B2
Publication Date: 2023-11-07
Application Number: US16919375
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G06N3/045 , G06N3/084 , G06N3/08 , H03K3/037 , H03K5/01 , G06N3/063 , G06N3/04 , G06F1/12 , H03K19/003 , H03K5/00
CPC classification number: G06N3/08 , G06F1/12 , G06N3/04 , G06N3/063 , H03K3/037 , H03K5/01 , H03K19/003 , H03K2005/00013
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
-
Publication Number: US11070205B1
Publication Date: 2021-07-20
Application Number: US16919324
Application Date: 2020-07-02
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: H03K19/003 , G06N3/063
Abstract: When a signal glitches, logic receiving the signal may change in response, thereby charging and/or discharging nodes within the logic and dissipating power. Providing a glitch-free signal may reduce the number of times the nodes are charged and/or discharged, thereby reducing the power dissipation. A technique for eliminating glitches in a signal is to insert a storage element that samples the signal after it is done changing to produce a glitch-free output signal. The storage element is enabled by a “ready” signal having a delay that matches the delay of circuitry generating the signal. The technique prevents the output signal from changing until the final value of the signal is achieved. The output signal changes only once, typically reducing the number of times nodes in the logic receiving the signal are charged and/or discharged so that power dissipation is also reduced.
-
Publication Number: US20210056399A1
Publication Date: 2021-02-25
Application Number: US16750917
Application Date: 2020-01-23
Applicant: NVIDIA Corporation
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication Number: US10026468B2
Publication Date: 2018-07-17
Application Number: US15430393
Application Date: 2017-02-10
Applicant: NVIDIA Corporation
Inventor: William James Dally
IPC: G11C11/4091 , G11C11/408 , G11C11/4099
Abstract: This description is directed to a dynamic random access memory (DRAM) array having a plurality of rows and a plurality of columns. The array further includes a plurality of cells, each of which is associated with one of the columns and one of the rows. Each cell includes a capacitor that is selectively coupled to a bit line of its associated column so as to share charge with the bit line when the cell is selected. There is a segmented word line circuit for each row, which is controllable to cause selection of only a portion of the cells in the row.
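As a rough software model of the segmented word line, the sketch below maps a target column to the single row segment that must be driven, so that only that portion of the row's cells shares charge with its bit lines. The segment count and column count are made-up parameters for illustration, not values from the patent.

```python
SEGMENTS_PER_ROW = 4     # hypothetical segmentation of each word line
COLUMNS_PER_ROW = 1024   # hypothetical array width


def segment_for_column(column: int) -> int:
    """Return the word-line segment that must be driven so that only the
    cells in that portion of the row are selected."""
    columns_per_segment = COLUMNS_PER_ROW // SEGMENTS_PER_ROW
    return column // columns_per_segment


def activated_columns(column: int) -> range:
    """Columns whose cells share charge with their bit lines when the
    segment containing 'column' is driven; the rest of the row is untouched."""
    columns_per_segment = COLUMNS_PER_ROW // SEGMENTS_PER_ROW
    start = segment_for_column(column) * columns_per_segment
    return range(start, start + columns_per_segment)


# Example: accessing column 700 drives only segment 2 (columns 512-767),
# rather than all 1024 cells in the row.
print(segment_for_column(700), activated_columns(700))
```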