-
Publication number: US20140204657A1
Publication date: 2014-07-24
Application number: US13748499
Filing date: 2013-01-23
Applicant: NVIDIA CORPORATION
Inventor: William James Dally
IPC: G11C11/412
CPC classification number: G11C11/4125 , G11C11/404 , G11C11/419
Abstract: The disclosure provides for an SRAM array having a plurality of wordlines and a plurality of bitlines, referred to generally as SRAM lines. The array has a plurality of cells, each cell being defined by an intersection between one of the wordlines and one of the bitlines. The SRAM array further includes voltage boost circuitry operatively coupled with the cells, the voltage boost circuitry being configured to provide an amount of voltage boost that is based on an address of a cell to be accessed and/or to provide this voltage boost on an SRAM line via capacitive charge coupling.
-
Publication number: US11977766B2
Publication date: 2024-05-07
Application number: US17683292
Filing date: 2022-02-28
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F3/06
CPC classification number: G06F3/0655 , G06F3/0604 , G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including a processor die and one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to that processing tile and comprise its local memory block. The hierarchical network provides access paths for each processing tile to access its own local memory block, the local memory block coupled to a different processing tile within the same processor die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve by 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.
-
Publication number: US20230385232A1
Publication date: 2023-11-30
Application number: US18227241
Filing date: 2023-07-27
Applicant: NVIDIA Corporation
Inventor: William James Dally
CPC classification number: G06F15/80 , G06F12/0646 , G06F2212/7201
Abstract: A mapping may be made between an array of physical processors and an array of functional logical processors. Also, a mapping may be made between logical memory channels (associated with the logical processors) and functional physical memory channels (associated with the physical processors). These mappings may be stored within one or more tables, which may then be used to bypass faulty processors and memory channels when implementing memory accesses, while optimizing locality (e.g., by minimizing the distance between processors and their memory channels).
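The table-based remapping this abstract describes can be illustrated with a small sketch: faulty physical processors are skipped when logical processors are assigned, and each logical channel maps to a working channel attached to the same physical processor where possible. This is not the patented implementation; the function names, the nearest-first fallback policy, and all numeric parameters are assumptions for illustration.

```python
def build_processor_map(num_physical, faulty):
    """Map each logical processor to a working physical processor,
    skipping the faulty ones."""
    working = [p for p in range(num_physical) if p not in faulty]
    return {logical: phys for logical, phys in enumerate(working)}

def build_channel_map(proc_map, channels_per_proc, faulty_channels):
    """Map each logical channel to a working physical channel,
    preferring channels attached to the same physical processor
    (locality)."""
    chan_map = {}
    for logical_proc, phys in proc_map.items():
        local = [phys * channels_per_proc + c for c in range(channels_per_proc)]
        good = [c for c in local if c not in faulty_channels]
        for i in range(channels_per_proc):
            logical_chan = logical_proc * channels_per_proc + i
            # reuse a working local channel when one of them is faulty
            chan_map[logical_chan] = good[i % len(good)] if good else None
    return chan_map

# 6 physical processors, two faulty; 2 channels per processor, one faulty
proc_map = build_processor_map(num_physical=6, faulty={2, 4})
chan_map = build_channel_map(proc_map, channels_per_proc=2, faulty_channels={1})
assert proc_map == {0: 0, 1: 1, 2: 3, 3: 5}
assert chan_map[0] == 0 and chan_map[1] == 0   # both fall back to channel 0
```

Once built, both tables are consulted on every memory access, so faulty hardware is bypassed without the application being aware of it.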
-
Publication number: US20210048992A1
Publication date: 2021-02-18
Application number: US16811068
Filing date: 2020-03-06
Applicant: Nvidia Corporation
Inventor: William James Dally
Abstract: The disclosure provides processors that are configured to perform dynamic programming according to an instruction, a method for configuring a processor for dynamic programming according to an instruction, and a method of computing a modified Smith-Waterman algorithm employing an instruction for configuring a parallel processing unit. In one example, the method for configuring includes: (1) receiving, by execution cores of the processor, an instruction that directs the execution cores to compute a set of recurrence equations employing a matrix, (2) configuring the execution cores, according to the set of recurrence equations, to compute states for elements of the matrix, and (3) storing the computed states for current elements of the matrix in registers of the execution cores, wherein the computed states are determined based on the set of recurrence equations and input data.
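For reference, the Smith-Waterman recurrence that such an instruction would accelerate can be written as ordinary software. The sketch below is plain sequential Python, not the patented instruction or its parallel mapping, and the scoring parameters (match, mismatch, gap) are assumed values.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Fill the Smith-Waterman score matrix H for sequences a and b:
    H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j),
                     H[i-1][j] + gap, H[i][j-1] + gap)
    and return the best local-alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                             H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

assert smith_waterman("ACG", "ACG") == 6   # three matches at +2 each
assert smith_waterman("AAA", "GGG") == 0   # no positive local alignment
```

Each matrix element depends only on its left, upper, and upper-left neighbors, which is what allows anti-diagonals of the matrix to be computed in parallel across execution cores.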
-
Publication number: US20140232368A1
Publication date: 2014-08-21
Application number: US13770656
Filing date: 2013-02-19
Applicant: NVIDIA CORPORATION
Inventor: William James Dally
IPC: H02M3/155
CPC classification number: H02M3/155 , H02M3/1584 , H02M2003/1566
Abstract: The disclosure is directed to a multi-phase electric power conversion device coupled between a power source and a load. The device includes a first regulator phase and a second regulator phase arranged in parallel, so that a first phase current and a second phase current are controllably provided in parallel to satisfy the current demand requirements of the load. Each phase current is based on current generated in an energy storage device within the respective phase. The regulator phases are asymmetric in that the energy storage device of the second regulator phase is configured so that its current can be varied more rapidly than the current in the energy storage device of the first regulator phase.
-
Publication number: US20140117951A1
Publication date: 2014-05-01
Application number: US13663903
Filing date: 2012-10-30
Applicant: NVIDIA CORPORATION
Inventor: William James Dally
IPC: G05F1/10
CPC classification number: H02M3/158 , H02M3/1582 , H02M2001/007 , H02M2003/1566
Abstract: Embodiments are disclosed relating to an electric power conversion device and methods for controlling the operation thereof. One disclosed embodiment provides a multi-stage electric power conversion device including a first regulator stage including a first stage energy storage device and a second regulator stage including a second stage energy storage device, the second stage energy storage device being operatively coupled between the first stage energy storage device and the load. The device further includes a control mechanism operative to control (i) a first stage output voltage on a node between the first stage energy storage device and the second stage energy storage device and (ii) a second stage output voltage on a node between the second stage energy storage device and the load.
-
Publication number: US20240411709A1
Publication date: 2024-12-12
Application number: US18810657
Filing date: 2024-08-21
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F13/16 , G11C8/12 , H03K19/1776
Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
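The partitioning of a dense array into per-tile sub-arrays that the abstract describes can be sketched in a few lines. This toy version is not the patented scheme: the tile-grid dimensions, the row-major assignment of sub-arrays to processing tiles, and the evenly-divisible sizes are all assumptions.

```python
def partition(matrix, tiles_y, tiles_x):
    """Split a dense 2-D array into tiles_y x tiles_x sub-arrays, one per
    processing tile, so each tile can work out of its local memory block."""
    rows, cols = len(matrix), len(matrix[0])
    sub_r, sub_c = rows // tiles_y, cols // tiles_x
    tiles = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            tiles[(ty, tx)] = [row[tx * sub_c:(tx + 1) * sub_c]
                               for row in matrix[ty * sub_r:(ty + 1) * sub_r]]
    return tiles

# 4x4 matrix split across a 2x2 grid of processing tiles
m = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = partition(m, 2, 2)
assert tiles[(0, 0)] == [[0, 1], [4, 5]]
assert tiles[(1, 1)] == [[10, 11], [14, 15]]
```

Because each sub-array lands in the local memory block of the tile that processes it, most accesses stay within the vertically aligned memory tiles rather than crossing the die.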
-
Publication number: US20240311626A1
Publication date: 2024-09-19
Application number: US18674632
Filing date: 2024-05-24
Applicant: NVIDIA Corporation
Abstract: Neural networks, in many cases, include convolution layers that are configured to perform many convolution operations that require multiplication and addition operations. Compared with performing multiplication on integer, fixed-point, or floating-point format values, performing multiplication on logarithmic format values is straightforward and energy efficient as the exponents are simply added. However, performing addition on logarithmic format values is more complex. Conventionally, addition is performed by converting the logarithmic format values to integers, computing the sum, and then converting the sum back into the logarithmic format. Instead, logarithmic format values may be added by decomposing the exponents into separate quotient and remainder components, sorting the quotient components based on the remainder components, summing the sorted quotient components using an asynchronous accumulator to produce partial sums, and multiplying the partial sums by the remainder components to produce a sum. The sum may then be converted back into the logarithmic format.
-
Publication number: US20230315651A1
Publication date: 2023-10-05
Application number: US17709031
Filing date: 2022-03-30
Applicant: NVIDIA Corporation
Inventor: William James Dally , Carl Thomas Gray , Stephen W. Keckler , James Michael O'Connor
IPC: G06F13/16 , H03K19/1776 , G11C8/12
CPC classification number: G06F13/161 , G06F13/1689 , G06F13/1673 , H03K19/1776 , G11C8/12
Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.
-
Publication number: US20230297499A1
Publication date: 2023-09-21
Application number: US17581687
Filing date: 2022-01-21
Applicant: NVIDIA Corporation
IPC: G06F12/02
CPC classification number: G06F12/0238 , G06F2212/657
Abstract: A mapper within a single-level memory system may facilitate memory localization to reduce the energy and latency of memory accesses within the single-level memory system. The mapper may translate a memory request received from a processor for implementation at a data storage entity, where the translating identifies a data storage entity and a starting location within the data storage entity where the data associated with the memory request is located. This data storage entity may be co-located with the processor that sent the request, which may enable the localization of memory and significantly improve memory performance by reducing the energy of data accesses and increasing data bandwidth.