-
Publication No.: US12001725B2
Publication Date: 2024-06-04
Application No.: US18454693
Filing Date: 2023-08-23
Applicant: NVIDIA Corporation
Inventor: Niladrish Chatterjee , James Michael O'Connor , Donghyuk Lee , Gaurav Uttreja , Wishwesh Anil Gandhi
CPC classification number: G06F3/0659 , G06F3/0604 , G06F3/0673 , G06F12/0607 , G06F12/10 , G06F2212/151 , G06F2212/154 , G06F2212/657 , H01L25/18
Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.
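The request-steering idea in this abstract can be illustrated with a minimal sketch. All names (`MemoryManager`, the address-range split, the sizes) are hypothetical illustrations, not details from the patent, which describes hardware logic in a base-layer rather than software:

```python
# Minimal sketch: steer memory requests to on-package or off-package
# backing storage based on address range, exposing both as one space.
# (Hypothetical model; the patent describes hardware memory management
# logic fabricated in a custom base-layer, not software.)

class MemoryManager:
    """Presents on- and off-package memories as a combined space."""

    def __init__(self, on_pkg_size):
        self.on_pkg_size = on_pkg_size  # bytes mapped on-package
        self.on_pkg = {}    # stands in for on-package memory
        self.off_pkg = {}   # stands in for off-package memory

    def _backing(self, addr):
        # Low addresses map on-package; the remainder off-package.
        return self.on_pkg if addr < self.on_pkg_size else self.off_pkg

    def write(self, addr, value):
        self._backing(addr)[addr] = value

    def read(self, addr):
        return self._backing(addr).get(addr)

mm = MemoryManager(on_pkg_size=1024)
mm.write(16, "fast")     # steered to on-package memory
mm.write(4096, "bulk")   # steered to off-package memory
```

A real implementation would also track per-allocation bandwidth to enforce QoS and migrate hot data on-package; this sketch shows only the steering decision.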
-
Publication No.: US11789649B2
Publication Date: 2023-10-17
Application No.: US17237165
Filing Date: 2021-04-22
Applicant: NVIDIA Corporation
Inventor: Niladrish Chatterjee , James Michael O'Connor , Donghyuk Lee , Gaurav Uttreja , Wishwesh Anil Gandhi
CPC classification number: G06F3/0659 , G06F3/0604 , G06F3/0673 , G06F12/0607 , G06F12/10 , G06F2212/151 , G06F2212/154 , G06F2212/657 , H01L25/18
Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.
-
Publication No.: US11635986B2
Publication Date: 2023-04-25
Application No.: US16562359
Filing Date: 2019-09-05
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Gregory Scott Palmer , Jonathon Stuart Ramsey Evans , Shailendra Singh , Samuel H. Duncan , Wishwesh Anil Gandhi , Lacky V. Shah , Eric Rock , Feiqi Su , James Leroy Deming , Alan Menezes , Pranav Vaidya , Praveen Joginipally , Timothy John Purcell , Manas Mandal
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
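The partitioning scheme described above can be sketched as a resource-accounting model. The class and field names (`PPU`, `sms`, `memory_gb`) are illustrative assumptions, not terms from the patent:

```python
# Sketch: divide a fixed pool of compute/memory resources into
# isolated partitions, each assignable to one guest user.
# (Hypothetical names; the patent covers hardware PPU partitioning.)

class PPU:
    def __init__(self, sms, memory_gb):
        self.sms = sms                # total streaming multiprocessors
        self.memory_gb = memory_gb    # total memory
        self.partitions = {}

    def create_partition(self, name, sms, memory_gb):
        """Admin operation: carve out a subset of PPU resources."""
        used_sms = sum(p["sms"] for p in self.partitions.values())
        used_mem = sum(p["memory_gb"] for p in self.partitions.values())
        if used_sms + sms > self.sms or used_mem + memory_gb > self.memory_gb:
            raise ValueError("insufficient PPU resources")
        self.partitions[name] = {"sms": sms, "memory_gb": memory_gb,
                                 "guest": None}

    def assign_guest(self, name, guest):
        """Guests run in their partition, isolated from other guests."""
        self.partitions[name]["guest"] = guest

ppu = PPU(sms=8, memory_gb=32)
ppu.create_partition("p0", sms=4, memory_gb=16)
ppu.create_partition("p1", sms=4, memory_gb=16)
ppu.assign_guest("p0", "guest_a")
```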
-
Publication No.: US11263051B2
Publication Date: 2022-03-01
Application No.: US16866811
Filing Date: 2020-05-05
Applicant: NVIDIA Corporation
Inventor: Ram Rangan , Suryakant Patidar , Praveen Krishnamurthy , Wishwesh Anil Gandhi
Abstract: Accesses between a processor and its external memory are reduced when the processor internally maintains a compressed version of values stored in the external memory. The processor can then refer to the compressed version rather than access the external memory. One compression technique involves maintaining a dictionary on the processor mapping portions of a memory to values. When all of the values of a portion of memory are uniform (i.e., the same), the value is stored in the dictionary for that portion of memory. Thereafter, when the processor needs to access that portion of memory, the value is retrieved from the dictionary rather than from external memory. Techniques are disclosed herein to extend, for example, the capabilities of such dictionary-based compression so that the number of accesses between the processor and its external memory is further reduced.
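The dictionary scheme in this abstract can be sketched as follows. The structure and names (`UniformDictCache`, the region scan) are illustrative assumptions; the patent describes on-processor hardware, not software:

```python
# Sketch of dictionary-based compression for uniform memory regions:
# when every value in a region is identical, record that value in an
# on-chip dictionary and serve later reads without touching external
# memory. (Hypothetical model and names.)

class UniformDictCache:
    def __init__(self, external_memory, region_size):
        self.mem = external_memory      # list standing in for DRAM
        self.region_size = region_size
        self.dictionary = {}            # region index -> uniform value
        self.external_reads = 0         # counts external-memory accesses

    def _scan_region(self, region):
        start = region * self.region_size
        values = self.mem[start:start + self.region_size]
        self.external_reads += 1
        if len(set(values)) == 1:       # region is uniform: remember it
            self.dictionary[region] = values[0]
        return values

    def read(self, addr):
        region, offset = divmod(addr, self.region_size)
        if region in self.dictionary:   # hit: no external access needed
            return self.dictionary[region]
        return self._scan_region(region)[offset]

mem = [0] * 8 + [1, 2, 3, 4]
cache = UniformDictCache(mem, region_size=4)
cache.read(0)   # scans region 0, finds it uniform, fills dictionary
cache.read(3)   # served from the dictionary: no external access
```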
-
Publication No.: US09110809B2
Publication Date: 2015-08-18
Application No.: US13935414
Filing Date: 2013-07-03
Applicant: NVIDIA Corporation
Inventor: Peter B. Holmqvist , Karan Mehra , George R. Lynch , James Patrick Robertson , Gregory Alan Muthler , Wishwesh Anil Gandhi , Nick Barrow-Williams
CPC classification number: G06F12/0842 , G06F11/1004 , G06F12/0886 , G06F2212/1016 , G11C7/1006 , G11C7/1072
Abstract: A method for managing memory traffic includes causing first data to be written to a data cache memory, where a first write request comprises a partial write and writes the first data to a first portion of the data cache memory, and further includes tracking the number of partial writes in the data cache memory. The method further includes issuing a fill request for one or more partial writes in the data cache memory if the number of partial writes in the data cache memory is greater than a predetermined first threshold.
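The threshold-triggered fill behavior can be sketched as follows. The names and the threshold value are hypothetical; the patent concerns cache hardware, not software:

```python
# Sketch: track cache lines holding partial writes, and issue fill
# requests once their count exceeds a threshold, so the partially
# written lines can be completed from memory. (Hypothetical model.)

class WriteCache:
    def __init__(self, fill_threshold):
        self.fill_threshold = fill_threshold
        self.partial_lines = set()   # lines containing partial writes
        self.fill_requests = []      # fills issued to backing memory

    def partial_write(self, line):
        self.partial_lines.add(line)
        if len(self.partial_lines) > self.fill_threshold:
            # Too many incomplete lines: fill them all from memory.
            self.fill_requests.extend(sorted(self.partial_lines))
            self.partial_lines.clear()

wc = WriteCache(fill_threshold=2)
wc.partial_write(0x10)
wc.partial_write(0x20)   # count is 2: still at the threshold
wc.partial_write(0x30)   # count is 3, exceeds threshold: fills issued
```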
-
Publication No.: US11893423B2
Publication Date: 2024-02-06
Application No.: US16562367
Filing Date: 2019-09-05
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Gregory Scott Palmer , Jonathon Stuart Ramsey Evans , Shailendra Singh , Samuel H. Duncan , Wishwesh Anil Gandhi , Lacky V. Shah , Sonata Gale Wen , Feiqi Su , James Leroy Deming , Alan Menezes , Pranav Vaidya , Praveen Joginipally , Timothy John Purcell , Manas Mandal
IPC: G06F9/50 , G06F9/38 , G06F1/3296 , G06F1/04
CPC classification number: G06F9/5061 , G06F1/04 , G06F1/3296 , G06F9/3877 , G06F9/5027
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
-
Publication No.: US11663036B2
Publication Date: 2023-05-30
Application No.: US16562359
Filing Date: 2019-09-05
Applicant: NVIDIA CORPORATION
Inventor: Jerome F. Duluk, Jr. , Gregory Scott Palmer , Jonathon Stuart Ramsey Evans , Shailendra Singh , Samuel H. Duncan , Wishwesh Anil Gandhi , Lacky V. Shah , Eric Rock , Feiqi Su , James Leroy Deming , Alan Menezes , Pranav Vaidya , Praveen Joginipally , Timothy John Purcell , Manas Mandal
Abstract: A parallel processing unit (PPU) can be divided into partitions. Each partition is configured to operate similarly to how the entire PPU operates. A given partition includes a subset of the computational and memory resources associated with the entire PPU. Software that executes on a CPU partitions the PPU for an admin user. A guest user is assigned to a partition and can perform processing tasks within that partition in isolation from any other guest users assigned to any other partitions. Because the PPU can be divided into isolated partitions, multiple CPU processes can efficiently utilize PPU resources.
-
Publication No.: US20220342595A1
Publication Date: 2022-10-27
Application No.: US17237165
Filing Date: 2021-04-22
Applicant: NVIDIA Corporation
Inventor: Niladrish Chatterjee , James Michael O'Connor , Donghyuk Lee , Gaurav Uttreja , Wishwesh Anil Gandhi
Abstract: A combined on-package and off-package memory system uses a custom base-layer within which are fabricated one or more dedicated interfaces to off-package memories. An on-package processor and on-package memories are also directly coupled to the custom base-layer. The custom base-layer includes memory management logic between the processor and memories (both off and on package) to steer requests. The memories are exposed as a combined memory space having greater bandwidth and capacity compared with either the off-package memories or the on-package memories alone. The memory management logic services requests while maintaining quality of service (QoS) to satisfy bandwidth requirements for each allocation. An allocation may include any combination of the on and/or off package memories. The memory management logic also manages data migration between the on and off package memories.
-
Publication No.: US20200089611A1
Publication Date: 2020-03-19
Application No.: US16134379
Filing Date: 2018-09-18
Applicant: NVIDIA Corporation
Inventor: Wishwesh Anil Gandhi , Tanmoy Mandal , Ravi Kiran Manyam , Supriya Shrihari Rao
IPC: G06F12/0815 , G06F12/0808 , G06F12/0813 , G06F13/16 , G06F13/40
Abstract: A method, computer readable medium, and system are disclosed for a distributed cache that provides multiple processing units with fast access to a portion of data, which is stored in local memory. The distributed cache is composed of multiple smaller caches, and each of the smaller caches is associated with at least one processing unit. In addition to a shared crossbar network through which data is transferred between processing units and the smaller caches, a dedicated connection is provided between two or more smaller caches that form a partner cache set. Transferring data through the dedicated connections reduces congestion on the shared crossbar network. Reducing congestion on the shared crossbar network increases the available bandwidth and allows the number of processing units to increase. A coherence protocol is defined for accessing data stored in the distributed cache and for transferring data between the smaller caches of a partner cache set.
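The partner-link routing decision can be sketched as follows. The names (`SmallCache`, `transfer`) and the logging of crossbar traffic are illustrative assumptions, not terms from the patent:

```python
# Sketch: when two small caches form a partner set, transfer data over
# their dedicated link; otherwise fall back to the shared crossbar.
# (Hypothetical model of the routing choice only.)

class SmallCache:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.partner = None   # the other member of this partner set

def transfer(src, dst, key, crossbar_log):
    """Move a cache line; prefer the dedicated partner link."""
    value = src.data.pop(key)
    if src.partner is dst:
        pass                       # dedicated link: no crossbar traffic
    else:
        crossbar_log.append(key)   # must cross the shared crossbar
    dst.data[key] = value

a, b, c = SmallCache("a"), SmallCache("b"), SmallCache("c")
a.partner, b.partner = b, a        # a and b form a partner cache set
log = []
a.data["line0"] = 42
transfer(a, b, "line0", log)       # partner link: nothing logged
b.data["line1"] = 7
transfer(b, c, "line1", log)       # no partner link: uses crossbar
```

Keeping partner-set traffic off the crossbar is what frees shared bandwidth as the number of processing units grows.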
-
Publication No.: US08984372B2
Publication Date: 2015-03-17
Application No.: US13683599
Filing Date: 2012-11-21
Applicant: NVIDIA Corporation
Inventor: Wishwesh Anil Gandhi , Nirmal Raj Saxena
CPC classification number: G06F11/10 , G06F11/1064
Abstract: A partition unit that includes a cache for storing both data and error-correcting code (ECC) checkbits associated with the data is disclosed. When a read command corresponding to particular data stored in a memory unit results in a cache miss, the partition unit transmits a read request to the memory unit to fetch the data and store the data in the cache. The partition unit checks the cache to determine if ECC checkbits associated with the data are stored in the cache and, if the ECC checkbits are not in the cache, the partition unit transmits a read request to the memory unit to fetch the ECC checkbits and store the ECC checkbits in the cache. The ECC checkbits and the data may then be compared to determine the reliability of the data using an error-correcting scheme such as SEC-DED (i.e., single error-correcting, double error-detecting).
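The miss-handling flow for data and checkbits can be sketched as follows. All names are hypothetical, and a simple parity bit stands in for a real SEC-DED code, which uses multiple checkbits and can correct single-bit errors:

```python
# Sketch: a cache holding both data lines and their ECC checkbits,
# each fetched from memory independently on a miss, then compared to
# judge data reliability. (Hypothetical model; parity stands in for
# a real SEC-DED error-correcting code.)

class EccCache:
    def __init__(self, memory, ecc_memory):
        self.memory = memory            # data backing store
        self.ecc_memory = ecc_memory    # checkbit backing store
        self.data_cache = {}
        self.ecc_cache = {}
        self.memory_reads = 0

    def read_checked(self, addr):
        if addr not in self.data_cache:      # data miss: fetch data
            self.data_cache[addr] = self.memory[addr]
            self.memory_reads += 1
        if addr not in self.ecc_cache:       # checkbit miss: fetch ECC
            self.ecc_cache[addr] = self.ecc_memory[addr]
            self.memory_reads += 1
        data = self.data_cache[addr]
        # Stand-in check: recomputed parity must match stored checkbit.
        ok = bin(data).count("1") % 2 == self.ecc_cache[addr]
        return data, ok

mem = {0: 0b1011}
ecc = {0: 1}    # 0b1011 has three set bits, so its parity bit is 1
cache = EccCache(mem, ecc)
value, ok = cache.read_checked(0)   # two memory reads: data + checkbits
```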
-