-
Publication number: US09934145B2
Publication date: 2018-04-03
Application number: US14925922
Application date: 2015-10-28
Applicant: NVIDIA CORPORATION
Inventor: Praveen Krishnamurthy , Peter B. Holmquist , Wishwesh Gandhi , Timothy Purcell , Karan Mehra , Lacky Shah
IPC: G06F12/08 , G06F12/0802 , G06F3/06
CPC classification number: G06F12/0802 , G06F3/0608 , G06F3/064 , G06F3/0673 , G06F12/0842 , G06F12/0844 , G06F12/0848 , G06F12/0851 , G06F12/0853 , G06F12/0895 , G06F2212/1016 , G06F2212/401 , G06F2212/608
Abstract: In one embodiment of the present invention, a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection, based on virtual tiles, enables the cache unit to selectively store, in a single physical tile, compressed data that would conventionally be stored in separate physical tiles included in a group of blocks. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile, and more specifically in adjacent locations within that tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
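The virtual-tile indirection described in the abstract can be sketched as a small lookup table that maps tile identifiers to a backing tile and an offset within it. This is a minimal illustration only; the class and field names (`VirtualTileTable`, `TILE_SIZE`, one slot per compressed block) are assumptions, not the cache unit's actual design.

```python
TILE_SIZE = 4  # blocks per physical tile (assumed for illustration)

class VirtualTileTable:
    """Maps a tile id to a (backing tile, offset) pair so compressed
    data from several physical tiles can share one physical tile."""

    def __init__(self):
        self.mapping = {}    # tile_id -> (backing_tile, offset)
        self.next_slot = {}  # backing_tile -> next free offset

    def coalesce(self, tile_ids, backing_tile):
        """Place the compressed blocks of tile_ids in adjacent slots."""
        offset = self.next_slot.get(backing_tile, 0)
        for tid in tile_ids:
            self.mapping[tid] = (backing_tile, offset)
            offset += 1  # each compressed block occupies one slot
        self.next_slot[backing_tile] = offset

    def resolve(self, tile_id):
        """Translate a tile id to its actual storage location."""
        return self.mapping.get(tile_id, (tile_id, 0))

table = VirtualTileTable()
table.coalesce([7, 8, 9], backing_tile=7)
assert table.resolve(8) == (7, 1)  # tile 8's data now sits inside tile 7
```

Because tiles 7, 8, and 9 resolve to adjacent offsets in one backing tile, a single read of that tile returns all three tiles' compressed data, which is the coalescing benefit the abstract claims.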
-
Publication number: US20210294707A1
Publication date: 2021-09-23
Application number: US16825276
Application date: 2020-03-20
Applicant: NVIDIA Corporation
Inventor: Jonathon Stuart Ramsay Evans , Naveen Cherukuri , Jerome Francis Duluk, JR. , Shailendra Singh , Vaibhav Vyas , Wishwesh Gandhi , Arvind Gopalakrishnan , Manas Mandal
Abstract: Apparatuses, systems, and techniques to detect memory errors and isolate or migrate partitions on a parallel processing unit using an application programming interface to facilitate parallel computing, such as CUDA. In at least one embodiment, interrupts indicating a memory error for one or more partitions are intercepted and processed on a graphics processing unit, and a policy is applied to isolate that memory error from other partitions.
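The policy described in the abstract can be illustrated as a handler that fences off only the faulting partition while the others keep running. The partition and interrupt model below is an assumed simplification for illustration, not the patent's actual interface.

```python
def handle_memory_error_interrupt(interrupt, partitions, policy="isolate"):
    """Apply an isolation or migration policy to the faulting partition
    so the memory error does not affect other partitions."""
    faulty = interrupt["partition_id"]
    for p in partitions:
        if p["id"] == faulty:
            if policy == "isolate":
                p["state"] = "isolated"    # fence off the partition
            elif policy == "migrate":
                p["state"] = "migrating"   # move its work elsewhere
        # all other partitions are left untouched
    return partitions

parts = [{"id": 0, "state": "running"}, {"id": 1, "state": "running"}]
result = handle_memory_error_interrupt({"partition_id": 1}, parts)
assert result[1]["state"] == "isolated"
assert result[0]["state"] == "running"
```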
-
Publication number: US11016802B2
Publication date: 2021-05-25
Application number: US15881587
Application date: 2018-01-26
Applicant: NVIDIA Corporation
Inventor: Ziyad Hakura , Olivier Giroux , Wishwesh Gandhi
Abstract: In various embodiments, an ordered atomic operation enables a parallel processing subsystem to execute an atomic operation associated with a memory location in a specified order relative to other ordered atomic operations associated with that memory location. A level 2 (L2) cache slice includes an atomic processing circuit and a content-addressable memory (CAM). The CAM stores an ordered atomic operation specifying at least a memory address, an atomic operation, and an ordering number. In operation, the atomic processing circuit performs a look-up operation on the CAM, where the look-up operation specifies the memory address. After the atomic processing circuit determines that the ordering number is equal to a current ordering number associated with the memory address, the atomic processing circuit executes the atomic operation and returns the result to a processor executing an algorithm. Advantageously, the ordered atomic operation enables the algorithm to achieve a deterministic result while optimizing latency.
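The ordering mechanism in the abstract can be sketched as a unit that parks out-of-order atomics in a CAM-like pending list and drains them only when their ordering number matches the current ordering number for the address. The class below is a behavioral sketch under assumed names, not the L2 slice's circuit.

```python
class OrderedAtomicUnit:
    """Executes atomics on an address strictly in ordering-number order,
    holding early arrivals in a CAM-like pending list."""

    def __init__(self):
        self.memory = {}         # addr -> value
        self.current_order = {}  # addr -> next ordering number expected
        self.cam = []            # pending (addr, op, ordering number)

    def submit(self, addr, op, order):
        self.cam.append((addr, op, order))
        self._drain(addr)

    def _drain(self, addr):
        progressed = True
        while progressed:
            progressed = False
            cur = self.current_order.get(addr, 0)
            for entry in self.cam:
                a, op, order = entry
                if a == addr and order == cur:
                    # ordering number matches: execute and advance
                    self.memory[addr] = op(self.memory.get(addr, 0))
                    self.current_order[addr] = cur + 1
                    self.cam.remove(entry)
                    progressed = True
                    break

unit = OrderedAtomicUnit()
unit.submit(0x10, lambda v: v + 5, order=1)  # arrives early, parked in CAM
unit.submit(0x10, lambda v: v * 2, order=0)  # order 0 runs first
assert unit.memory[0x10] == 5                # (0 * 2) + 5, deterministic
```

Whatever order the requests arrive in, the result is the same, which is the deterministic-result property the abstract highlights.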
-
Publication number: US10402323B2
Publication date: 2019-09-03
Application number: US14925920
Application date: 2015-10-28
Applicant: NVIDIA CORPORATION
Inventor: Praveen Krishnamurthy , Peter B. Holmquist , Wishwesh Gandhi , Timothy Purcell , Karan Mehra , Lacky Shah
IPC: G06F12/08 , G06F12/0802 , G06F3/06
Abstract: In one embodiment of the present invention, a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection, based on virtual tiles, enables the cache unit to selectively store, in a single physical tile, compressed data that would conventionally be stored in separate physical tiles included in a group of blocks. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile, and more specifically in adjacent locations within that tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
-
Publication number: US10020036B2
Publication date: 2018-07-10
Application number: US13712823
Application date: 2012-12-12
Applicant: NVIDIA CORPORATION
Inventor: Alok Gupta , Wishwesh Gandhi , Ram Gummadi
CPC classification number: G11C8/00 , G06F12/0207 , G11C7/1075 , G11C8/06 , G11C2207/105 , Y02D10/13
Abstract: One embodiment of the present invention sets forth a method for accessing non-contiguous locations within a DRAM memory page by sending a first column address command to a first DRAM device using a first subset of pins and sending a second column address command to a second DRAM device using a second subset of repurposed pins. The technique requires minimal additional pins, space, and power consumption. Further, sending multiple column address commands allows for increased granularity of DRAM accesses and therefore more efficient use of pins.
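The access pattern in the abstract can be sketched as two sub-channels that share one open row but receive independent column addresses, so two non-contiguous locations in the same page are fetched in one access. The page model, burst length, and function names below are assumptions for illustration only.

```python
PAGE = list(range(1024))  # one open DRAM row (page) of column values

def dual_column_read(page, col_a, col_b, burst=4):
    """Read two independent bursts from one open page.

    Conventionally both DRAM devices would receive the same column
    address; repurposing pins lets each device receive its own
    (col_a for device 0, col_b for device 1).
    """
    half_a = page[col_a : col_a + burst]  # device 0: first column command
    half_b = page[col_b : col_b + burst]  # device 1: second column command
    return half_a + half_b

data = dual_column_read(PAGE, col_a=0, col_b=512)
assert data == [0, 1, 2, 3, 512, 513, 514, 515]
```

A conventional single column command would need two separate accesses to reach columns 0 and 512; here both bursts return together, which is the finer access granularity the abstract describes.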
-