-
Publication No.: US12248788B2
Publication date: 2025-03-11
Application No.: US17691690
Application date: 2022-03-10
Applicant: NVIDIA Corporation
Inventor: Prakash Bangalore Prabhakar, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Greg Palmer, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as that backed by an L2 cache. Such distributed shared memory supports cooperative parallelism and strong scaling across multiple processing cores by permitting data sharing and communications previously possible only within the same processing core.
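The access pattern described in this abstract can be illustrated with a small software model. The following is a minimal sketch only, not the patented hardware design: the names `Core`, `Cluster`, and `cluster_reduce` are hypothetical, and the counters merely stand in for the separate paths (local block, core-to-core interconnect, global memory) that the abstract distinguishes.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """One processing core with its own local shared-memory block."""
    rank: int
    shared: list = field(default_factory=list)


class Cluster:
    """Hypothetical model: the cores' local blocks form one DSMEM space."""

    def __init__(self, num_cores, words_per_block):
        self.cores = [Core(r, [0] * words_per_block) for r in range(num_cores)]
        self.local_accesses = 0   # accesses to the requester's own block
        self.remote_accesses = 0  # accesses routed core-to-core
        self.global_accesses = 0  # main/global memory traffic (unused here)

    def _count(self, requester, owner):
        if requester == owner:
            self.local_accesses += 1
        else:
            self.remote_accesses += 1

    def store(self, requester, owner, offset, value):
        """Core `requester` writes into the block local to core `owner`."""
        self._count(requester, owner)
        self.cores[owner].shared[offset] = value

    def load(self, requester, owner, offset):
        """Core `requester` reads from the block local to core `owner`."""
        self._count(requester, owner)
        return self.cores[owner].shared[offset]


def cluster_reduce(per_core_values):
    """Sum values across cores using only the modeled DSMEM blocks."""
    num_cores = len(per_core_values)
    cluster = Cluster(num_cores, words_per_block=1)
    # Phase 1: each core writes its partial sum into its own local block.
    for rank, values in enumerate(per_core_values):
        cluster.store(rank, rank, 0, sum(values))
    # Phase 2: core 0 reads every block, including the remote ones.
    total = sum(cluster.load(0, owner, 0) for owner in range(num_cores))
    return total, cluster
```

In this model the reduction completes without touching `global_accesses`, which mirrors the abstract's point that remote shared-memory traffic need not interfere with the path to main or global memory.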
-
Publication No.: US11720440B2
Publication date: 2023-08-08
Application No.: US17373678
Application date: 2021-07-12
Applicant: NVIDIA CORPORATION
Inventor: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
IPC: G06F11/07, G06F11/10, G06F12/1018, G06F11/14, G06F12/1027
CPC classification number: G06F11/1016, G06F11/0772, G06F11/0793, G06F11/1407, G06F12/1018, G06F12/1027
Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
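The containment behavior described in this abstract can be sketched in software. This is an illustrative model under stated assumptions, not the claimed implementation: the `POISON` value, the `faulty` flag (a stand-in for a failing hardware check such as ECC), and the `Word`, `Memory`, and `Client` classes are all hypothetical.

```python
from dataclasses import dataclass

POISON = 0xBAD  # hypothetical bit pattern marking data with a memory error


@dataclass
class Word:
    data: int
    tag: int = 0          # holds POISON once an error has been detected
    faulty: bool = False  # stand-in for a failing hardware check (e.g. ECC)


class Memory:
    def __init__(self, size):
        self.words = [Word(0) for _ in range(size)]

    def read(self, addr):
        """Return (data, poisoned); tag the word when an error is detected."""
        word = self.words[addr]
        if word.faulty:
            word.tag = POISON
        return word.data, word.tag == POISON

    def write(self, addr, data, tag=0):
        self.words[addr] = Word(data, tag)

    def copy_word(self, src_addr, dst, dst_addr):
        """Copy data to another memory, carrying the poison pattern along."""
        data, poisoned = self.read(src_addr)
        dst.write(dst_addr, data, POISON if poisoned else 0)


class Client:
    def __init__(self, name):
        self.name = name
        self.stores_disabled = False

    def load(self, mem, addr):
        data, poisoned = mem.read(addr)
        if poisoned:
            # Only this client is blocked from storing; others keep running.
            self.stores_disabled = True
        return data

    def store(self, mem, addr, value):
        """Return True if the store was performed, False if it was blocked."""
        if self.stores_disabled:
            return False
        mem.write(addr, value)
        return True
```

A client that loads a poisoned word can no longer store, while a second client using the same `Memory` is unaffected, and `copy_word` keeps the marker attached to the data as it moves between memories.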
-