-
Publication number: US20230297499A1
Publication date: 2023-09-21
Application number: US17581687
Filing date: 2022-01-21
Applicant: NVIDIA Corporation
IPC: G06F12/02
CPC classification number: G06F12/0238, G06F2212/657
Abstract: A mapper within a single-level memory system may facilitate memory localization, reducing the energy and latency of memory accesses within that system. The mapper may translate a memory request received from a processor for implementation at a data storage entity; the translation identifies the data storage entity and the starting location within it where the data associated with the memory request is located. This data storage entity may be co-located with the processor that sent the request, which may enable memory localization and significantly improve memory performance by reducing the energy of data access and increasing data bandwidth.
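The translation step in this abstract can be illustrated with a minimal sketch. The abstract does not say how the mapper is implemented, so everything below is an assumption made for illustration: the fixed-size segment scheme, the lookup table, and the names `Mapper`, `Location`, `map_segment`, `translate`, and `is_local` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    entity_id: int  # which data storage entity holds the data
    offset: int     # starting location within that entity


class Mapper:
    """Hypothetical table-driven mapper (the segment scheme is an assumption)."""

    def __init__(self, segment_size: int) -> None:
        self.segment_size = segment_size
        self._table: dict[int, Location] = {}

    def map_segment(self, segment: int, entity_id: int, base: int) -> None:
        # Record where a whole segment of the address space is stored.
        self._table[segment] = Location(entity_id, base)

    def translate(self, address: int) -> Location:
        # Split the address into a segment number and an offset inside it,
        # then add that offset to the segment's base in its storage entity.
        segment, within = divmod(address, self.segment_size)
        base = self._table[segment]  # raises KeyError for an unmapped segment
        return Location(base.entity_id, base.offset + within)

    def is_local(self, address: int, co_located_entity: int) -> bool:
        # True when the data sits in the entity co-located with the requester.
        return self.translate(address).entity_id == co_located_entity


mapper = Mapper(segment_size=4096)
mapper.map_segment(segment=2, entity_id=7, base=0x1000)
location = mapper.translate(2 * 4096 + 16)
```

In this sketch a request is "local" when its translated entity matches the one co-located with the requesting processor, which is the case the abstract associates with lower access energy and higher bandwidth.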
-
Publication number: US20230297269A1
Publication date: 2023-09-21
Application number: US17683292
Filing date: 2022-02-28
Applicant: NVIDIA Corporation
Inventor: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O’Connor
IPC: G06F3/06
CPC classification number: G06F3/0655, G06F3/0604, G06F3/0679
Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. A processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to that processing tile and constitute its local memory block. The hierarchical network provides access paths for each processing tile to reach its own local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (bytes) to floating-point operations (B:F) may improve 50x for accesses to the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10x.
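The four access paths listed in this abstract can be sketched as a simple classification. This is only an illustration: the `TileId` addressing scheme (device, stack, tile), the `AccessPath` names, and the `classify` function are hypothetical, and the sketch assumes that each die stack contains exactly one processing die and that a memory tile carries the index of the processing tile it is vertically aligned with.

```python
from enum import Enum
from typing import NamedTuple


class TileId(NamedTuple):
    device: int  # which device the tile belongs to
    stack: int   # which die stack within that device
    tile: int    # tile position within the stack (shared by aligned tiles)


class AccessPath(Enum):
    LOCAL_BLOCK = 1   # memory tiles vertically aligned with the requester
    SAME_DIE = 2      # local block of another processing tile, same die
    OTHER_STACK = 3   # memory tiles in a different die stack
    OTHER_DEVICE = 4  # memory tiles in a different device


def classify(requester: TileId, target: TileId) -> AccessPath:
    # Check the outermost level of the hierarchy first and work inward.
    if requester.device != target.device:
        return AccessPath.OTHER_DEVICE
    if requester.stack != target.stack:
        return AccessPath.OTHER_STACK
    if requester.tile != target.tile:
        return AccessPath.SAME_DIE
    return AccessPath.LOCAL_BLOCK


requester = TileId(device=0, stack=1, tile=5)
paths = [
    classify(requester, TileId(0, 1, 5)),
    classify(requester, TileId(0, 1, 6)),
    classify(requester, TileId(0, 2, 5)),
    classify(requester, TileId(3, 1, 5)),
]
```

Only the `LOCAL_BLOCK` case corresponds to the directly coupled path for which the abstract states the B:F and energy-per-bit improvements.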
-