COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION
    1.
    Invention application
    COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION (status: in force)

    Publication No.: US20140013074A1

    Publication date: 2014-01-09

    Application No.: US13494843

    Filing date: 2012-06-12

    IPC classification: G06F12/10

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
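The distributed shared last-level TLB mentioned in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented design: it assumes 4 KiB pages, picks each translation's home node as the virtual page number modulo the node count, uses FIFO eviction, and stands in for the in-memory page table with a Python dict. The names `DistributedSharedTLB`, `translate`, and `PAGE_SHIFT` are hypothetical. Because each translation is cached only in its home node's slice, the slices never hold duplicate entries and together behave like one larger TLB. The directory-based variant is not modeled.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class Node:
    node_id: int
    capacity: int
    tlb: dict = field(default_factory=dict)  # this node's last-level TLB slice: VPN -> PFN


class DistributedSharedTLB:
    """Hypothetical model: each VPN has exactly one home node (VPN % num_nodes),
    and only that node's slice caches the translation."""

    def __init__(self, num_nodes, entries_per_node, page_table):
        self.nodes = [Node(i, entries_per_node) for i in range(num_nodes)]
        self.page_table = page_table  # stand-in for the in-memory page tables
        self.walks = 0                # page-table walks performed
        self.remote_lookups = 0       # lookups that had to cross the interconnect

    def home(self, vpn):
        return self.nodes[vpn % len(self.nodes)]

    def translate(self, requester, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        home = self.home(vpn)
        if home.node_id != requester:
            self.remote_lookups += 1  # would be a request message to the home node
        pfn = home.tlb.get(vpn)
        if pfn is None:
            # Miss in the shared last-level TLB: walk the page table once.
            self.walks += 1
            pfn = self.page_table[vpn]
            if len(home.tlb) >= home.capacity:
                home.tlb.pop(next(iter(home.tlb)))  # evict the oldest entry (FIFO)
            home.tlb[vpn] = pfn
        return (pfn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    pt = {vpn: vpn + 100 for vpn in range(8)}  # toy page table
    tlb = DistributedSharedTLB(num_nodes=2, entries_per_node=4, page_table=pt)
    for node in (0, 1):
        for vpn in range(8):
            tlb.translate(node, vpn << PAGE_SHIFT)
    print(tlb.walks, tlb.remote_lookups)  # 8 8
```

Here both nodes touch the same eight pages. With two four-entry slices acting as one eight-entry TLB, only the first pass walks the page table, whereas two private four-entry TLBs would each have to walk the page table for all eight pages. The cost of sharing is the remote lookups, which is the latency the fused-miss technique targets.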

    2.
    Invention grant
    Combining a remote TLB lookup and a subsequent cache miss into a single coherence operation (status: in force)

    Publication No.: US09003163B2

    Publication date: 2015-04-07

    Application No.: US13494843

    Filing date: 2012-06-12

    IPC classification: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
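The abstract's final sentence, fusing a remote TLB lookup with the data cache miss that follows it, can be made concrete with a back-of-the-envelope latency model. This is a hypothetical sketch, not the mechanism claimed in the patent: every latency value is an invented placeholder in cycles, it assumes the requester, the translation's home node, and the data's home node are three distinct nodes, and it counts only one-way interconnect hops plus lookup times. The names `Latencies`, `serialized_miss`, and `fused_miss` are illustrative. In the serialized case, the translation returns to the requester before a separate coherence request goes out. In the fused case, the node that resolves the translation forwards the request straight to the data's home node, which replies directly to the requester.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical per-step latencies in cycles (placeholders, not measurements)."""
    local_tlb: int = 2     # miss in the requester's private TLB
    hop: int = 40          # one-way interconnect traversal between nodes
    remote_tlb: int = 10   # lookup in the remote shared last-level TLB slice
    directory: int = 20    # coherence-directory lookup at the data's home node
    memory: int = 100      # DRAM read when no cache holds the line


def serialized_miss(lat: Latencies) -> int:
    """Two transactions: fetch the translation, then issue a separate coherence request."""
    translation = lat.local_tlb + lat.hop + lat.remote_tlb + lat.hop  # round trip to TLB home
    data = lat.hop + lat.directory + lat.memory + lat.hop             # round trip to data home
    return translation + data


def fused_miss(lat: Latencies) -> int:
    """One transaction: the TLB home forwards the translated request to the data's home."""
    return (lat.local_tlb
            + lat.hop                      # requester -> TLB home
            + lat.remote_tlb
            + lat.hop                      # TLB home -> data home (forwarded request)
            + lat.directory + lat.memory
            + lat.hop)                     # data home -> requester


if __name__ == "__main__":
    lat = Latencies()
    print(serialized_miss(lat), fused_miss(lat))  # 292 252
```

With these placeholder numbers, fusing saves exactly one interconnect hop (292 versus 252 cycles). Any real benefit depends on the topology and on how much coherence work can overlap with translation and local cache access, which this model does not capture.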