COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION
    1.
    Patent Application
    COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION (In Force)

    Publication No.: US20140013074A1

    Publication Date: 2014-01-09

    Application No.: US13494843

    Filing Date: 2012-06-12

    IPC Classification: G06F12/10

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
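
    To make the fusion idea concrete, the following is a minimal Python sketch, not taken from the patent itself, of how a single message to a remote node could return both a translation and the result of a coherence-directory lookup for the cache line the requester is about to miss on. All names here (HomeNode, fused_lookup, the sample addresses) are hypothetical.

        from dataclasses import dataclass

        PAGE_SIZE = 4096

        @dataclass
        class Translation:
            vpn: int   # virtual page number
            ppn: int   # physical page number

        class HomeNode:
            """Hypothetical node holding both a shared last-level TLB slice and a
            cache-coherence directory, so one request can consult both."""

            def __init__(self, tlb_entries, directory):
                self.tlb = tlb_entries        # {vpn: ppn}
                self.directory = directory    # {physical line address: owning node}

            def fused_lookup(self, vpn, line_offset):
                """Return the translation and, as part of the same operation, the
                coherence-directory result for the line the requester will miss on."""
                ppn = self.tlb.get(vpn)
                if ppn is None:
                    return None, None                       # requester falls back to a page-table walk
                line_addr = ppn * PAGE_SIZE + line_offset   # physical cache-line address
                owner = self.directory.get(line_addr)       # directory lookup overlapped with the reply
                return Translation(vpn, ppn), owner

        # One round trip yields both the translation and the current owner of the
        # line, instead of a TLB reply followed by a separate coherence request.
        home = HomeNode({0x1234: 0x9ABC}, {0x9ABC * PAGE_SIZE + 0x40: "node-3"})
        print(home.fused_lookup(0x1234, 0x40))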

    Combining a remote TLB lookup and a subsequent cache miss into a single coherence operation
    2.
    Granted Patent
    Combining a remote TLB lookup and a subsequent cache miss into a single coherence operation (In Force)

    Publication No.: US09003163B2

    Publication Date: 2015-04-07

    Application No.: US13494843

    Filing Date: 2012-06-12

    IPC Classification: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.

    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with electrical interconnect
    3.
    Granted Patent
    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with electrical interconnect (In Force)

    Publication No.: US09009446B2

    Publication Date: 2015-04-14

    Application No.: US13565460

    Filing Date: 2012-08-02

    IPC Classification: G06F12/00 G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.
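
    As an illustration of the broadcast flow described above, the following is a minimal Python sketch, not taken from the patent, that issues a broadcast TLB request in parallel with a speculative page-table walk and discards the walk when a remote node answers. The functions broadcast_tlb_request and page_table_walk, the sample translations, and the timing constants are all hypothetical.

        import concurrent.futures
        import time

        REMOTE_TLBS = [{}, {0x1234: 0x9ABC}, {}]         # translations cached at the other nodes

        def broadcast_tlb_request(vpn):
            """Query every other node's TLB over the interconnect; return the first hit."""
            time.sleep(0.01)                             # stands in for interconnect latency
            for tlb in REMOTE_TLBS:
                if vpn in tlb:
                    return tlb[vpn]
            return None

        def page_table_walk(vpn):
            """Slow-path translation from the in-memory page table."""
            time.sleep(0.10)                             # stands in for a multi-level walk
            return 0x9ABC

        def translate(vpn):
            pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
            walk = pool.submit(page_table_walk, vpn)         # speculative page-table walk
            reply = pool.submit(broadcast_tlb_request, vpn)  # broadcast issued in parallel
            try:
                ppn = reply.result()
                if ppn is not None:
                    walk.cancel()          # best-effort cancel; the walk's result is ignored on a hit
                    return ppn
                return walk.result()       # no remote response: wait for the walk to finish
            finally:
                pool.shutdown(wait=False)

        print(hex(translate(0x1234)))      # remote hit at node 1, so the walk's result is unused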

    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH ELECTRICAL INTERCONNECT
    4.
    Patent Application
    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH ELECTRICAL INTERCONNECT (In Force)

    Publication No.: US20140040562A1

    Publication Date: 2014-02-06

    Application No.: US13565460

    Filing Date: 2012-08-02

    IPC Classification: G06F12/08 G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    Using a shared last-level TLB to reduce address-translation latency
    5.
    Granted Patent
    Using a shared last-level TLB to reduce address-translation latency (In Force)

    Publication No.: US09081706B2

    Publication Date: 2015-07-14

    Application No.: US13468904

    Filing Date: 2012-05-10

    IPC Classification: G06F3/03 G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
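
    The abstract mentions a distributed shared last-level TLB; the following minimal Python sketch, not taken from the patent, illustrates one plausible organization in which the virtual page number is hashed to the node whose slice of the shared last-level TLB caches that translation. All names and values here are hypothetical.

        NUM_NODES = 4
        llt_slices = [dict() for _ in range(NUM_NODES)]   # one last-level TLB slice per node

        def home_node(vpn):
            """Map a virtual page number to the node whose slice caches it."""
            return vpn % NUM_NODES            # simple hash; a real design could use a better one

        def llt_fill(vpn, ppn):
            """Install a translation (e.g., after a page-table walk) into its home slice."""
            llt_slices[home_node(vpn)][vpn] = ppn

        def llt_lookup(requesting_node, vpn):
            """On a local TLB miss, query the home slice instead of walking the page table."""
            home = home_node(vpn)
            ppn = llt_slices[home].get(vpn)
            is_remote = (home != requesting_node)   # even a remote hit avoids a page-table walk
            return ppn, is_remote

        llt_fill(0x1234, 0x9ABC)                    # some node walked the page table earlier
        print(llt_lookup(1, 0x1234))                # (39612, True): hit in node 0's slice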

    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with optical interconnect
    6.
    Granted Patent
    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with optical interconnect (In Force)

    Publication No.: US09235529B2

    Publication Date: 2016-01-12

    Application No.: US13565476

    Filing Date: 2012-08-02

    IPC Classification: G06F12/00 G06F12/10 H04Q11/00

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH OPTICAL INTERCONNECT
    7.
    Patent Application
    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH OPTICAL INTERCONNECT (In Force)

    Publication No.: US20150301949A1

    Publication Date: 2015-10-22

    Application No.: US13565476

    Filing Date: 2012-08-02

    IPC Classification: G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    USING A SHARED LAST-LEVEL TLB TO REDUCE ADDRESS-TRANSLATION LATENCY
    8.
    Patent Application
    USING A SHARED LAST-LEVEL TLB TO REDUCE ADDRESS-TRANSLATION LATENCY (In Force)

    Publication No.: US20140052917A1

    Publication Date: 2014-02-20

    Application No.: US13468904

    Filing Date: 2012-05-10

    IPC Classification: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
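
    The distributed organization is sketched under entry 5 above; the following minimal Python sketch, again not taken from the patent, illustrates the directory-based alternative mentioned in the abstract, in which a sharded directory records which nodes currently cache each translation and a miss is served from any sharer before falling back to a page-table walk. All names and values are hypothetical.

        NUM_NODES = 4
        node_tlbs = [dict() for _ in range(NUM_NODES)]   # each node's own last-level TLB
        directory = [dict() for _ in range(NUM_NODES)]   # directory shards: vpn -> set of sharer nodes

        def dir_home(vpn):
            """Shard the directory by virtual page number."""
            return vpn % NUM_NODES

        def record_fill(node, vpn, ppn):
            """A node caches a translation locally and registers itself as a sharer."""
            node_tlbs[node][vpn] = ppn
            directory[dir_home(vpn)].setdefault(vpn, set()).add(node)

        def shared_lookup(requester, vpn):
            """On a local miss, ask the directory home; a hit at any sharer avoids a page-table walk."""
            for node in directory[dir_home(vpn)].get(vpn, set()):
                ppn = node_tlbs[node].get(vpn)
                if ppn is not None:
                    record_fill(requester, vpn, ppn)   # the requester becomes a sharer as well
                    return ppn
            return None                                # no sharer: fall back to a page-table walk

        record_fill(2, 0x1234, 0x9ABC)                 # node 2 translated this page earlier
        print(hex(shared_lookup(0, 0x1234)))           # 0x9abc, served from node 2's TLB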
