Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with electrical interconnect
    1.
    Granted patent
    Status: in force

    Publication No.: US09009446B2

    Publication date: 2015-04-14

    Application No.: US13565460

    Filing date: 2012-08-02

    IPC classification: G06F12/00 G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.
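The flow in this abstract (broadcast a TLB request while a speculative page-table walk is already scheduled, then cancel the walk if a peer answers) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the `Node` class, the 4 KiB page size, the dictionary-based TLB and page table, and the latency values are all hypothetical.

```python
import asyncio

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class Node:
    """Hypothetical node with a private TLB (virtual page -> physical page)."""

    def __init__(self, name, tlb=None):
        self.name = name
        self.tlb = dict(tlb or {})
        self.peers = []

    async def remote_lookup(self, vpn, delay=0.001):
        # Stand-in for a TLB request travelling over the interconnect.
        await asyncio.sleep(delay)
        return self.tlb.get(vpn)

    async def page_table_walk(self, vpn, page_table, delay=0.02):
        # Stand-in for a slow multi-level page-table walk.
        await asyncio.sleep(delay)
        return page_table[vpn]

    async def translate(self, vaddr, page_table):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:
            return (self.tlb[vpn] << PAGE_SHIFT) | offset, "local"

        # Schedule the speculative walk, then broadcast to all peers.
        walk = asyncio.create_task(self.page_table_walk(vpn, page_table))
        replies = await asyncio.gather(*(p.remote_lookup(vpn) for p in self.peers))
        ppn = next((r for r in replies if r is not None), None)

        if ppn is not None:
            walk.cancel()      # a peer supplied the entry: drop the walk
            source = "remote"
        else:
            ppn = await walk   # no peer had it: wait for the walk
            source = "walk"
        self.tlb[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset, source
```

In this sketch the requesting node waits for every peer to reply before deciding; a real design might instead act on the first positive response or use a timeout, which the abstract does not specify.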

    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH ELECTRICAL INTERCONNECT
    2.
    Patent application
    Status: in force

    Publication No.: US20140040562A1

    Publication date: 2014-02-06

    Application No.: US13565460

    Filing date: 2012-08-02

    IPC classification: G06F12/08 G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB-sharing techniques to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an electrical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the electrical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the electrical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    Using a shared last-level TLB to reduce address-translation latency
    3.
    Granted patent
    Status: in force

    Publication No.: US09081706B2

    Publication date: 2015-07-14

    Application No.: US13468904

    Filing date: 2012-05-10

    IPC classification: G06F3/03 G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
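One way to picture a distributed shared last-level TLB is to give every virtual page a single "home" node whose last-level TLB slice caches its translation, so that the slices together behave like one larger TLB. The sketch below assumes that arrangement; the modulo home-node mapping, the `DistributedLastLevelTLB` class, and the dictionary page table are illustrative assumptions and are not taken from the patent.

```python
class DistributedLastLevelTLB:
    """Hypothetical model: one last-level TLB slice per node, one home per page."""

    def __init__(self, num_nodes, page_table):
        self.slices = [dict() for _ in range(num_nodes)]
        self.page_table = page_table  # virtual page number -> physical page number
        self.walks = 0                # counts page-table walks performed

    def home_node(self, vpn):
        # Assumed mapping: low bits of the virtual page number pick the home.
        return vpn % len(self.slices)

    def lookup(self, vpn):
        home = self.slices[self.home_node(vpn)]
        if vpn not in home:
            # Miss in the home slice: walk once and cache at the home node,
            # so later requests from any node hit there.
            self.walks += 1
            home[vpn] = self.page_table[vpn]
        return home[vpn]
```

Because each translation is cached in exactly one slice, no capacity is spent on duplicate entries, which is the effect the abstract describes as increasing the total amount of useful cached translations.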

    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with optical interconnect
    4.
    Granted patent
    Status: in force

    Publication No.: US09235529B2

    Publication date: 2016-01-12

    Application No.: US13565476

    Filing date: 2012-08-02

    IPC classification: G06F12/00 G06F12/10 H04Q11/00

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH OPTICAL INTERCONNECT
    5.
    Patent application
    Status: in force

    Publication No.: US20150301949A1

    Publication date: 2015-10-22

    Application No.: US13565476

    Filing date: 2012-08-02

    IPC classification: G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.

    COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION
    6.
    Patent application
    Status: in force

    Publication No.: US20140013074A1

    Publication date: 2014-01-09

    Application No.: US13494843

    Filing date: 2012-06-12

    IPC classification: G06F12/10

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.

    Combining a remote TLB lookup and a subsequent cache miss into a single coherence operation
    7.
    Granted patent
    Status: in force

    Publication No.: US09003163B2

    Publication date: 2015-04-07

    Application No.: US13494843

    Filing date: 2012-06-12

    IPC classification: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.

    USING A SHARED LAST-LEVEL TLB TO REDUCE ADDRESS-TRANSLATION LATENCY
    8.
    Patent application
    Status: in force

    Publication No.: US20140052917A1

    Publication date: 2014-02-20

    Application No.: US13468904

    Filing date: 2012-05-10

    IPC classification: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total amount of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.

    Arbitrated optical network using tunable drop filters
    9.
    Granted patent
    Status: in force

    Publication No.: US08655120B2

    Publication date: 2014-02-18

    Application No.: US13180364

    Filing date: 2011-07-11

    IPC classification: G02B6/12

    Abstract: In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of light sources which have fixed carrier wavelengths. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having one of the carrier wavelengths when receiving information from at least the other of the integrated circuits. Furthermore, the MCM includes tunable drop filters optically coupled to the optical waveguides and associated integrated circuits, wherein the tunable drop filters pass adjustable bands of wavelengths to receivers in the integrated circuits. Additionally, control logic in the MCM provides a control signal to the tunable drop filters to specify the adjustable bands of wavelengths.

    OPTICAL NETWORK WITH TUNABLE OPTICAL LIGHT SOURCES
    10.
    Patent application
    Status: in force

    Publication No.: US20130016980A1

    Publication date: 2013-01-17

    Application No.: US13180340

    Filing date: 2011-07-11

    IPC classification: H04B10/12 H04B10/02

    Abstract: In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of tunable light sources. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having a given carrier wavelength associated with the given integrated circuit when receiving information from at least the other of the integrated circuits. Furthermore, control logic in the MCM provides a control signal to the set of tunable light sources to specify carrier wavelengths in the optical signals output by the set of tunable light sources, thereby defining routing of at least the one of the optical signals in the MCM during communication between at least a pair of the integrated circuits.
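The routing idea in this abstract, where each receiver listens on its own carrier wavelength and the control logic steers a message by tuning the sender's light source to the destination's wavelength, can be modelled in a few lines. The following is a rough sketch only: the `TunableSourceControl` class, the chip names, the wavelength values, and the rule that rejects two senders tuned to the same wavelength are assumptions made for illustration.

```python
class TunableSourceControl:
    """Hypothetical control logic mapping senders to carrier wavelengths."""

    def __init__(self, receiver_wavelength_nm):
        # Each chip's receiver is associated with one carrier wavelength.
        self.receiver_wavelength_nm = dict(receiver_wavelength_nm)
        # Current tuning of each sender's light source.
        self.source_wavelength_nm = {}

    def route(self, sender, receiver):
        wavelength = self.receiver_wavelength_nm[receiver]
        # Assumed arbitration rule: one sender per wavelength at a time.
        for other, tuned in self.source_wavelength_nm.items():
            if other != sender and tuned == wavelength:
                raise RuntimeError(f"{receiver} is already in use by {other}")
        self.source_wavelength_nm[sender] = wavelength
        return wavelength

    def destination(self, sender):
        # The chip whose receiver wavelength matches the sender's tuning.
        tuned = self.source_wavelength_nm[sender]
        return next(chip for chip, wl in self.receiver_wavelength_nm.items()
                    if wl == tuned)
```

For example, with receivers at 1550.0, 1551.0, and 1552.0 nm, calling `route("chip0", "chip2")` tunes chip0's source to 1552.0 nm, and `destination("chip0")` then resolves to `"chip2"`.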
