Hybrid cache coherence using fine-grained hardware message passing
    31. Invention Grant (In Force)

    Publication No.: US07895400B2

    Publication Date: 2011-02-22

    Application No.: US11864507

    Filing Date: 2007-09-28

    IPC Class: G06F12/08

    Abstract: Multiprocessor systems that operate on global shared memory must ensure that the memory is coherent. A hybrid system that combines hardware memory transactions with direct messaging provides memory coherence with minimal overhead and bandwidth demands. Memory access transactions are intercepted and converted to direct messages, which are then communicated to a target and/or remote node. Thereafter, the message invokes a software handler that implements the cache coherence protocol. The handler uses additional messages to invalidate or fetch data in other caches, as well as to return data to the requesting processor. These additional messages are converted to appropriate hardware transactions by the destination system interface hardware.
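
    A minimal C sketch (not taken from the patent; every type and helper here, such as dir_lookup(), send_msg(), and reply_data(), is an illustrative assumption) of how a software handler invoked by a hardware-delivered direct message might run a simple invalidate-on-write coherence protocol on the home node:

        /* Illustrative software coherence handler driven by hardware direct
         * messages; all types and helpers are hypothetical. */
        #include <stdint.h>

        enum msg_type { MSG_READ_REQ, MSG_WRITE_REQ, MSG_INVALIDATE };

        typedef struct {
            enum msg_type type;
            uint64_t      addr;      /* physical address of the cache line   */
            int           src_node;  /* node whose memory access was trapped */
        } coherence_msg;

        typedef struct {
            uint64_t sharers;        /* bitmask of nodes caching this line   */
            int      owner;          /* node holding the line exclusively    */
        } dir_entry;

        extern dir_entry *dir_lookup(uint64_t addr);            /* assumed */
        extern void       send_msg(int node, coherence_msg m);  /* assumed */
        extern void       reply_data(int node, uint64_t addr);  /* assumed */

        /* Invoked when the system interface converts an intercepted memory
         * transaction into a direct message delivered to this (home) node. */
        void coherence_handler(const coherence_msg *m)
        {
            dir_entry *e = dir_lookup(m->addr);

            if (m->type == MSG_WRITE_REQ) {
                /* Invalidate every other cached copy before granting ownership. */
                for (int n = 0; n < 64; n++)
                    if (((e->sharers >> n) & 1) && n != m->src_node)
                        send_msg(n, (coherence_msg){ MSG_INVALIDATE, m->addr,
                                                     m->src_node });
                e->sharers = 1ULL << m->src_node;
                e->owner   = m->src_node;
            } else {                 /* MSG_READ_REQ: just add a sharer */
                e->sharers |= 1ULL << m->src_node;
            }
            /* Return the data; the destination's system interface turns this
             * message back into an ordinary hardware transaction. */
            reply_data(m->src_node, m->addr);
        }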


    Hardware data race detection in HPCS codes
    32. Invention Grant (In Force)

    Publication No.: US07823013B1

    Publication Date: 2010-10-26

    Application No.: US11685555

    Filing Date: 2007-03-13

    IPC Class: G06F11/00

    Abstract: A method and system for detecting race conditions in computing systems. A parallel computing system includes multiple processor cores coupled to memory. An application with a code sequence in which parallelism is to be exploited is executed on this system. Different processor cores may operate on a given memory line concurrently. Extra bits are associated with the memory data line and are used to indicate changes to corresponding subsections of data in the memory line. A memory controller may compare the check bits of a memory line to determine whether more than one processor core has modified the same section of data in a cache line, in which case a race condition has occurred.
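
    A small C sketch of the check-bit idea, purely illustrative: each memory line carries one extra bit per subsection and per writer, and the memory controller flags a race when two cores' check bits overlap. The structure layout, the two-core comparison, and the helper names are assumptions, not the patented encoding.

        /* Illustrative check-bit race detection for one memory line.
         * Sizes and structures are assumptions, not the patented encoding. */
        #include <stdbool.h>
        #include <stdint.h>

        #define SUBSECTIONS 8        /* e.g. 8-byte chunks of a 64-byte line */

        typedef struct {
            uint8_t written[2];      /* per-core check bits: bit i is set    */
                                     /* when that core wrote subsection i    */
        } line_check_bits;

        /* Recorded by hardware on each store from `core` to subsection `s`. */
        void note_store(line_check_bits *line, int core, unsigned s)
        {
            line->written[core] |= (uint8_t)(1u << (s % SUBSECTIONS));
        }

        /* Run by the memory controller when the cores' copies are merged:
         * overlapping check bits mean two cores modified the same subsection
         * of the line, i.e. a potential data race. */
        bool detect_race(const line_check_bits *line)
        {
            return (line->written[0] & line->written[1]) != 0;
        }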


    Using broadcast-based TLB sharing to reduce address-translation latency in a shared-memory system with optical interconnect
    33. Invention Grant (In Force)

    Publication No.: US09235529B2

    Publication Date: 2016-01-12

    Application No.: US13565476

    Filing Date: 2012-08-02

    IPC Class: G06F12/00 G06F12/10 H04Q11/00

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.
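
    A rough C sketch, under assumed hardware hooks such as broadcast_tlb_request() and start_page_table_walk(), of the control flow the abstract describes: broadcast over the optical interconnect and begin a speculative page-table walk in parallel, then cancel the walk if a remote TLB entry arrives first.

        /* Illustrative miss-handling flow; broadcast_tlb_request() and the
         * other hooks are assumed hardware interfaces, not a real API. */
        #include <stdbool.h>
        #include <stdint.h>

        typedef struct { uint64_t vpn, pfn; uint32_t perms; } tlb_entry;

        extern void broadcast_tlb_request(uint64_t vpn);   /* optical broadcast */
        extern bool remote_reply_arrived(uint64_t vpn, tlb_entry *e);
        extern int  start_page_table_walk(uint64_t vpn);   /* returns a walk id */
        extern void cancel_page_table_walk(int walk_id);
        extern bool walk_done(int walk_id, tlb_entry *e);
        extern void install_tlb_entry(const tlb_entry *e);

        void handle_last_level_tlb_miss(uint64_t vpn)
        {
            tlb_entry e;

            /* Ask the other nodes and start the local walk in parallel. */
            broadcast_tlb_request(vpn);
            int walk = start_page_table_walk(vpn);

            for (;;) {
                if (remote_reply_arrived(vpn, &e)) {  /* a remote TLB hit wins */
                    cancel_page_table_walk(walk);
                    break;
                }
                if (walk_done(walk, &e))              /* no reply: use the walk */
                    break;
            }
            install_tlb_entry(&e);
        }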


    USING BROADCAST-BASED TLB SHARING TO REDUCE ADDRESS-TRANSLATION LATENCY IN A SHARED-MEMORY SYSTEM WITH OPTICAL INTERCONNECT
    34. Invention Application (In Force)

    Publication No.: US20150301949A1

    Publication Date: 2015-10-22

    Application No.: US13565476

    Filing Date: 2012-08-02

    IPC Class: G06F12/10

    Abstract: The disclosed embodiments provide a system that uses broadcast-based TLB sharing to reduce address-translation latency in a shared-memory multiprocessor system with two or more nodes that are connected by an optical interconnect. During operation, a first node receives a memory operation that includes a virtual address. Upon determining that one or more TLB levels of the first node will miss for the virtual address, the first node uses the optical interconnect to broadcast a TLB request to one or more additional nodes of the shared-memory multiprocessor in parallel with scheduling a speculative page-table walk for the virtual address. If the first node receives a TLB entry from another node of the shared-memory multiprocessor via the optical interconnect in response to the TLB request, the first node cancels the speculative page-table walk. Otherwise, if no response is received, the first node instead waits for the completion of the page-table walk.


    COMBINING A REMOTE TLB LOOKUP AND A SUBSEQUENT CACHE MISS INTO A SINGLE COHERENCE OPERATION
    35. Invention Application (In Force)

    Publication No.: US20140013074A1

    Publication Date: 2014-01-09

    Application No.: US13494843

    Filing Date: 2012-06-12

    IPC Class: G06F12/10

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total number of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
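
    A condensed C sketch of the fusion idea: rather than a remote TLB lookup followed by a separate coherence request for the data, a single request to the home node returns both the translation and the cache line, overlapping the two latencies. All message formats and helpers (home_node_of(), send_fused_request(), and so on) are invented for illustration.

        /* Illustrative fused TLB + data request; structures and helpers are
         * assumptions, not the patent's actual protocol messages. */
        #include <stdint.h>

        typedef struct { uint64_t vpn, pfn; uint32_t perms; } tlb_entry;
        typedef struct { tlb_entry xlate; uint8_t line[64]; } fused_reply;

        extern int         home_node_of(uint64_t vpn);                /* assumed */
        extern void        send_fused_request(int node, uint64_t vpn,
                                              uint64_t line_offset);  /* assumed */
        extern fused_reply wait_fused_reply(void);                    /* assumed */
        extern void        install_tlb_entry(const tlb_entry *e);
        extern void        fill_cache_line(uint64_t paddr, const uint8_t *data);

        /* On a combined last-level TLB miss and expected cache miss, ask the
         * home node once; it starts the coherence lookup for the line while
         * returning the translation, overlapping the two latencies. */
        void fused_miss(uint64_t vpn, uint64_t line_offset)
        {
            send_fused_request(home_node_of(vpn), vpn, line_offset);
            fused_reply r = wait_fused_reply();

            install_tlb_entry(&r.xlate);
            fill_cache_line((r.xlate.pfn << 12) + line_offset, r.line);
        }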


    Reduction of cache flush time using a dirty line limiter
    36. Invention Grant (In Force)

    Publication No.: US08180968B2

    Publication Date: 2012-05-15

    Application No.: US11729527

    Filing Date: 2007-03-28

    IPC Class: G06F12/00

    CPC Class: G06F12/128 G06F12/126

    Abstract: The invention relates to a method for reducing the cache flush time of a cache in a computer system. The method includes populating at least one of a plurality of directory entries of a dirty line directory, based on modification of the cache, to form at least one populated directory entry, and de-populating a pre-determined number of the directory entries according to a dirty line limiter protocol that causes a write-back from the cache to main memory, where the dirty line limiter protocol is triggered when the number of populated directory entries exceeds a pre-defined limit.
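
    A compact C sketch of a dirty line limiter as the abstract describes it: a small directory tracks dirty cache lines, and once the number of populated entries exceeds a pre-defined limit, an entry is de-populated and its line written back, bounding how many dirty lines a later flush must clean. The FIFO eviction choice and the write_back() hook are assumptions.

        /* Illustrative dirty line limiter; the directory layout, the FIFO
         * eviction choice and write_back() are assumptions. */
        #include <stdint.h>

        #define DIRTY_LIMIT 128          /* pre-defined limit on dirty lines */

        typedef struct {
            uint64_t line_addr[DIRTY_LIMIT + 1];
            int      count;              /* number of populated entries      */
        } dirty_line_dir;

        extern void write_back(uint64_t line_addr);  /* cache -> main memory */

        /* Called when the cache marks a line dirty (a directory entry is
         * populated). */
        void note_dirty(dirty_line_dir *d, uint64_t line_addr)
        {
            d->line_addr[d->count++] = line_addr;

            /* Dirty line limiter protocol: once the populated entries exceed
             * the limit, de-populate one and force a write-back, so a later
             * cache flush has at most DIRTY_LIMIT dirty lines to clean. */
            if (d->count > DIRTY_LIMIT) {
                write_back(d->line_addr[0]);         /* oldest entry first   */
                for (int i = 1; i < d->count; i++)
                    d->line_addr[i - 1] = d->line_addr[i];
                d->count--;
            }
        }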


    Combining a remote TLB lookup and a subsequent cache miss into a single coherence operation
    37. Invention Grant (In Force)

    Publication No.: US09003163B2

    Publication Date: 2015-04-07

    Application No.: US13494843

    Filing Date: 2012-06-12

    IPC Class: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total number of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.


    USING A SHARED LAST-LEVEL TLB TO REDUCE ADDRESS-TRANSLATION LATENCY
    38. Invention Application (In Force)

    Publication No.: US20140052917A1

    Publication Date: 2014-02-20

    Application No.: US13468904

    Filing Date: 2012-05-10

    IPC Class: G06F12/10 G06F12/08

    Abstract: The disclosed embodiments provide techniques for reducing address-translation latency and the serialization latency of combined TLB and data cache misses in a coherent shared-memory system. For instance, the last-level TLB structures of two or more multiprocessor nodes can be configured to act together as either a distributed shared last-level TLB or a directory-based shared last-level TLB. Such TLB-sharing techniques increase the total number of useful translations that are cached by the system, thereby reducing the number of page-table walks and improving performance. Furthermore, a coherent shared-memory system with a shared last-level TLB can be further configured to fuse TLB and cache misses such that some of the latency of data coherence operations is overlapped with address translation and data cache access latencies, thereby further improving the performance of memory operations.
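
    A brief C sketch of the distributed-shared variant mentioned in the abstract: each virtual page number maps to a home slice of the shared last-level TLB, which is probed before falling back to a page-table walk. The modulo hash, node count, and helper functions are illustrative assumptions.

        /* Illustrative distributed shared last-level TLB lookup; NUM_NODES,
         * the modulo hash and the remote-probe helper are assumptions. */
        #include <stdbool.h>
        #include <stdint.h>

        #define NUM_NODES 8

        typedef struct { uint64_t vpn, pfn; uint32_t perms; } tlb_entry;

        extern bool probe_remote_llt(int node, uint64_t vpn, tlb_entry *e);
        extern bool page_table_walk(uint64_t vpn, tlb_entry *e);
        extern void install_tlb_entry(const tlb_entry *e);

        /* Each virtual page number has a fixed home slice of the shared TLB. */
        static int home_slice(uint64_t vpn) { return (int)(vpn % NUM_NODES); }

        bool shared_llt_lookup(uint64_t vpn)
        {
            tlb_entry e;

            /* Probe the home node's slice first; only walk on a shared miss. */
            if (!probe_remote_llt(home_slice(vpn), vpn, &e) &&
                !page_table_walk(vpn, &e))
                return false;                        /* unmapped page */

            install_tlb_entry(&e);
            return true;
        }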


    Optical network with tunable optical light sources
    39. Invention Grant (In Force)

    Publication No.: US08606113B2

    Publication Date: 2013-12-10

    Application No.: US13180340

    Filing Date: 2011-07-11

    IPC Class: H04B10/00 H04B10/80

    Abstract: In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of tunable light sources. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having a given carrier wavelength associated with the given integrated circuit when receiving information from at least the other of the integrated circuits. Furthermore, control logic in the MCM provides a control signal to the set of tunable light sources to specify carrier wavelengths in the optical signals output by the set of tunable light sources, thereby defining routing of at least one of the optical signals in the MCM during communication between at least a pair of the integrated circuits.
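
    A small C sketch of the routing idea: because each chip's receiver is associated with its own carrier wavelength, the control logic routes a transmission simply by tuning the sender's light source to the destination's wavelength. The wavelength table and program_tunable_source() helper are illustrative assumptions, not the patent's control interface.

        /* Illustrative wavelength-routing control for an MCM with tunable
         * light sources; the table and helper are assumptions. */
        #define NUM_CHIPS 4

        /* Carrier wavelength (nm) that each destination chip's receiver
         * is built to detect. */
        static const double rx_wavelength_nm[NUM_CHIPS] = {
            1550.12, 1550.92, 1551.72, 1552.52
        };

        extern void program_tunable_source(int src_chip, double wavelength_nm);

        /* Route a transmission from src to dst by tuning src's light source
         * to the carrier wavelength associated with dst; src's modulator
         * then imposes the data on that carrier. */
        void route_by_wavelength(int src_chip, int dst_chip)
        {
            program_tunable_source(src_chip, rx_wavelength_nm[dst_chip]);
        }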


    Optical network with switchable drop filters
    40. Invention Grant (In Force)

    Publication No.: US08565608B2

    Publication Date: 2013-10-22

    Application No.: US13180355

    Filing Date: 2011-07-11

    IPC Class: H04B10/00 H04J14/02 G02B6/42

    Abstract: In a multi-chip module (MCM), integrated circuits are coupled by optical waveguides. These integrated circuits receive optical signals from a set of light sources which have fixed carrier wavelengths. Moreover, a given integrated circuit includes: a transmitter that modulates at least one of the optical signals when transmitting information to at least another of the integrated circuits; and a receiver that receives at least one modulated optical signal having one of the carrier wavelengths when receiving information from at least the other of the integrated circuits. Furthermore, the MCM includes switchable drop filters optically coupled to the optical waveguides and associated integrated circuits, wherein the switchable drop filters pass adjustable bands of wavelengths to receivers in the integrated circuits. Additionally, control logic in the MCM provides a control signal to the switchable drop filters to specify the adjustable bands of wavelengths.
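
    A matching C sketch for the drop-filter variant: the light sources are fixed, so the control logic instead sets each receiver's switchable drop filter to pass the band that carries the traffic intended for that chip. The band width and set_drop_filter_band() helper are illustrative assumptions.

        /* Illustrative drop-filter control for an MCM with fixed-wavelength
         * sources; the band width and helper are assumptions. */
        extern void set_drop_filter_band(int chip, double center_nm,
                                         double width_nm);

        /* Receive traffic sent on a given fixed carrier by switching this
         * chip's drop filter so only that band reaches its receiver. */
        void listen_on_carrier(int chip, double carrier_nm)
        {
            set_drop_filter_band(chip, carrier_nm, /* width_nm = */ 0.4);
        }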
