Method and apparatus for detecting a cache wrap condition
    1.
    发明申请
    Method and apparatus for detecting a cache wrap condition 有权
    用于检测缓存包装条件的方法和装置

    公开(公告)号:US20060230239A1

    公开(公告)日:2006-10-12

    申请号:US11093132

    申请日:2005-03-29

    IPC分类号: G06F13/28

    CPC分类号: G06F12/0822 G06F12/0831

    摘要: A method and apparatus for detecting a cache wrap condition in a computing environment having a processor and a cache. A cache wrap condition is detected when the entire contents of a cache have been replaced, relative to a particular starting state. A set-associative cache is considered to have wrapped when all of the sets within the cache have been replaced. The starting point for cache wrap detection is the state of the cache sets at the time of the previous cache wrap. The method and apparatus is preferably implemented in a snoop filter having filter mechanisms that rely upon detecting the cache wrap condition. These snoop filter mechanisms requiring this information are operatively coupled with cache wrap detection logic adapted to detect the cache wrap event, and perform an indication step to the snoop filter mechanisms. In the various embodiments, cache wrap detection logic is implemented using registers and comparators, loadable counters, or a scoreboard data structure.

    摘要翻译: 一种用于在具有处理器和高速缓存的计算环境中检测高速缓存包装条件的方法和装置。 当高速缓存的全部内容相对于特定的启动状态被替换时,检测到缓存包装条件。 当缓存中的所有集合已被替换时,集合关联缓存被认为已被包装。 高速缓存包检测的起始点是先前高速缓存包装时高速缓存集的状态。 该方法和装置优选地在具有依赖于检测高速缓存包装条件的过滤机构的窥探过滤器中实现。 这些需要该信息的窥探过滤机构可操作地与适用于检测高速缓存包裹事件的高速缓存包检测逻辑耦合,并且向窥探过滤机构执行指示步骤。 在各种实施例中,使用寄存器和比较器,可加载计数器或记分板数据结构来实现高速缓存封包检测逻辑。

    Novel snoop filter for filtering snoop requests
    3.
    发明申请
    Novel snoop filter for filtering snoop requests 有权
    用于过滤窥探请求的新型窥探过滤器

    公开(公告)号:US20060224838A1

    公开(公告)日:2006-10-05

    申请号:US11093152

    申请日:2005-03-29

    IPC分类号: G06F13/28

    摘要: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.

    摘要翻译: 一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置,每个处理单元具有与其相关联并与其可操作地相连的一个或多个本地高速缓冲存储器。 该方法包括提供与每个处理单元相关联的窥探过滤器设备,每个窥探过滤器设备具有多个专用输入端口,用于从多处理器计算环境中的专用存储器写入源接收窥探请求。 每个窥探过滤器装置包括与多个专用输入端口相对应的多个并行操作端口窥探滤波器,每个端口窥探滤波器实现一个或多个并行操作子滤波器元件,其适于同时滤除从相应专用存储器接收的窥探请求 写入源并将这些请求的子集转发到其相关联的处理单元。

    Snoop filtering system in a multiprocessor system
    4.
    发明申请
    Snoop filtering system in a multiprocessor system 有权
    多处理器系统中的Snoop过滤系统

    公开(公告)号:US20060224835A1

    公开(公告)日:2006-10-05

    申请号:US11093127

    申请日:2005-03-29

    IPC分类号: G06F13/28

    摘要: A system and method for supporting cache coherency in a computing environment having multiple processing units, each unit having an associated cache memory system operatively coupled therewith. The system includes a plurality of interconnected snoop filter units, each snoop filter unit corresponding to and in communication with a respective processing unit, with each snoop filter unit comprising a plurality of devices for receiving asynchronous snoop requests from respective memory writing sources in the computing environment; and a point-to-point interconnect comprising communication links for directly connecting memory writing sources to corresponding receiving devices; and, a plurality of parallel operating filter devices coupled in one-to-one correspondence with each receiving device for processing snoop requests received thereat and one of forwarding requests or preventing forwarding of requests to its associated processing unit. Each of the plurality of parallel operating filter devices comprises parallel operating sub-filter elements, each simultaneously receiving an identical snoop request and implementing one or more different snoop filter algorithms for determining those snoop requests for data that are determined not cached locally at the associated processing unit and preventing forwarding of those requests to the processor unit. In this manner, a number of snoop requests forwarded to a processing unit is reduced thereby increasing performance of the computing environment.

    摘要翻译: 一种用于在具有多个处理单元的计算环境中支持高速缓存一致性的系统和方法,每个单元具有与其可操作耦合的相关联的高速缓存存储器系统 该系统包括多个互连的窥探过滤器单元,每个窥探过滤器单元对应于相应处理单元并与其通信,每个窥探过滤器单元包括用于在计算环境中从相应存储器写入源接收异步窥探请求的多个设备 ; 以及包括用于将存储器写入源直接连接到对应的接收设备的通信链路的点对点互连; 以及与每个接收设备一一对应地耦合的多个并行操作过滤器设备,用于处理在其上接收的窥探请求,并且转发请求之一或者阻止将请求转发到其相关联的处理单元。 多个并行操作过滤器装置中的每一个包括并行操作子滤波器元件,每个并行操作子滤波器元件同时接收相同的窥探请求,并且实现一个或多个不同的窥探滤波器算法,用于确定对于在相关处理中本地未被缓存的数据被确定的窥探请求 并且防止将这些请求转发到处理器单元。 以这种方式,减少了转发到处理单元的多个窥探请求,从而增加了计算环境的性能。

    LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION
    5.
    发明申请
    LOW LATENCY MEMORY ACCESS AND SYNCHRONIZATION 失效
    低延迟存储器访问和同步

    公开(公告)号:US20070204112A1

    公开(公告)日:2007-08-30

    申请号:US11617276

    申请日:2006-12-28

    IPC分类号: G06F12/14

    摘要: A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

    摘要翻译: 与弱有序的多处理器系统相关联地提供低延迟存储器系统访问。 多处理器中的每个处理器共享资源,并且每个共享资源在锁定设备内具有关联的锁,其提供对多处理器中的多个处理器之间的同步的支持以及资源的有序共享。 当处理器拥有与该资源相关联的锁定时,处理器仅具有访问资源的权限,并且处理器拥有锁的尝试仅需要单个加载操作,而不是传统的原子负载后跟存储,使得处理器 只执行读取操作,并且硬件锁定装置执行后续的写入操作而不是处理器。 还公开了用于非连续数据结构的简单预取。 重新定义存储器线,使得除了正常的物理存储器数据之外,每行包括足够大的指针以指向存储器中的任何其他行,其中指针用于确定要预取的存储器行而不是一些其它预测 算法。 这使得硬件能够有效地预取不连续但重复的存储器访问模式。

    Method and apparatus for filtering snoop requests using a scoreboard
    6.
    发明申请
    Method and apparatus for filtering snoop requests using a scoreboard 失效
    使用记分板过滤窥探请求的方法和装置

    公开(公告)号:US20060224840A1

    公开(公告)日:2006-10-05

    申请号:US11093160

    申请日:2005-03-29

    IPC分类号: G06F13/28

    摘要: An apparatus for implementing snooping cache coherence that locally reduces the number of snoop requests presented to each cache in a multiprocessor system. A snoop filter device associated with a single processor includes one or more “scoreboard” data structures that make snoop determinations, i.e., for each snoop request from another processor, to determine if a request is to be forwarded to the processor or, discarded. At least one scoreboard is active, and at least one scoreboard is determined to be historic at any point in time. A snoop determination of the queue indicates that an entry may be in the cache, but does not indicate its actual residence status. In addition, the snoop filter block implementing scoreboard data structures is operatively coupled with a cache wrap detection logic means whereby, upon detection of a cache wrap condition, the content of the active scoreboard is copied into a historic scoreboard and the content of at least one active scoreboard is reset.

    摘要翻译: 用于实现窥探高速缓存一致性的装置,其本地地减少呈现给多处理器系统中的每个缓存的窥探请求的数量。 与单个处理器相关联的窥探过滤器装置包括一个或多个“记分板”数据结构,其进行窥探确定,即,来自另一个处理器的每个窥探请求,以确定请求是否被转发到处理器或被丢弃。 至少一个记分牌是活跃的,并且至少一个记分牌被确定为在任何时间点的历史。 队列的窥探确定表示一个条目可能在缓存中,但不表示其实际居住状态。 此外,实现记分板数据结构的窥探过滤器块与高速缓存包检测逻辑装置可操作地耦合,由此在检测到缓存包装条件时,将活动记分板的内容复制到历史记分板中,并且至少一个 活动记分板重置。

    Method and apparatus for filtering snoop requests using multiple snoop caches

    公开(公告)号:US20060224839A1

    公开(公告)日:2006-10-05

    申请号:US11093154

    申请日:2005-03-29

    IPC分类号: G06F13/28

    摘要: A method and apparatus for implementing a snoop filter unit associated with a single processor in a multiprocessor system. The snoop filter unit has a plurality of ports, each port receiving snoop requests from exactly one dedicated source. Associated with each port is a snoop cache filter that processes each snoop cache request and records addresses of the most recent snoop requests for exactly one source. The snoop cache filter uses vector encoding to record the occurrence of snoop requests for a sequence of consecutive cache lines. All addresses of snoop requests are added to the snoop cache unless a received snoop cache request matches an entry present in the associated snoop cache, in which case the snoop request is discarded. Otherwise, the associated snoop cache request is enqueued for forwarding to the single processor. Information from all snoop cache filters assigned to all ports in the snoop filter unit are removed in the case that data corresponding to any one of the memory addresses contained in snoop cache filter is loaded in the cache hierarchy of the processor the snoop cache filter is assigned to.

    Method and apparatus for filtering snoop requests using stream registers
    8.
    发明申请
    Method and apparatus for filtering snoop requests using stream registers 有权
    使用流寄存器对窥探请求进行过滤的方法和装置

    公开(公告)号:US20060224836A1

    公开(公告)日:2006-10-05

    申请号:US11093130

    申请日:2005-03-29

    IPC分类号: G06F13/28

    摘要: A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having a local cache memory associated therewith. A snoop filter device is associated with each processing unit and includes at least one snoop filter primitive implementing filtering method based on usage of stream registers sets and associated stream register comparison logic. From the plurality of stream registers sets, at least one stream register set is active, and at least one stream register set is labeled historic at any point in time. In addition, the snoop filter block is operatively coupled with cache wrap detection logic whereby the content of the active stream register set is switched into a historic stream register set upon the cache wrap condition detection, and the content of at least one active stream register set is reset. Each filter primitive implements stream register comparison logic that determines whether a received snoop request is to be forwarded to the processor or discarded.

    摘要翻译: 一种用于在具有多个处理单元的多处理器计算环境中支持高速缓存一致性的方法和装置,每个处理单元具有与其相关联的本地高速缓冲存储器。 窥探过滤设备与每个处理单元相关联并且包括至少一个基于流寄存器集合和相关流寄存器比较逻辑的使用实现过滤方法的窥探过滤器原语。 从多个流寄存器组中,至少一个流寄存器组是有效的,并且至少一个流寄存器集合在任何时间点被标记为历史。 另外,监听滤波器块可操作地与高速缓存包检测逻辑耦合,从而将活动流寄存器集合的内容切换到在高速缓存环绕条件检测时设置的历史流寄存器,并且至少一个活动流寄存器集合的内容 被复位。 每个滤波器基元实现流寄存器比较逻辑,其确定接收的窥探请求是否被转发到处理器或丢弃。

    Methods and apparatus using commutative error detection values for fault isolation in multiple node computers
    9.
    发明申请
    Methods and apparatus using commutative error detection values for fault isolation in multiple node computers 失效
    使用多节点计算机故障隔离交换误差检测值的方法和装置

    公开(公告)号:US20060248370A1

    公开(公告)日:2006-11-02

    申请号:US11106069

    申请日:2005-04-14

    IPC分类号: G06F11/00

    CPC分类号: G06F11/1633

    摘要: The present invention concerns methods and apparatus for performing fault isolation in multiple node computing systems using commutative error detection values—for example, checksums—to identify and to isolate faulty nodes. In the present invention nodes forming the multiple node computing system are networked together and during program execution communicate with one another by transmitting information through the network. When information associated with a reproducible portion of a computer program is injected into the network by a node, a commutative error detection value is calculated and stored in commutative error detection apparatus associated with the node. At intervals, node fault detection apparatus associated with the multiple node computer system retrieve commutative error detection values saved in the commutative error detection apparatus associated with the node and stores them in memory. When the computer program is executed again by the multiple node computer system, new commutative error detection values are created; the node fault detection apparatus retrieves them and stores them in memory. The node fault detection apparatus identifies faulty nodes by comparing commutative error detection values associated with reproducible portions of the application program generated by a particular node from different runs of the application program. Differences in commutative error detection values indicate that the node may be faulty.

    摘要翻译: 本发明涉及在多节点计算系统中使用交换性错误检测值(例如校验和)识别和隔离故障节点来执行故障隔离的方法和装置。 在本发明中,形成多节点计算系统的节点被联网在一起,并且在程序执行期间通过网络传送信息彼此通信。 当与计算机程序的可再现部分相关联的信息被节点注入到网络中时,计算交换性错误检测值并将其存储在与节点相关联的交换错误检测装置中。 间歇地,与多节点计算机系统相关联的节点故障检测装置检索保存在与节点相关联的交换性错误检测装置中的交换性错误检测值,并将其存储在存储器中。 当多节点计算机系统再次执行计算机程序时,创建新的交换错误检测值; 节点故障检测装置检索它们并将其存储在存储器中。 节点故障检测装置通过比较与来自应用程序的不同运行的特定节点生成的应用程序的可再现部分相关联的交换错误检测值来识别故障节点。 交换性错误检测值的差异表明节点可能有故障。

    Managing coherence via put/get windows
    10.
    发明申请
    Managing coherence via put/get windows 失效
    通过put / get窗口管理一致性

    公开(公告)号:US20070055825A1

    公开(公告)日:2007-03-08

    申请号:US10468995

    申请日:2002-02-25

    IPC分类号: G06F13/28

    摘要: A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

    摘要翻译: 一种用于管理多处理器计算机系统的两个处理器节点的两个处理器之间的相干性的方法和装置。 通常,本发明涉及一种软件算法,其简化并显着加速了传送并行计算机的消息中的高速缓存一致性的管理以及辅助该高速缓存一致性算法的硬件设备。 软件算法使用put / get窗口的打开和关闭来协调激活的所需要的,以实现缓存一致性。 硬件设备可以是硬件地址解码的扩展,其在节点的物理存储器地址空间中创建(a)实际不存在的虚拟存储器的区域,并且(b)因此能够立即响应 从处理元素读取和写入请求。