Apparatus and method for communicating between computer systems using a sliding send window for ordered messages in a clustered computing environment
    11.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07185099B1

    Publication date: 2007-02-27

    Application No.: US09718924

    Filing date: 2000-11-22

    Applicant: Timothy Roy Block

    Inventor: Timothy Roy Block

    IPC classification: G06F15/16

    Abstract: A clustered computer system includes multiple computer systems (or nodes) coupled together via one or more networks that can become members of a group to work on a particular task. Each node includes a cluster engine, a cluster communication mechanism that includes a sliding send window, and one or more service tasks that process messages. The sliding send window allows a node to send out multiple messages without waiting for an individual acknowledgment to each message. The sliding send window also allows a node that received the multiple messages to send a single acknowledge message for multiple received messages. By using a sliding send window to communicate with other computer systems in the cluster, the communication traffic in the cluster is greatly reduced, thereby enhancing the overall performance of the cluster. In addition, the latency between multiple messages sent concurrently is dramatically reduced.
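
    The abstract's central mechanism, sending several messages before any acknowledgment arrives and acknowledging a run of them with one message, can be illustrated with a cumulative-acknowledgment sliding window. This is a minimal sketch, not the patented implementation: the class names, the fixed window size, and the rule that the receiver acknowledges the highest contiguous sequence number are assumptions chosen for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class SlidingSendWindow:
    """Sender side: up to `window_size` messages may be outstanding at once."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.next_seq = 0                                # sequence number for the next message
        self.unacked: Deque[Tuple[int, str]] = deque()   # sent but not yet acknowledged

    def can_send(self) -> bool:
        return len(self.unacked) < self.window_size

    def send(self, payload: str) -> int:
        # No per-message wait: the sender only stops when the window is full.
        if not self.can_send():
            raise RuntimeError("send window full; wait for an acknowledgment")
        seq = self.next_seq
        self.next_seq += 1
        self.unacked.append((seq, payload))
        return seq

    def on_ack(self, ack_seq: int) -> None:
        # One cumulative acknowledgment releases every message up to ack_seq.
        while self.unacked and self.unacked[0][0] <= ack_seq:
            self.unacked.popleft()


class WindowReceiver:
    """Receiver side: delivers in order and answers a batch with a single ack."""

    def __init__(self) -> None:
        self.expected = 0
        self.delivered: List[str] = []

    def receive_batch(self, messages: Iterable[Tuple[int, str]]) -> int:
        for seq, payload in messages:
            if seq == self.expected:      # in order: deliver it
                self.delivered.append(payload)
                self.expected += 1
            # Out-of-order or duplicate messages are ignored in this sketch;
            # a real sender would retransmit anything left unacknowledged.
        return self.expected - 1          # highest contiguous sequence number received
```

    With a window of 3, the sender can issue three messages back to back; one acknowledgment carrying sequence number 2 then releases all three and reopens the window.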

    Cluster destination address table—IP routing for clusters
    12.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06993034B1

    Publication date: 2006-01-31

    Application No.: US09173090

    Filing date: 1998-10-15

    IPC classification: H04L12/56

    Abstract: According to the present invention, a communications protocol supporting cluster configurations more complex than a single LAN is disclosed. A cluster destination address table (CDAT) is used in conjunction with a network message servicer to communicate between computer systems in a cluster. Each computer system preferably contains a cluster servicer, a CDAT, and a network message servicer. The CDAT contains network addresses, status and adapter information for each computer system in a cluster. Although computer systems may have alternate network addresses when they have multiple adapters, the CDAT indexes primary and alternate address information under a single named system. Thus, redundant connections amongst computer systems are identified, while still using the numeric addresses upon which the network message servicer is based. To send a message using the methods of the present invention, the cluster servicer retrieves a network address for a computer system from a CDAT. A message to be sent and the retrieved address are passed to the network message servicer, preferably an Internet Protocol suite. The network message servicer formats the information into a packet and routes the packet.
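
    A CDAT can be pictured as a table keyed by system name, where each entry groups a primary address and any alternate addresses together with their status. The sketch below assumes a simple selection rule (use the primary address if its adapter is active, otherwise the first active alternate); the field names and that rule are illustrative assumptions, and the hand-off to the IP layer is stubbed out rather than opening a real socket.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AdapterEntry:
    address: str          # numeric IP address used by the network message servicer
    active: bool = True   # last known adapter/link status


@dataclass
class CdatEntry:
    system_name: str
    primary: AdapterEntry
    alternates: List[AdapterEntry] = field(default_factory=list)


class ClusterDestinationAddressTable:
    """Hypothetical CDAT: one named system indexes all of its adapter addresses."""

    def __init__(self) -> None:
        self._entries: Dict[str, CdatEntry] = {}

    def add(self, entry: CdatEntry) -> None:
        self._entries[entry.system_name] = entry

    def lookup(self, system_name: str) -> Optional[str]:
        """Return the primary address if active, else the first active alternate."""
        entry = self._entries.get(system_name)
        if entry is None:
            return None
        for adapter in [entry.primary, *entry.alternates]:
            if adapter.active:
                return adapter.address
        return None


def send_to_system(cdat: ClusterDestinationAddressTable, system_name: str,
                   message: bytes) -> Tuple[str, bytes]:
    """Cluster servicer: resolve a name, then hand (address, message) to the IP layer."""
    address = cdat.lookup(system_name)
    if address is None:
        raise LookupError(f"no reachable address for {system_name}")
    # A real servicer would pass these to the IP stack (for example a UDP
    # socket's sendto); the sketch returns them to stay free of side effects.
    return address, message
```

    Keeping all of a system's addresses under one name is what lets the servicer notice that two addresses are redundant paths to the same machine and fail over between them.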

    Apparatus and Method for Detecting System Reconfiguration and Maintaining Persistent I/O Configuration Data in a Clustered Computer System
    13.
    Published invention patent application
    Legal status: Lapsed

    Publication No.: US20090198807A1

    Publication date: 2009-08-06

    Application No.: US12023128

    Filing date: 2008-01-31

    IPC classification: G06F15/173

    CPC classification: H04L12/40039 H04L12/42

    Abstract: In a clustered computer system with multiple power domains, a bus number manager within each power domain manages multiple nodes independently of other power domains. A node within a specified power domain includes a non-volatile memory that includes bus numbering information for its own buses as well as bus numbering information for two of its logically-interconnected neighbors. This creates a distributed database of the interconnection topology for each power domain. Because a node contains bus numbering information about its logical neighbor node(s), the bus numbers for the buses in the nodes are made persistent across numerous different system reconfigurations. The clustered computer system also includes a bus number manager that reads the non-volatile memories in the nodes during initial program load (i.e., boot) that reconstructs the interconnection topology from the information read from the non-volatile memories, and that assigns bus numbers to the buses according to the derived interconnection topology.
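
    The bus number manager's work at initial program load can be sketched as three steps: rebuild the node adjacency from the neighbor records, walk the nodes in topology order, and reuse any bus number that survives in some node's non-volatile memory. This is a hypothetical model: the dict layout standing in for NVRAM, the breadth-first walk, and the rule that brand-new buses are numbered above the highest known number are assumptions, not details from the abstract. It shows the persistence property: a node whose own record was lost gets its old numbers back from a neighbor's copy.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical NVRAM image: {"own": {bus: number}, "neighbors": {node: {bus: number}}}
NvramImage = Dict[str, Dict]


def reconstruct_topology(images: Dict[str, NvramImage]) -> Dict[str, Set[str]]:
    """Derive an undirected node adjacency from the neighbor records in NVRAM."""
    topology: Dict[str, Set[str]] = {name: set() for name in images}
    for name, image in images.items():
        for neighbor in image.get("neighbors", {}):
            if neighbor in topology:
                topology[name].add(neighbor)
                topology[neighbor].add(name)
    return topology


def walk_order(topology: Dict[str, Set[str]]) -> List[str]:
    """Breadth-first order from the lowest-named node, so numbering follows the topology."""
    if not topology:
        return []
    start = min(topology)
    order: List[str] = []
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(topology[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order + sorted(set(topology) - seen)   # unreachable nodes go last


def assign_bus_numbers(images: Dict[str, NvramImage],
                       buses_per_node: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    # Collect every surviving copy of each node's bus numbers; a node's own
    # record takes precedence over the copies held by its neighbors.
    known: Dict[str, Dict[str, int]] = {}
    for image in images.values():
        for neighbor, buses in image.get("neighbors", {}).items():
            for bus, number in buses.items():
                known.setdefault(neighbor, {}).setdefault(bus, number)
    for name, image in images.items():
        known.setdefault(name, {}).update(image.get("own", {}))

    used = {n for buses in known.values() for n in buses.values()}
    next_free = max(used, default=-1) + 1
    assignment: Dict[str, Dict[str, int]] = {}
    for name in walk_order(reconstruct_topology(images)):
        assignment[name] = {}
        for bus in buses_per_node.get(name, []):
            number = known.get(name, {}).get(bus)
            if number is None:            # a bus that was never numbered before
                number = next_free
                next_free += 1
            assignment[name][bus] = number
    return assignment
```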

    Dynamic modification of cluster communication parameters in clustered computer system
    14.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06983324B1

    Publication date: 2006-01-03

    Application No.: US09694586

    Filing date: 2000-10-23

    IPC classification: G06F15/16

    CPC classification: H04L67/1002 H04L69/40

    Abstract: An apparatus, program product and method support the dynamic modification of cluster communication parameters through a distributed protocol whereby individual nodes locally confirm initiation and status information for every node participating in a parameter modification operation. By doing so, individual nodes are also able to locally determine the need to undo locally-performed parameter modifications should any other node be incapable of performing a parameter modification. Moreover, specifically with respect to cluster communication parameters such as heartbeat parameters, such parameters may be dynamically modified by configuring a sending node to send a heartbeat message to a receiving node, with the heartbeat message indicating that a heartbeat parameter is to be modified. In response to the heartbeat message, the receiving node may then send an acknowledgment message to the sending node that indicates whether the heartbeat parameter has been modified in the receiving node. Further, modification of the heartbeat parameter in the sending node may be deferred until the acknowledgment message from the receiving node indicates that the heartbeat parameter has been modified in the receiving node.
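
    The heartbeat-specific part of the abstract (request carried in a heartbeat, receiver reports whether it applied the change, sender defers its own change until confirmed) can be sketched as below. The distributed undo protocol from the first half of the abstract is not modeled. Using the heartbeat interval as the parameter, and the `max_interval_ms` acceptance check on the receiver, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Heartbeat:
    sender: str
    new_interval_ms: Optional[int] = None   # set only when a change is being requested


@dataclass
class HeartbeatAck:
    sender: str
    modified: bool                          # did the receiver apply the change?


class HeartbeatNode:
    def __init__(self, name: str, interval_ms: int, max_interval_ms: int = 60_000) -> None:
        self.name = name
        self.interval_ms = interval_ms
        self.max_interval_ms = max_interval_ms      # hypothetical local acceptance limit
        self.pending_interval_ms: Optional[int] = None

    def request_change(self, new_interval_ms: int) -> Heartbeat:
        # Sending node: remember the new value but keep using the old one for now.
        self.pending_interval_ms = new_interval_ms
        return Heartbeat(self.name, new_interval_ms)

    def on_heartbeat(self, hb: Heartbeat) -> HeartbeatAck:
        # Receiving node: apply the requested value if acceptable and report back.
        if hb.new_interval_ms is not None and 0 < hb.new_interval_ms <= self.max_interval_ms:
            self.interval_ms = hb.new_interval_ms
            return HeartbeatAck(self.name, modified=True)
        return HeartbeatAck(self.name, modified=False)

    def on_ack(self, ack: HeartbeatAck) -> None:
        # Sending node: the deferred change takes effect only after the receiver
        # confirms it; otherwise the pending value is discarded and the old one kept.
        if self.pending_interval_ms is not None and ack.modified:
            self.interval_ms = self.pending_interval_ms
        self.pending_interval_ms = None
```

    Deferring the sender's change means the two ends never disagree about the interval in a way that would make the receiver declare the sender dead prematurely.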

    Cluster node distress signal
    15.
    Granted invention patent
    Legal status: In force

    Publication No.: US06442713B1

    Publication date: 2002-08-27

    Application No.: US09281026

    Filing date: 1999-03-30

    IPC classification: G06F11/00

    CPC classification: H04L69/40 G06F11/2023

    Abstract: The preferred embodiment of the present invention provides a cluster node distress system and method that improves the reliability of a cluster. The cluster node distress system provides a cluster node distress signal when a node on the cluster is about to fail. This allows the cluster to better determine whether a non-communicating node has failed or has merely been partitioned from the cluster. The preferred cluster node distress system is embedded deeply into the operating system and provides a pre-built node distress signal that can be quickly sent to other nodes in the cluster when an imminent failure of that node is detected, improving the probability that the node distress signal will get out before the node totally fails. When the node distress signal is effectively sent to other nodes in the cluster, the cluster can accurately determine that the node has failed and has not merely been partitioned from the cluster. This allows the cluster to respond correctly, i.e., by assigning other nodes primary responsibility, with less intervention needed by administrators.
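
    The two ideas in this abstract, a distress message built ahead of time so little work remains when failure is imminent, and using its receipt to tell a failed node from a partitioned one, can be sketched as follows. The 16-byte wire format, the `DSTR` magic value, and the `send` callback are hypothetical; a real implementation would live inside the operating system rather than in Python.

```python
import struct
import time
from typing import Callable, Set

# Hypothetical wire format: 4-byte magic, 4-byte node id, 8-byte timestamp (network order).
DISTRESS_FORMAT = "!4sIQ"
MAGIC = b"DSTR"


class DistressNotifier:
    """Node side: the message is built once, up front, while the node is healthy."""

    def __init__(self, node_id: int) -> None:
        self.prebuilt = bytearray(struct.pack(DISTRESS_FORMAT, MAGIC, node_id, 0))

    def signal(self, send: Callable[[bytes], None]) -> None:
        # On imminent failure only the timestamp is patched in before sending;
        # `send` stands in for whatever transport the cluster uses.
        struct.pack_into("!Q", self.prebuilt, 8, int(time.time()))
        send(bytes(self.prebuilt))


class SilentNodeClassifier:
    """Cluster side: uses received distress signals to interpret silence."""

    def __init__(self) -> None:
        self.distressed: Set[int] = set()

    def on_message(self, data: bytes) -> None:
        if len(data) == struct.calcsize(DISTRESS_FORMAT):
            magic, node_id, _timestamp = struct.unpack(DISTRESS_FORMAT, data)
            if magic == MAGIC:
                self.distressed.add(node_id)

    def classify(self, node_id: int) -> str:
        # Distress received: the node failed, so its roles can be reassigned.
        # No distress: the node may only be partitioned, so failover is riskier.
        return "failed" if node_id in self.distressed else "possibly partitioned"
```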

    Detecting system reconfiguration and maintaining persistent I/O configuration data in a clustered computer system
    16.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07877471B2

    Publication date: 2011-01-25

    Application No.: US12023128

    Filing date: 2008-01-31

    CPC classification: H04L12/40039 H04L12/42

    Abstract: In a clustered computer system with multiple power domains, a bus number manager within each power domain manages multiple nodes independently of other power domains. A node within a specified power domain includes a non-volatile memory that includes bus numbering information for its own buses as well as bus numbering information for two of its logically-interconnected neighbors. This creates a distributed database of the interconnection topology for each power domain. Because a node contains bus numbering information about its logical neighbor node(s), the bus numbers for the buses in the nodes are made persistent across numerous different system reconfigurations. The clustered computer system also includes a bus number manager that reads the non-volatile memories in the nodes during initial program load (i.e., boot) that reconstructs the interconnection topology from the information read from the non-volatile memories, and that assigns bus numbers to the buses according to the derived interconnection topology.

    Node self-start in a decentralized cluster
    17.
    Granted invention patent
    Legal status: In force

    Publication No.: US07240088B2

    Publication date: 2007-07-03

    Application No.: US10057188

    Filing date: 2002-01-25

    IPC classification: G06F15/16

    Abstract: Methods, systems and articles of manufacture for automatically starting a node in a clustered computer system. A starting state value may be assigned to the node and a discovery process initiated to find a sponsor node. If a sponsor node is found, the node is joined with the sponsor node in the clustered computer system. If a sponsor node is not found, the node is started as a one-node cluster in the clustered computer system. An active state value is assigned to the node upon inclusion into the clustered computer system.
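
    The start-up sequence described above is essentially a small state machine: mark the node as starting, try to discover a sponsor, then either join through the sponsor or form a one-node cluster, and finally mark the node active. In this sketch the `discover` callback and the shared-membership-set model of joining are hypothetical simplifications.

```python
from enum import Enum
from typing import Callable, Optional, Set


class NodeState(Enum):
    STARTING = "starting"
    ACTIVE = "active"


class SelfStartingNode:
    def __init__(self, name: str,
                 discover: Callable[[], Optional["SelfStartingNode"]]) -> None:
        self.name = name
        self.discover = discover            # hypothetical discovery callback
        self.state: Optional[NodeState] = None
        self.members: Set[str] = set()

    def self_start(self) -> None:
        self.state = NodeState.STARTING     # assigned before discovery begins
        sponsor = self.discover()
        if sponsor is not None:
            # Join the existing cluster through the sponsor node.
            sponsor.members.add(self.name)
            self.members = set(sponsor.members)
        else:
            # No sponsor answered: start as a one-node cluster.
            self.members = {self.name}
        self.state = NodeState.ACTIVE       # assigned once the node is in a cluster
```

    Because either branch ends with the node in some cluster, no administrator action is needed to bring a node up, whether or not other nodes are already running.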

    Dynamic modification of fragmentation size cluster communication parameter in clustered computer system
    18.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06934768B1

    Publication date: 2005-08-23

    Application No.: US09694599

    Filing date: 2000-10-23

    IPC classification: G06F11/00 G06F15/16

    Abstract: An apparatus, program product and method support the dynamic modification of cluster communication parameters such as a fragmentation size parameter through controllably deferring the processing of a requested fragmentation size change in a source node until after receipt of an acknowledgment message for at least one unacknowledged message sent by the source node to a plurality of target nodes. By controllably deferring such processing until it is confirmed that any such previously-unacknowledged messages sent by a source node have been received by any target nodes, synchronization between the source node and the target nodes may be obtained, and a fragmentation size change may occur in a coordinated fashion such that future messages from the source node to the target node will be processed by both the source and the target nodes using the modified fragmentation size parameter.
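
    The deferral described above can be sketched as a sender that records a requested fragmentation size and only switches to it once every target has acknowledged every outstanding message. The per-message set of not-yet-acknowledged targets is an assumed bookkeeping scheme. This sketch also waits for messages sent after the request, which could postpone the change under continuous traffic; a real implementation might pause sending or track only messages outstanding at request time.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


class FragmentingSender:
    def __init__(self, frag_size: int, targets: Iterable[str]) -> None:
        self.frag_size = frag_size
        self.pending_frag_size: Optional[int] = None
        self.targets = set(targets)
        self.next_seq = 0
        self.unacked: Dict[int, Set[str]] = {}   # seq -> targets that have not acked it

    def send(self, payload: bytes) -> Tuple[int, List[bytes]]:
        size = self.frag_size
        fragments = [payload[i:i + size] for i in range(0, len(payload), size)]
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = set(self.targets)
        return seq, fragments

    def request_frag_size(self, new_size: int) -> None:
        # Record the request; it is processed only when nothing is outstanding.
        self.pending_frag_size = new_size
        self._maybe_apply()

    def on_ack(self, seq: int, target: str) -> None:
        waiting = self.unacked.get(seq)
        if waiting is not None:
            waiting.discard(target)
            if not waiting:
                del self.unacked[seq]
        self._maybe_apply()

    def _maybe_apply(self) -> None:
        # Once every target has acknowledged every earlier message, source and
        # targets can switch to the new size at the same point in the stream.
        if self.pending_frag_size is not None and not self.unacked:
            self.frag_size = self.pending_frag_size
            self.pending_frag_size = None
```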

    Node shutdown in clustered computer system
    19.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06918051B2

    Publication date: 2005-07-12

    Application No.: US09827804

    Filing date: 2001-04-06

    IPC classification: G06F11/00 G06F15/16 H04L12/28

    CPC classification: G06F11/0796 G06F11/0715

    Abstract: A clustered computer system, apparatus, program product and method utilize a group member-initiated shutdown process to terminate clustering on a node in an automated and orderly fashion, typically in the event of a failure detected by a group member residing on that node. As a component of such a process, node leave operations are initiated on the other nodes in a clustered computer system, thereby permitting any dependency failovers to occur in an automated fashion. Moreover, other group members on a node to be shutdown are preemptively terminated prior to local detection of the failure within those other group members, so that termination of clustering on the node may be initiated to complete a shutdown operation.
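
    The shutdown sequence can be sketched in three steps: notify the other nodes that this node is leaving, preemptively terminate the remaining local group members, then end clustering locally. The classes and the `on_node_leave` method below are hypothetical stand-ins, not an actual cluster API; a real peer would start failover where this one merely records the notification.

```python
from typing import List


class GroupMember:
    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def terminate(self) -> None:
        self.running = False


class PeerNode:
    """Stand-in for a remote node; records leave notifications it receives."""

    def __init__(self) -> None:
        self.departed: List[str] = []

    def on_node_leave(self, node_name: str) -> None:
        self.departed.append(node_name)   # a real node would start failover here


class LocalClusterNode:
    def __init__(self, name: str, members: List[GroupMember], peers: List[PeerNode]) -> None:
        self.name = name
        self.members = members            # group members residing on this node
        self.peers = peers                # the other nodes in the cluster
        self.clustering = True

    def shutdown_initiated_by(self, reporter: GroupMember) -> None:
        """Orderly shutdown triggered by a local group member that detected a failure."""
        # Start node-leave processing on every other node so failovers proceed automatically.
        for peer in self.peers:
            peer.on_node_leave(self.name)
        # Preemptively end the other local members rather than waiting for each
        # of them to detect the failure independently.
        for member in self.members:
            if member is not reporter:
                member.terminate()
        reporter.terminate()
        self.clustering = False           # clustering on this node is now terminated
```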

    Processing sequenced records across multiple network connections
    20.
    Granted invention patent
    Legal status: In force

    Publication No.: US07886059B2

    Publication date: 2011-02-08

    Application No.: US12049273

    Filing date: 2008-03-15

    IPC classification: G06F15/16

    Abstract: An apparatus and method allows processing sequenced records across multiple network connections. A "logical connection" is defined to include one or more network connections. Each message is assigned a sequence number that allows the messages to be ordered on the other end according to sequence number, regardless of which network connection in the logical connection is used to transfer the message. By defining messages, sequencing those messages, and transferring the messages over multiple network connections, the throughput and performance of networked computer systems are substantially increased.
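
    The ordering guarantee rests on a resequencing buffer at the receiving end: messages may arrive out of order across connections, but each is released only when every lower sequence number has been delivered. In the sketch below, physical connections are modeled as plain lists and the sender distributes messages round-robin; both are assumptions, since the abstract does not say how a connection is chosen.

```python
import heapq
from itertools import cycle
from typing import Any, List, Sequence, Tuple

Frame = Tuple[int, Any]   # (sequence number, record)


class LogicalConnectionSender:
    def __init__(self, connections: Sequence[List[Frame]]) -> None:
        # Each physical connection is modeled as a list; round-robin is an assumption.
        self._conns = cycle(connections)
        self._next_seq = 0

    def send(self, record: Any) -> int:
        seq = self._next_seq
        self._next_seq += 1
        next(self._conns).append((seq, record))
        return seq


class LogicalConnectionReceiver:
    def __init__(self) -> None:
        self._heap: List[Frame] = []
        self._expected = 0

    def receive(self, seq: int, record: Any) -> List[Any]:
        """Accept a frame from any connection; return records now deliverable in order."""
        heapq.heappush(self._heap, (seq, record))
        ready = []
        while self._heap and self._heap[0][0] == self._expected:
            ready.append(heapq.heappop(self._heap)[1])
            self._expected += 1
        return ready
```

    Delivering everything from the second connection before the first (as if it were faster) still yields the records in their original order, because the receiver holds back any frame whose predecessors have not yet arrived.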
