Root filesystem failover in a single system image environment
    1.
    Granted patent
    Status: Expired

    Publication number: US06249879B1

    Publication date: 2001-06-19

    Application number: US09071048

    Filing date: 1998-04-30

    IPC classification: G06F11/00

    CPC classification: G06F11/1435

    Abstract: A method and apparatus for transparent failover of a filesystem within a computer cluster is provided. For failover protection, a filesystem is physically connected to an active server node and a standby server node. A cluster file system provides distributed access to the filesystem throughout the computer cluster. The cluster file system monitors the progress of each operation performed on the failover protected filesystem. If the active server node should fail during an operation, all processes performing operations on the failover protected filesystem are caused to sleep. The filesystem is then relocated to the standby server node. The cluster file system then awakens each sleeping process and retries each pending operation.

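    Illustrative sketch (not part of the patent record): the sequence described in the abstract, in which an operation fails, the filesystem is relocated to the standby node, and the pending operation is retried, might be modeled roughly as below. All class and method names are hypothetical, the shared disk is a plain dictionary, and the sleep and wake steps are only recorded in a log rather than implemented with real blocking. The sketch also assumes that pending operations are safe to retry.

```python
class NodeFailure(Exception):
    """Raised when an operation reaches a server node that is down."""


class ServerNode:
    """Hypothetical server node that can serve the shared filesystem."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def perform(self, op, disk):
        # A dead node cannot complete any operation.
        if not self.alive:
            raise NodeFailure(self.name)
        return op(disk)


class FailoverFilesystem:
    """Toy cluster file system layer with one active and one standby node."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        self.disk = {}   # stands in for storage physically attached to both nodes
        self.log = []    # records the failover steps for inspection

    def relocate(self):
        # Move service of the filesystem to the standby node.
        self.log.append(f"relocate {self.active.name}->{self.standby.name}")
        self.active, self.standby = self.standby, None

    def run(self, op):
        # Monitor the operation; on node failure, fail over and retry it.
        while True:
            try:
                return self.active.perform(op, self.disk)
            except NodeFailure:
                if self.standby is None:
                    raise  # no standby left, so the failure is not masked
                self.log.append("sleep")       # caller would block here
                self.relocate()
                self.log.append("wake+retry")  # caller resumes and retries
```

    From the caller's point of view, `run` either returns the result of the operation or raises only when no standby remains, which is the sense in which the failover is transparent in this toy model.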

    Filesystem failover in a single system image environment
    2.
    Granted patent
    Status: Expired

    Publication number: US06247139B1

    Publication date: 2001-06-12

    Application number: US09071145

    Filing date: 1998-04-30

    IPC classification: G06F15/16

    CPC classification: G06F17/30067

    Abstract: A method and apparatus for transparent failover of a filesystem within a computer cluster is provided. For failover protection, a filesystem is physically connected to an active server node and a standby server node. A cluster file system provides distributed access to the filesystem throughout the computer cluster. The cluster file system monitors the progress of each operation performed on the failover protected filesystem. If the active server node should fail during an operation, all processes performing operations on the failover protected filesystem are caused to sleep. The filesystem is then relocated to the standby server node. The cluster file system then awakens each sleeping process and retries each pending operation.


Failure recovery for process relationships in a single system image environment
    3.
    Granted patent
    Status: Expired

    Publication number: US6115830A

    Publication date: 2000-09-05

    Application number: US050226

    Filing date: 1998-03-28

    IPC classification: G06F11/00 H02H3/05 G06F17/30

    Abstract: A system for recovery of process relationships following node failure within a computer cluster is provided. For relationship recovery, each node maintains a set of care relationships. Each relationship is of the form "carer cares about care target". Care relationships describe process relations such as parent-child or group leader-group member. Care relationships are stored at the origin node of their care targets. Following node failure, a surrogate origin node is selected. The surviving nodes then cooperate to rebuild, at the surrogate origin node, the vproc structures and care relationships for the processes that originated at the failed node. The surviving nodes then determine which of their own care targets were terminated by the node failure. For each terminated care target, notifications are sent to the appropriate carers. This allows surviving processes to correctly recover from severed process relationships.

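    Illustrative sketch (not part of the patent record): the "carer cares about care target" bookkeeping in the abstract could be modeled along the following lines. The `Process` and `Care` types, the `recover` function, and the flat list of relationships are all assumptions made for illustration; the abstract does not describe vproc structures in enough detail to reproduce them, so they are reduced here to an `origin` field on each process.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Process:
    pid: int
    origin: str    # node where the process was created
    runs_on: str   # node where the process currently executes


@dataclass(frozen=True)
class Care:
    carer: Process   # e.g. a parent or a group member
    target: Process  # e.g. a child or a group leader


def recover(cares, failed, surrogate):
    """Return (surviving relationships, notifications) after `failed` goes down.

    Processes that originated at the failed node are re-homed to the
    surrogate origin node. Each notification is a (carer pid, target pid)
    pair for a care target that was terminated by the node failure.
    """
    def rehome(p):
        return replace(p, origin=surrogate) if p.origin == failed else p

    surviving, notices = [], []
    for c in cares:
        if c.carer.runs_on == failed:
            continue  # the carer itself died, so nobody is left to notify
        if c.target.runs_on == failed:
            notices.append((c.carer.pid, c.target.pid))
            continue  # relationship is severed
        surviving.append(Care(rehome(c.carer), rehome(c.target)))
    return surviving, notices
```

    A surviving carer would typically react to a notification the way it reacts to an ordinary exit of the target, for example a parent reaping a child that was running on the failed node.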

Router for parallel computer including arrangement for redirecting messages
    4.
    Granted patent
    Status: Expired

    Publication number: US5530809A

    Publication date: 1996-06-25

    Application number: US181711

    Filing date: 1994-01-14

    Abstract: A digital computer comprising a plurality of message generating nodes interconnected by a routing network. The routing network transfers messages among the message generating elements in accordance with address information identifying a destination message generating element. Each message generating node includes a message data generator and a network interface. The message data generator generates message data items each including an address data portion comprising a destination identifier. The network interface includes a message generator and an address translation table, the table including a plurality of entries identifying, for at least one destination identifier, a translated destination identifier. The message generator, in response to the receipt of a message data item from the message data generator, generates a message for transmission to the routing network. In generating the message, the message generator performs an address translation operation in connection with the address data and the contents of the address translation table to generate updated address data, which it uses in connection with generating address information for the message.

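    Illustrative sketch (not part of the patent record): the address translation step in the abstract amounts to a table lookup performed by the network interface before a message is handed to the routing network. The names below (`MessageDataItem`, `Message`, `NetworkInterface`, `redirect`) are hypothetical, and integer destination identifiers are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageDataItem:
    destination: int  # destination identifier chosen by the data generator
    payload: bytes


@dataclass(frozen=True)
class Message:
    address: int      # address information actually given to the router
    payload: bytes


class NetworkInterface:
    """Toy network interface holding an address translation table."""

    def __init__(self, translation_table=None):
        # Maps a destination identifier to its translated identifier.
        self.table = dict(translation_table or {})

    def redirect(self, destination, translated):
        # Add or update one table entry.
        self.table[destination] = translated

    def build_message(self, item):
        # Destinations without a table entry pass through unchanged.
        address = self.table.get(item.destination, item.destination)
        return Message(address, item.payload)
```

    Because the lookup happens in the interface, the message data generator can keep using the original destination identifier while traffic is redirected elsewhere.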

    Filesystem data integrity in a single system image environment
    5.
    Granted patent
    Status: Expired

    Publication number: US6122629A

    Publication date: 2000-09-19

    Application number: US70897

    Filing date: 1998-04-30

    IPC classification: G06F11/20 G06F17/30

    Abstract: A system for protection of filesystem data integrity within a computer cluster is provided. The system uses redundant data caches at client and server nodes within the computer cluster. Caching of filesystem data is controlled so that non-shared files are preferably cached at client nodes. This increases filesystem performance within the computer cluster and ensures that failures may not result in a loss of modified filesystem data without a corresponding loss to the process(es) accessing that data. Shared files are cached at the server node and a backup cache node. This protects modified filesystem data against any single node failure.

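    Illustrative sketch (not part of the patent record): the cache placement rule in the abstract can be written as a small policy function. Treating a file as "shared" when more than one client node has it open is an assumption made for this example, and the function names are hypothetical.

```python
def cache_nodes(openers, server, backup):
    """Pick the nodes that cache modified data for one file.

    openers: set of client node names that currently have the file open.
    A non-shared file is cached at its single client node; a shared file
    is cached at the server node and at a backup cache node.
    """
    if len(openers) <= 1:
        return set(openers)
    return {server, backup}


def modified_data_survives(caches, failed_node):
    """True if at least one cache copy remains after a single node failure."""
    return any(node != failed_node for node in caches)
```

    In the non-shared case the only copy is lost when the client node fails, but the process using that data ran on the same node and is lost with it, which matches the guarantee stated in the abstract.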

    Dynamically modeling and selecting a checkpoint scheme based upon an application workload
    6.
    Granted patent
    Status: Active

    Publication number: US08627143B2

    Publication date: 2014-01-07

    Application number: US12834603

    Filing date: 2010-07-12

    IPC classification: G06F11/00

    Abstract: Illustrated is a system and method for executing a checkpoint scheme as part of processing a workload using an application. The system and method also includes identifying a checkpoint event that requires an additional checkpoint scheme. The system and method includes retrieving checkpoint data associated with the checkpoint event. It also includes building a checkpoint model based upon the checkpoint data. The system and method further includes identifying the additional checkpoint scheme, based upon the checkpoint model, the additional checkpoint scheme to be executed as part of the processing of the workload using the application.

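    Illustrative sketch (not part of the patent record): the loop in the abstract, which runs a checkpoint scheme, detects a checkpoint event, builds a model from checkpoint data, and then selects the next scheme, might look roughly like the code below. Everything specific here is an assumption: the two scheme names, the use of a per-interval dirty-memory fraction as checkpoint data, the fixed-size window that triggers the event, and the threshold rule used as the "model".

```python
from statistics import mean


def build_model(samples):
    # Hypothetical checkpoint model: average fraction of memory dirtied.
    return {"mean_dirty_fraction": mean(samples)}


def select_scheme(model, threshold=0.5):
    # Assumed rule: incremental checkpoints pay off when little memory changes.
    if model["mean_dirty_fraction"] < threshold:
        return "incremental"
    return "full"


def process_workload(dirty_fractions, scheme="full", window=3, threshold=0.5):
    """Return the scheme used in each interval of the workload."""
    history, data = [], []
    for i, fraction in enumerate(dirty_fractions, start=1):
        history.append(scheme)   # checkpoint this interval with the current scheme
        data.append(fraction)    # collect checkpoint data
        if i % window == 0:      # hypothetical checkpoint event
            model = build_model(data[-window:])
            scheme = select_scheme(model, threshold)
    return history
```

    With a workload that starts write-heavy and then quiets down, the sketch keeps full checkpoints at first and switches to incremental ones once the recent window shows a low dirty fraction.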

    DYNAMICALLY MODELING AND SELECTING A CHECKPOINT SCHEME BASED UPON AN APPLICATION WORKLOAD
    7.
    Patent application
    Status: Active

    Publication number: US20120011401A1

    Publication date: 2012-01-12

    Application number: US12834603

    Filing date: 2010-07-12

    IPC classification: G06F11/07

    Abstract: Illustrated is a system and method for executing a checkpoint scheme as part of processing a workload using an application. The system and method also includes identifying a checkpoint event that requires an additional checkpoint scheme. The system and method includes retrieving checkpoint data associated with the checkpoint event. It also includes building a checkpoint model based upon the checkpoint data. The system and method further includes identifying the additional checkpoint scheme, based upon the checkpoint model, the additional checkpoint scheme to be executed as part of the processing of the workload using the application.
