Scalable method of continuous monitoring the remotely accessible resources against the node failures for very large clusters
    1.
    Granted invention patent
    Status: in force

    Publication No.: US07296191B2

    Publication date: 2007-11-13

    Application No.: US11456585

    Filing date: 2006-07-11

    IPC classification: G06F11/00

    Abstract: The notion of controlling, using and monitoring remote resources in a distributed data processing system through the use of proxy resource managers and agents is extended to provide failover capability so that resource coverage is preserved and maintained even in the event of either temporary or longer duration node failure. Mechanisms are provided for consistent determination of resource status. Mechanisms are also provided which facilitate the joining of nodes to a group of nodes while still preserving remote resource operations. Additional mechanisms are also provided for the return of remote resource management to the control of a previously failed, but now recovered node, even if the failure had resulted in a node reset.
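
    The failover idea in this abstract can be illustrated with a small model. The sketch below is hypothetical and is not the patented mechanism: it assumes each remote resource has a "home" node whose proxy resource manager normally monitors it, reassigns a failed node's resources to the least-loaded surviving node, and hands them back when the home node rejoins the group. The names `ProxyCluster`, `node_failed` and `node_recovered` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyCluster:
    """Hypothetical model of which node's proxy resource manager covers each remote resource."""

    home: dict[str, str]  # resource -> node that normally monitors it
    alive: set[str] = field(default_factory=set)
    owner: dict[str, str] = field(default_factory=dict)  # resource -> node monitoring it now

    def __post_init__(self) -> None:
        # Every home node starts as a live group member that owns its own resources.
        self.alive |= set(self.home.values())
        self.owner = dict(self.home)

    def _load(self, node: str) -> int:
        # Number of resources the node currently monitors.
        return sum(1 for n in self.owner.values() if n == node)

    def node_failed(self, node: str) -> None:
        # Drop the node from the group; move each of its resources to the least-loaded survivor.
        self.alive.discard(node)
        if not self.alive:
            raise RuntimeError("no surviving node can take over monitoring")
        for res, cur in sorted(self.owner.items()):
            if cur == node:
                self.owner[res] = min(sorted(self.alive), key=self._load)

    def node_recovered(self, node: str) -> None:
        # The node rejoins the group and takes back the resources whose home it is.
        self.alive.add(node)
        for res, home in self.home.items():
            if home == node:
                self.owner[res] = node


cluster = ProxyCluster(home={"disk1": "n1", "disk2": "n1", "sw1": "n2", "sw2": "n3"})
cluster.node_failed("n1")
print(cluster.owner)  # {'disk1': 'n2', 'disk2': 'n3', 'sw1': 'n2', 'sw2': 'n3'}
cluster.node_recovered("n1")
print(cluster.owner == cluster.home)  # True
```

    Choosing the least-loaded survivor spreads a failed node's resources instead of piling them onto one neighbour; ties are broken by node name so the result is deterministic.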

    SCALABLE METHOD OF CONTINUOUS MONITORING THE REMOTELY ACCESSIBLE RESOURCES AGAINST NODE FAILURES FOR VERY LARGE CLUSTERS
    2.
    Invention patent application
    Status: in force

    Publication No.: US20080313333A1

    Publication date: 2008-12-18

    Application No.: US12146008

    Filing date: 2008-06-25

    IPC classification: G06F15/173

    Abstract: The notion of controlling, using and monitoring remote resources in a distributed data processing system through the use of proxy resource managers and agents is extended to provide failover capability so that resource coverage is preserved and maintained even in the event of either temporary or longer duration node failure. Mechanisms are provided for consistent determination of resource status. Mechanisms are also provided which facilitate the joining of nodes to a group of nodes while still preserving remote resource operations. Additional mechanisms are also provided for the return of remote resource management to the control of a previously failed, but now recovered node, even if the failure had resulted in a node reset.

    Scalable method of continuous monitoring the remotely accessible resources against node failures for very large clusters
    3.
    Granted invention patent
    Status: in force

    Publication No.: US07814373B2

    Publication date: 2010-10-12

    Application No.: US12146008

    Filing date: 2008-06-25

    IPC classification: G06F11/00

    Abstract: The notion of controlling, using and monitoring remote resources in a distributed data processing system through the use of proxy resource managers and agents is extended to provide failover capability so that resource coverage is preserved and maintained even in the event of either temporary or longer duration node failure. Mechanisms are provided for consistent determination of resource status. Mechanisms are also provided which facilitate the joining of nodes to a group of nodes while still preserving remote resource operations. Additional mechanisms are also provided for the return of remote resource management to the control of a previously failed, but now recovered node, even if the failure had resulted in a node reset.

    Scalable method of continuous monitoring the remotely accessible resources against the node failures for very large clusters
    4.
    Granted invention patent
    Status: in force

    Publication No.: US07137040B2

    Publication date: 2006-11-14

    Application No.: US10365193

    Filing date: 2003-02-12

    IPC classification: G06F11/00

    Abstract: The notion of controlling, using and monitoring remote resources in a distributed data processing system through the use of proxy resource managers and agents is extended to provide failover capability so that resource coverage is preserved and maintained even in the event of either temporary or longer duration node failure. Mechanisms are provided for consistent determination of resource status. Mechanisms are also provided which facilitate the joining of nodes to a group of nodes while still preserving remote resource operations. Additional mechanisms are also provided for the return of remote resource management to the control of a previously failed, but now recovered node, even if the failure had resulted in a node reset.

    Method for using a priority queue to perform job scheduling on a cluster based on node rank and performance
    5.
    Granted invention patent
    Status: in force

    Publication No.: US07827435B2

    Publication date: 2010-11-02

    Application No.: US11057969

    Filing date: 2005-02-15

    IPC classification: G06F11/00

    CPC classification: G06F9/505 G06F2209/508

    Abstract: In a multi-node information processing system, a method for scheduling jobs includes the steps of: determining node-related performance parameters for a plurality of nodes; determining a ranking for each node based on its node-related performance parameters; and ordering the nodes by their ranking for job scheduling.
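
    The three steps in this abstract map naturally onto a priority queue. The sketch below uses Python's standard `heapq` module; the performance parameters (`cpu_ghz`, `free_mem_gb`, `failures_per_day`), the weights in `rank`, and the rule of lowering a node's priority by a fixed `penalty` after each assignment are illustrative assumptions, not details taken from the patent.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStats:
    name: str
    cpu_ghz: float           # hypothetical performance parameter
    free_mem_gb: float       # hypothetical performance parameter
    failures_per_day: float  # hypothetical reliability parameter


def rank(n: NodeStats) -> float:
    # Hypothetical weighting of the node-related parameters; higher is better.
    return 1.0 * n.cpu_ghz + 0.5 * n.free_mem_gb - 10.0 * n.failures_per_day


def schedule(jobs: list[str], nodes: list[NodeStats], penalty: float = 1.0) -> dict[str, str]:
    if jobs and not nodes:
        raise ValueError("no nodes available for scheduling")
    # heapq is a min-heap, so store the negated rank to pop the best-ranked node first.
    heap = [(-rank(n), n.name) for n in nodes]
    heapq.heapify(heap)
    placement: dict[str, str] = {}
    for job in jobs:
        neg_rank, name = heapq.heappop(heap)
        placement[job] = name
        # Assumption: each assignment lowers the node's priority so load spreads out.
        heapq.heappush(heap, (neg_rank + penalty, name))
    return placement


nodes = [
    NodeStats("a", cpu_ghz=3.0, free_mem_gb=8.0, failures_per_day=0.0),   # rank 7.0
    NodeStats("b", cpu_ghz=2.5, free_mem_gb=16.0, failures_per_day=0.1),  # rank 9.5
    NodeStats("c", cpu_ghz=3.5, free_mem_gb=4.0, failures_per_day=0.5),   # rank 0.5
]
print(schedule(["j1", "j2", "j3", "j4"], nodes))
# {'j1': 'b', 'j2': 'b', 'j3': 'b', 'j4': 'a'}
```

    Without the penalty, every job would go to the single top-ranked node; the penalty is one simple way to keep the ordering by rank while still spreading work.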

    METHOD FOR ORGANIZING PROCESSES
    6.
    Invention patent application
    Status: pending (published)

    Publication No.: US20100017244A1

    Publication date: 2010-01-21

    Application No.: US12174130

    Filing date: 2008-07-16

    IPC classification: G06Q10/00

    CPC classification: G06Q10/06 G06Q10/06393

    Abstract: Techniques for generating a target process are provided. The techniques include identifying at least one of one or more steps and one or more artifacts within a target process and one or more other processes, pre-fetching the at least one of one or more atomic steps, one or more decision steps and splits and one or more merges to be used in the target process from the one or more other processes, and associating the at least one of one or more atomic steps, one or more decision steps and splits and one or more merges to be used in the target process at one or more decision points to generate the target process.

    Hybrid event prediction and system control
    8.
    Granted invention patent
    Status: in force

    Publication No.: US07895323B2

    Publication date: 2011-02-22

    Application No.: US12267762

    Filing date: 2008-11-10

    IPC classification: G06F15/173

    Abstract: A system for predicting an occurrence of a critical event in a computer cluster includes a control system that includes an event log, a system parameter log, a memory for storing information related to occurrences of critical events, and a processor. The processor implements a hybrid prediction system; loads the information from the event log and the system performance log into a Bayesian network model; uses the Bayesian network model to predict a future critical event; makes future scheduling and current data migration selections; and adapts the Bayesian network model by feeding back the scheduling and data migration selections.
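
    A full Bayesian network is beyond a short example, but its basic building block can be sketched: a single conditional probability, P(critical event | precursor seen), for a hypothetical two-node network (precursor, then critical event), estimated from logged outcomes as the posterior mean under a uniform Beta(1, 1) prior. Feeding each new outcome back through `observe` is a simplified stand-in for the adaptation step described above; the class name, the threshold of 0.5, and the `safe_targets` rule are assumptions for illustration only.

```python
from collections import Counter


class CriticalEventPredictor:
    """Hypothetical two-node model: P(critical | precursor) with a Beta(1, 1) prior."""

    def __init__(self) -> None:
        # (precursor_seen, critical_followed) -> number of logged occurrences
        self.counts: Counter[tuple[bool, bool]] = Counter()

    def observe(self, precursor_seen: bool, critical_followed: bool) -> None:
        # Feed back an observed outcome so the estimate adapts over time.
        self.counts[(precursor_seen, critical_followed)] += 1

    def p_critical(self, precursor_seen: bool) -> float:
        # Posterior mean of a Bernoulli rate under a uniform prior: (hits + 1) / (n + 2).
        hits = self.counts[(precursor_seen, True)]
        n = hits + self.counts[(precursor_seen, False)]
        return (hits + 1) / (n + 2)


def safe_targets(pred: CriticalEventPredictor, node_precursors: dict[str, bool],
                 threshold: float = 0.5) -> list[str]:
    # Nodes whose predicted critical-event probability is below the threshold
    # are candidate destinations for scheduling and data migration.
    return sorted(n for n, seen in node_precursors.items() if pred.p_critical(seen) < threshold)


pred = CriticalEventPredictor()
for seen, crit in [(True, True)] * 3 + [(True, False)] + [(False, False)] * 5:
    pred.observe(seen, crit)
print(round(pred.p_critical(True), 3))   # 0.667
print(round(pred.p_critical(False), 3))  # 0.143
print(safe_targets(pred, {"n1": True, "n2": False, "n3": False}))  # ['n2', 'n3']
```

    The prior keeps the estimate defined (0.5) before any outcome has been logged, which matters when a new node joins the cluster.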

    Hybrid method for event prediction and system control
    10.
    Granted invention patent
    Status: lapsed

    Publication No.: US07451210B2

    Publication date: 2008-11-11

    Application No.: US10720300

    Filing date: 2003-11-24

    IPC classification: G06F15/173

    Abstract: A hybrid method of predicting the occurrence of future critical events in a computer cluster having a series of nodes records system performance parameters and the occurrence of past critical events. A data filter filters the logged data to eliminate redundancies and decrease the data storage requirements of the system. Time-series models and rule-based classification schemes are used to associate various system parameters with the past occurrence of critical events and predict the occurrence of future critical events. Ongoing processing jobs are migrated to nodes for which no critical events are predicted, and future jobs are routed to more robust nodes.
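
    The filtering and migration steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: an event counts as redundant if the same (node, kind) pair was already kept less than a hypothetical 300 seconds earlier, the time-series and rule-based models are replaced by one hypothetical rule (at least two filtered warnings flag a node), and jobs on flagged nodes are moved round-robin to unflagged nodes.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Event:
    time: float  # seconds since some epoch
    node: str
    kind: str


def filter_redundant(events: list[Event], window: float = 300.0) -> list[Event]:
    """Drop an event if the same (node, kind) was already kept less than `window` seconds earlier."""
    last_kept: dict[tuple[str, str], float] = {}
    kept: list[Event] = []
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.node, ev.kind)
        if key in last_kept and ev.time - last_kept[key] < window:
            continue  # part of a burst of identical events; not stored
        last_kept[key] = ev.time
        kept.append(ev)
    return kept


def flag_nodes(events: list[Event], min_warnings: int = 2) -> set[str]:
    """Hypothetical rule: flag nodes with at least `min_warnings` filtered warnings."""
    counts: dict[str, int] = {}
    for ev in events:
        if ev.kind == "warning":
            counts[ev.node] = counts.get(ev.node, 0) + 1
    return {n for n, c in counts.items() if c >= min_warnings}


def migrate(jobs: dict[str, str], nodes: list[str], flagged: set[str]) -> dict[str, str]:
    """Move jobs off flagged nodes onto unflagged nodes, round-robin."""
    safe = [n for n in nodes if n not in flagged]
    if not safe:
        return dict(jobs)  # nowhere safer to go; leave placement unchanged
    targets = cycle(safe)
    return {job: next(targets) if node in flagged else node for job, node in jobs.items()}


log = [Event(0, "n1", "warning"), Event(10, "n1", "warning"), Event(20, "n1", "warning"),
       Event(5, "n2", "warning"), Event(15, "n1", "ecc"), Event(400, "n1", "warning")]
kept = filter_redundant(log)
print(len(kept))  # 4
flagged = flag_nodes(kept)
print(flagged)  # {'n1'}
print(migrate({"jobA": "n1", "jobB": "n2", "jobC": "n1"}, ["n1", "n2", "n3"], flagged))
# {'jobA': 'n2', 'jobB': 'n2', 'jobC': 'n3'}
```

    Filtering before classification means a burst of identical warnings counts once, so a single noisy episode does not by itself flag a node.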