Method for using a priority queue to perform job scheduling on a cluster based on node rank and performance
    1.
    Granted patent
    Status: active

    Publication No.: US07827435B2

    Publication date: 2010-11-02

    Application No.: US11057969

    Filing date: 2005-02-15

    IPC classification: G06F11/00

    CPC classification: G06F9/505 G06F2209/508

    Abstract: In a multi-node information processing system, a method for scheduling jobs includes the steps of: determining node-related performance parameters for a plurality of nodes; determining a ranking for each node based on that node's performance parameters; and ordering the nodes by ranking for job scheduling.

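
    The three steps in this abstract (collect per-node parameters, rank each node, order nodes for scheduling) can be illustrated with a minimal sketch. It is not the patented method: the weighted-sum score, the parameter names (`uptime_hours`, `cpu_free`, `recent_failures`), the weights, and the one-job-per-node policy are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class RankedNode:
    # heapq is a min-heap, so the negated score is stored: best node pops first.
    sort_key: float
    name: str = field(compare=False)


def rank_score(params, weights):
    """Hypothetical ranking: weighted sum of node-related performance parameters."""
    return sum(weights[key] * params[key] for key in weights)


def build_node_queue(nodes, weights):
    """Rank every node and order the nodes in a priority queue."""
    heap = [RankedNode(-rank_score(p, weights), name) for name, p in nodes.items()]
    heapq.heapify(heap)
    return heap


def schedule(jobs, heap):
    """Assign each job to the best-ranked remaining node (one job per node)."""
    assignments = {}
    for job in jobs:
        if not heap:
            break  # more jobs than nodes: leave the rest unscheduled
        assignments[job] = heapq.heappop(heap).name
    return assignments


# Made-up parameters and weights, purely for illustration.
NODES = {
    "node-a": {"uptime_hours": 900.0, "cpu_free": 0.50, "recent_failures": 2.0},
    "node-b": {"uptime_hours": 400.0, "cpu_free": 0.90, "recent_failures": 0.0},
    "node-c": {"uptime_hours": 100.0, "cpu_free": 0.20, "recent_failures": 5.0},
}
WEIGHTS = {"uptime_hours": 0.001, "cpu_free": 1.0, "recent_failures": -0.2}

if __name__ == "__main__":
    print(schedule(["job-1", "job-2"], build_node_queue(NODES, WEIGHTS)))
```

    With these made-up numbers, node-b ranks highest and node-c lowest, so the first job goes to node-b and the second to node-a.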

    HYBRID EVENT PREDICTION AND SYSTEM CONTROL
    2.
    Patent application
    Status: active

    Publication No.: US20090070628A1

    Publication date: 2009-03-12

    Application No.: US12267762

    Filing date: 2008-11-10

    IPC classification: G06F11/07

    Abstract: A system for predicting an occurrence of a critical event in a computer cluster includes: a control system that includes an event log, a system parameter log, a memory for storing information related to occurrences of critical events, and a processor. The processor implements a hybrid prediction system; loads the information from the event log and the system performance log into a Bayesian network model; uses the Bayesian network model to predict a future critical event; makes future scheduling and current data migration selections; and adapts the Bayesian network model by feeding the scheduling and data migration selections.


    Hybrid event prediction and system control
    3.
    Granted patent
    Status: active

    Publication No.: US07895323B2

    Publication date: 2011-02-22

    Application No.: US12267762

    Filing date: 2008-11-10

    IPC classification: G06F15/173

    Abstract: A system for predicting an occurrence of a critical event in a computer cluster includes: a control system that includes an event log, a system parameter log, a memory for storing information related to occurrences of critical events, and a processor. The processor implements a hybrid prediction system; loads the information from the event log and the system performance log into a Bayesian network model; uses the Bayesian network model to predict a future critical event; makes future scheduling and current data migration selections; and adapts the Bayesian network model by feeding the scheduling and data migration selections.


    Hybrid method for event prediction and system control
    4.
    Granted patent
    Status: expired

    Publication No.: US07451210B2

    Publication date: 2008-11-11

    Application No.: US10720300

    Filing date: 2003-11-24

    IPC classification: G06F15/173

    Abstract: A hybrid method of predicting the occurrence of future critical events in a computer cluster having a series of nodes records system performance parameters and the occurrence of past critical events. A data filter filters the logged data to eliminate redundancies and decrease the data storage requirements of the system. Time-series models and rule-based classification schemes are used to associate various system parameters with the past occurrence of critical events and predict the occurrence of future critical events. Ongoing processing jobs are migrated to nodes for which no critical events are predicted, and future jobs are routed to more robust nodes.


    Method and system for deciding when to checkpoint an application based on risk analysis
    5.
    Granted patent
    Status: expired

    Publication No.: US07392433B2

    Publication date: 2008-06-24

    Application No.: US11042611

    Filing date: 2005-01-25

    IPC classification: G06F11/00

    CPC classification: G06F11/1471

    Abstract: Briefly, according to the invention, in an information processing system including a plurality of information processing nodes, a request for checkpointing by an application includes node health criteria (or parameters). The system has the authority to grant or deny the checkpointing request depending on the system health or availability. This scheme significantly improves not only the system performance but also the application running time. By skipping a checkpoint, the application can spend that time running instead of checkpointing.

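
    The grant-or-deny decision described in this abstract can be sketched as follows. This is an illustrative assumption, not the patented procedure: the `CheckpointRequest` type, the `risk_threshold` criterion, and the per-node failure-risk estimates are hypothetical names, and the rule (checkpoint only when some node's estimated risk exceeds the application's threshold) is one plausible reading of "depending on the system health".

```python
from dataclasses import dataclass


@dataclass
class CheckpointRequest:
    app: str
    # Hypothetical health criterion carried by the request: the highest
    # per-node failure risk at which the application is happy to skip.
    risk_threshold: float


def grant_checkpoint(request, node_failure_risk):
    """Return True to grant the checkpoint, False to deny (skip) it.

    node_failure_risk maps node name -> estimated failure probability
    (an assumed input; the abstract does not say how health is measured).
    """
    if not node_failure_risk:
        # No health data available: be conservative and checkpoint.
        return True
    worst_risk = max(node_failure_risk.values())
    return worst_risk > request.risk_threshold


if __name__ == "__main__":
    req = CheckpointRequest(app="simulation", risk_threshold=0.05)
    print(grant_checkpoint(req, {"n1": 0.01, "n2": 0.02}))  # healthy: deny
    print(grant_checkpoint(req, {"n1": 0.01, "n2": 0.20}))  # at risk: grant
```

    When the request is denied, the application keeps computing, which is where the running-time saving claimed in the abstract would come from.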

    Method and System for Online Detection of Multi-Component Interactions in Computing Systems
    6.
    Patent application
    Status: under examination (published)

    Publication No.: US20120283991A1

    Publication date: 2012-11-08

    Application No.: US13102921

    Filing date: 2011-05-06

    IPC classification: G06F11/30

    CPC classification: G06F11/0751

    Abstract: The present invention provides an efficient, two-stage, online method for discovering interactions among components and groups of components, including time-delayed effects, in large production systems. The first stage compresses a set of anomaly signals using a principal component analysis and passes the resulting eigensignals and a small set of other signals to the second stage, a lag correlation detector, which identifies time-delayed correlations. Real use cases are described from eight unmodified production systems.

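
    The second stage named in this abstract, lag correlation detection, can be illustrated with a small offline sketch. It covers only that stage (the principal component analysis step is omitted), and it is a brute-force search rather than the online detector the application describes. The function names and the example signals are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lag(source, target, max_lag):
    """Find the delay at which `target` best tracks `source`.

    Compares source[t] with target[t + lag] for every lag in 0..max_lag and
    returns (lag, correlation) for the largest absolute correlation.
    Both signals must have the same length, greater than max_lag + 1.
    """
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        xs = source[: len(source) - lag]
        ys = target[lag:]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best


if __name__ == "__main__":
    # Made-up anomaly signal; the second component echoes it two steps later.
    component_a = [0, 1, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1]
    component_b = [0, 0] + component_a[:-2]
    print(best_lag(component_a, component_b, max_lag=4))
```

    In this example the search reports a lag of 2 with a correlation of about 1, matching the two-step delay built into the test signal.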