Hardware/software based indirect time stamping methodology for proactive hardware/software event detection and control
    1.
    Granted invention patent
    Status: Lapsed

    Publication number: US07529979B2

    Publication date: 2009-05-05

    Application number: US10735412

    Filing date: 2003-12-12

    IPC classification: G06F11/00

    Abstract: An improved method and apparatus for time stamping events occurring on a large-scale distributed network uses a local counter associated with each processor of the distributed network. All counters are reset at the same time globally, so that all events are recorded with respect to a common reference time. The counter is stopped when a critical event is detected. Events are masked or filtered, online or offline, to prevent non-critical events from triggering a collection by the system monitor or service/host processor. The masking can be done dynamically through the use of an event history logger. The central system may poll the remote processor periodically to receive the accurate counter value from the local counter and device control register. Remedial action can be taken when conditional probability calculations performed on the historical information indicate that a critical event is about to occur.
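
    The counter-and-mask behavior described in this abstract can be illustrated with a minimal Python sketch. The event names, the `LocalCounter` class, and the choice of which events count as critical are all hypothetical; the abstract does not specify them, and a real implementation would live in hardware registers rather than in Python.

```python
from enum import IntFlag


class Event(IntFlag):
    # Hypothetical event types, chosen only for illustration.
    CORRECTABLE_ECC = 1
    LINK_RETRY = 2
    UNCORRECTABLE_ECC = 4
    POWER_FAULT = 8


# Assumption: only these events are treated as critical.
CRITICAL_MASK = Event.UNCORRECTABLE_ECC | Event.POWER_FAULT


class LocalCounter:
    """Per-processor counter that stops when an unmasked (critical) event occurs."""

    def __init__(self, mask=CRITICAL_MASK):
        self.mask = mask
        self.reset()

    def reset(self):
        # Issued to every counter at the same time, giving a common time base.
        self.ticks = 0
        self.running = True
        self.latched = None

    def tick(self):
        # Advance only while no critical event has been latched.
        if self.running:
            self.ticks += 1

    def record(self, event):
        """Return True if the event should trigger a collection by the monitor."""
        if not self.running or not (event & self.mask):
            return False  # masked (non-critical) or already stopped
        self.running = False
        self.latched = (self.ticks, event)
        return True
```

    A polling monitor would read `latched` to obtain the indirect time stamp, then call `reset()` on all counters together.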

    Method for using a priority queue to perform job scheduling on a cluster based on node rank and performance
    2.
    Granted invention patent
    Status: In force

    Publication number: US07827435B2

    Publication date: 2010-11-02

    Application number: US11057969

    Filing date: 2005-02-15

    IPC classification: G06F11/00

    CPC classification: G06F9/505 G06F2209/508

    Abstract: In a multi-node information processing system, a method for scheduling jobs includes the steps of: determining node-related performance parameters for a plurality of nodes; determining a ranking for each node based on the node-related performance parameters for that node; and ordering the nodes by their rankings for job scheduling.
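
    The three steps in this abstract (measure, rank, order) map naturally onto a priority queue. The following is a minimal sketch in Python using the standard `heapq` module; the scoring formula, its weights, and the parameter names `failure_rate` and `load` are assumptions made for illustration and are not taken from the patent.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class RankedNode:
    rank: float                      # lower rank means a more desirable node
    name: str = field(compare=False)


def rank_node(failure_rate, load):
    # Hypothetical scoring rule: weighted sum of two performance parameters.
    return 0.7 * failure_rate + 0.3 * load


def build_queue(params):
    """Build a min-heap of nodes ordered by rank from a dict of parameters."""
    heap = [
        RankedNode(rank_node(p["failure_rate"], p["load"]), name)
        for name, p in params.items()
    ]
    heapq.heapify(heap)
    return heap


def schedule(jobs, heap):
    """Assign each job to the best-ranked remaining node (one job per node)."""
    assignment = {}
    for job in jobs:
        if not heap:
            break  # more jobs than nodes; the rest stay unscheduled
        assignment[job] = heapq.heappop(heap).name
    return assignment
```

    With three nodes whose parameters favor `n3`, then `n1`, then `n2`, two jobs would be placed on `n3` and `n1` in that order.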

    METHOD FOR ORGANIZING PROCESSES
    3.
    Invention application
    Status: Under examination (published)

    Publication number: US20100017244A1

    Publication date: 2010-01-21

    Application number: US12174130

    Filing date: 2008-07-16

    IPC classification: G06Q10/00

    CPC classification: G06Q10/06 G06Q10/06393

    Abstract: Techniques for generating a target process are provided. The techniques include identifying at least one of one or more steps and one or more artifacts within a target process and one or more other processes; pre-fetching, from the one or more other processes, the at least one of one or more atomic steps, one or more decision steps and splits, and one or more merges to be used in the target process; and associating the at least one of one or more atomic steps, one or more decision steps and splits, and one or more merges to be used in the target process at one or more decision points to generate the target process.

    Scalable method of continuous monitoring the remotely accessible resources against the node failures for very large clusters
    5.
    Granted invention patent
    Status: In force

    Publication number: US07296191B2

    Publication date: 2007-11-13

    Application number: US11456585

    Filing date: 2006-07-11

    IPC classification: G06F11/00

    Abstract: The notion of controlling, using and monitoring remote resources in a distributed data processing system through the use of proxy resource managers and agents is extended to provide failover capability, so that resource coverage is preserved and maintained even in the event of either temporary or longer-duration node failure. Mechanisms are provided for consistent determination of resource status. Mechanisms are also provided which facilitate the joining of nodes to a group of nodes while still preserving remote resource operations. Additional mechanisms are also provided for the return of remote resource management to the control of a previously failed, but now recovered, node, even if the failure had resulted in a node reset.

    Hybrid event prediction and system control
    6.
    Granted invention patent
    Status: In force

    Publication number: US07895323B2

    Publication date: 2011-02-22

    Application number: US12267762

    Filing date: 2008-11-10

    IPC classification: G06F15/173

    Abstract: A system for predicting an occurrence of a critical event in a computer cluster includes: a control system that includes an event log, a system parameter log, a memory for storing information related to occurrences of critical events, and a processor. The processor implements a hybrid prediction system; loads the information from the event log and the system performance log into a Bayesian network model; uses the Bayesian network model to predict a future critical event; makes future scheduling and current data migration selections; and adapts the Bayesian network model by feeding the scheduling and data migration selections.

    Hybrid method for event prediction and system control
    8.
    Granted invention patent
    Status: Lapsed

    Publication number: US07451210B2

    Publication date: 2008-11-11

    Application number: US10720300

    Filing date: 2003-11-24

    IPC classification: G06F15/173

    Abstract: A hybrid method of predicting the occurrence of future critical events in a computer cluster having a series of nodes records system performance parameters and the occurrence of past critical events. A data filter filters the logged data to eliminate redundancies and decrease the data storage requirements of the system. Time-series models and rule-based classification schemes are used to associate various system parameters with the past occurrence of critical events and to predict the occurrence of future critical events. Ongoing processing jobs are migrated to nodes for which no critical events are predicted, and future jobs are routed to more robust nodes.
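
    The data-filtering step in this abstract can be sketched as a simple redundancy filter over a log. The abstract does not say how redundancy is defined, so the rule below (suppress a repeat of the same message from the same node unless more than `window` seconds have passed since its last occurrence) is an assumption, as are the function name and the tuple layout.

```python
def filter_redundant(events, window=60.0):
    """Drop repeated (node, message) events that recur within `window` seconds.

    `events` is an iterable of (timestamp, node, message) tuples sorted by
    timestamp. An event is kept when it is the first of its kind or when the
    gap since the previous occurrence exceeds `window`.
    """
    last_seen = {}
    kept = []
    for ts, node, msg in events:
        key = (node, msg)
        prev = last_seen.get(key)
        if prev is None or ts - prev > window:
            kept.append((ts, node, msg))
        # Track every occurrence, so a steady stream stays suppressed.
        last_seen[key] = ts
    return kept
```

    Because the last-seen time is updated on every occurrence, a continuously repeating error is logged once and then again only after it has been quiet for longer than the window.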

    Method and system for deciding when to checkpoint an application based on risk analysis
    9.
    Granted invention patent
    Status: Lapsed

    Publication number: US07392433B2

    Publication date: 2008-06-24

    Application number: US11042611

    Filing date: 2005-01-25

    IPC classification: G06F11/00

    CPC classification: G06F11/1471

    Abstract: Briefly, according to the invention, in an information processing system including a plurality of information processing nodes, a request for checkpointing by an application includes node health criteria (or parameters). The system has the authority to grant or deny the checkpointing request depending on the system health or availability. This scheme significantly improves not only the system performance, but also the application running time. By skipping a checkpoint, the application can use that time to run instead of spending extra time on checkpointing.
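
    The grant-or-deny decision in this abstract can be sketched as a threshold test on predicted node risk. The abstract does not give the decision rule, so everything below is an assumption: the `CheckpointRequest` fields, the use of a per-node failure probability as the health measure, and the policy of treating unknown nodes as fully at risk.

```python
from dataclasses import dataclass


@dataclass
class CheckpointRequest:
    app_id: str
    nodes: list            # names of the nodes the application runs on
    risk_threshold: float  # hypothetical criterion supplied by the application


def decide_checkpoint(request, predicted_failure_probability):
    """Return True to grant the checkpoint, False to deny (skip) it.

    The checkpoint is granted when the riskiest node used by the application
    has a predicted failure probability at or above the request's threshold.
    Nodes without a prediction are conservatively treated as risk 1.0.
    """
    worst = max(
        (predicted_failure_probability.get(n, 1.0) for n in request.nodes),
        default=0.0,
    )
    return worst >= request.risk_threshold
```

    When the request is denied, the application keeps computing through the interval it would otherwise have spent writing checkpoint state.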

    System and method for constructing flexible ordering to improve productivity and efficiency in process flows
    10.
    Granted invention patent
    Status: In force

    Publication number: US08036865B2

    Publication date: 2011-10-11

    Application number: US12172573

    Filing date: 2008-07-14

    IPC classification: G06G7/48

    CPC classification: G06Q10/06

    Abstract: A plurality of equivalent representations of a process are identified. The process has a plurality of tasks. Each of the representations specifies a different order of the tasks. The plurality of equivalent representations are consolidated into a single representation. The single representation captures, in at least one flexible order grouping, at least two of the tasks that may be performed in more than one order. At least one constraint is specified for the at least one flexible order grouping. Techniques for merging two or more flexible representations are also provided.
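
    One simplified way to picture the consolidation step is to compare task pairs across the equivalent orderings: a pair that always appears in the same order becomes a constraint, and a pair seen in both orders is flexible. The Python sketch below takes that pairwise view. It is an illustration under that assumption, not the patented technique, and the resulting pairwise constraints may admit orderings that were not among the inputs.

```python
from itertools import combinations


def consolidate(orderings):
    """Split task pairs into fixed precedence constraints and flexible pairs.

    `orderings` is a non-empty list of task sequences, each containing the
    same tasks exactly once. Returns (fixed, flexible), where `fixed` is a set
    of (before, after) tuples and `flexible` is a set of frozensets of two
    tasks that were observed in both relative orders.
    """
    tasks = set(orderings[0])
    if any(set(o) != tasks or len(o) != len(tasks) for o in orderings):
        raise ValueError("every ordering must contain the same tasks exactly once")

    positions = [{t: i for i, t in enumerate(o)} for o in orderings]
    fixed, flexible = set(), set()
    for a, b in combinations(sorted(tasks), 2):
        a_first = [p[a] < p[b] for p in positions]
        if all(a_first):
            fixed.add((a, b))
        elif not any(a_first):
            fixed.add((b, a))
        else:
            flexible.add(frozenset((a, b)))
    return fixed, flexible
```

    For the two orderings A, B, C, D and A, C, B, D, the pair B and C ends up in the flexible set while A stays first and D stays last.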
