Predicting remaining useful life for a computer system using a stress-based prediction technique
    1.
    Granted patent
    Predicting remaining useful life for a computer system using a stress-based prediction technique (status: in force)

    Publication No.: US08340923B2

    Publication date: 2012-12-25

    Application No.: US12752767

    Filing date: 2010-04-01

    IPC classification: G06F19/00

    CPC classification: G06F11/008

    Abstract: One embodiment of the present invention provides a system for predicting a remaining useful life (RUL) for a component in a set of components within a computer system. The system starts by collecting values of at least one degradation-related parameter associated with the operation of a monitored component within the computer system. Note that the degradation-related parameter is a direct measurement of a degree of degradation of the monitored component. The system additionally collects values of at least one stress-based parameter from the computer system. Note that the stress-based parameter measures an accumulative stress in the operating environment of the set of components which can cause degradation of the set of components. The system then uses the values of the at least one degradation-related parameter and the values of the at least one stress-based parameter to predict an RUL for a component in the set of components.
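
    The abstract does not say how the two kinds of parameter are combined, so the following is only a minimal sketch under stated assumptions, not the patented method. It assumes degradation grows roughly linearly with accumulated stress, fits that line by least squares, and converts the remaining stress budget into time using the average stress rate observed so far. The function names, the linear model, and the failure threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def cumulative(values: Sequence[float]) -> List[float]:
    """Running total of per-interval stress readings (accumulated stress)."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(total)
    return out


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("accumulated stress must vary to fit a line")
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def predict_rul(degradation: Sequence[float],
                stress_per_interval: Sequence[float],
                failure_threshold: float) -> float:
    """Estimate remaining intervals until degradation hits the threshold.

    Assumption (not from the source): degradation is linear in
    accumulated stress, and future stress arrives at the mean past rate.
    """
    if len(degradation) != len(stress_per_interval) or len(degradation) < 2:
        raise ValueError("need two or more paired samples")
    acc = cumulative(stress_per_interval)
    a, b = fit_line(acc, degradation)
    mean_rate = acc[-1] / len(acc)
    if b <= 0 or mean_rate <= 0:
        return float("inf")  # no measurable degradation trend
    stress_at_failure = (failure_threshold - a) / b
    remaining_stress = max(0.0, stress_at_failure - acc[-1])
    return remaining_stress / mean_rate
```

    In this sketch, a component that has already crossed the hypothetical threshold gets an RUL of zero, and a component showing no degradation trend gets an unbounded estimate.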

    PREDICTING REMAINING USEFUL LIFE FOR A COMPUTER SYSTEM USING A STRESS-BASED PREDICTION TECHNIQUE
    2.
    Patent application
    PREDICTING REMAINING USEFUL LIFE FOR A COMPUTER SYSTEM USING A STRESS-BASED PREDICTION TECHNIQUE (status: in force)

    Publication No.: US20110246093A1

    Publication date: 2011-10-06

    Application No.: US12752767

    Filing date: 2010-04-01

    IPC classification: G06F19/00 G06F15/00

    CPC classification: G06F11/008

    Abstract: One embodiment of the present invention provides a system for predicting a remaining useful life (RUL) for a component in a set of components within a computer system. The system starts by collecting values of at least one degradation-related parameter associated with the operation of a monitored component within the computer system. Note that the degradation-related parameter is a direct measurement of a degree of degradation of the monitored component. The system additionally collects values of at least one stress-based parameter from the computer system. Note that the stress-based parameter measures an accumulative stress in the operating environment of the set of components which can cause degradation of the set of components. The system then uses the values of the at least one degradation-related parameter and the values of the at least one stress-based parameter to predict an RUL for a component in the set of components.

    Enhancing throughput and fault-tolerance in a parallel-processing system
    3.
    Granted patent
    Enhancing throughput and fault-tolerance in a parallel-processing system (status: in force)

    Publication No.: US07543180B2

    Publication date: 2009-06-02

    Application No.: US11371998

    Filing date: 2006-03-08

    IPC classification: G06F11/00

    Abstract: One embodiment of the present invention provides a system that enhances throughput and fault-tolerance in a parallel-processing system. During operation, the system first receives a task. Next, the system partitions N computing nodes into M set-aside nodes and N-M primary computing nodes, wherein M ≥ 1. The system then processes the task in parallel across the N-M primary computing nodes. While doing so, the system proactively monitors the health of each of the N-M primary computing nodes. If the system detects a node in the N-M primary computing nodes to be at risk of failure, the system copies the portion of the task associated with the at-risk node to a subset of the M set-aside nodes. The system then processes the portion of the task in parallel across the subset of the M set-aside nodes while the N-M primary computing nodes continue executing.
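
    The steps in this abstract can be sketched as follows. This is an illustrative sketch only: the round-robin split, the numeric health score with a threshold, and all function names are assumptions introduced here, since the abstract does not specify how work is divided or how risk of failure is detected.

```python
from typing import Dict, List, Sequence, Tuple


def partition_nodes(nodes: Sequence[str], m: int) -> Tuple[List[str], List[str]]:
    """Split N nodes into N-M primary nodes and M set-aside nodes (M >= 1)."""
    if not 1 <= m < len(nodes):
        raise ValueError("need 1 <= M < N")
    return list(nodes[:-m]), list(nodes[-m:])


def split_work(items: Sequence[int], workers: Sequence[str]) -> Dict[str, List[int]]:
    """Round-robin split of work items across workers (assumed policy)."""
    return {w: list(items[i::len(workers)]) for i, w in enumerate(workers)}


def at_risk_nodes(health: Dict[str, float], threshold: float) -> List[str]:
    """Nodes whose hypothetical health score is below the threshold."""
    return [node for node, score in health.items() if score < threshold]


def copy_to_set_aside(assignments: Dict[str, List[int]],
                      at_risk: str,
                      spares: Sequence[str]) -> Dict[str, List[int]]:
    """Copy (not move) the at-risk node's portion across the spare nodes.

    The original assignments are left untouched, so every primary node,
    including the at-risk one, keeps executing its own portion.
    """
    return split_work(assignments[at_risk], spares)


# Usage: N = 5 nodes, M = 2 set aside, six work items.
primary, set_aside = partition_nodes(["n0", "n1", "n2", "n3", "n4"], 2)
assignments = split_work(list(range(6)), primary)
risky = at_risk_nodes({"n0": 0.9, "n1": 0.2, "n2": 0.8}, threshold=0.5)
backup = copy_to_set_aside(assignments, risky[0], set_aside)
```

    Copying rather than moving the portion mirrors the abstract's statement that the primary nodes continue executing while the set-aside subset works on the same portion.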
