Fault identification of multi-host complex systems with timesliding window analysis in a time series
    1.
    Granted patent
    Legal status: In force

    Publication No.: US08069370B1

    Publication date: 2011-11-29

    Application No.: US12830069

    Filing date: 2010-07-02

    IPC classification: G06F11/00

    Abstract: A method and apparatus is provided for determining the most probable cause of a problem observed in a complex multi-host system. The approach relies on a probabilistic model to represent causes and effects in a complex computing system. However, complex systems include a multitude of independently operating components that can cause temporary anomalous states. To reduce the resources required to perform root cause analysis on each transient failure, as well as to raise the confidence in the most probable cause of a failure that is identified by the model, inputs to the probabilistic model are aggregated over a sliding window of values from the recent past.
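    The abstract does not say which aggregate is computed over the sliding window. The sketch below is a hypothetical illustration only, assuming a fixed number of recent samples per measure and a plain arithmetic mean compared against a threshold; `SlidingWindow` and `should_diagnose` are names invented for this example. It shows why windowing skips diagnosis of a one-sample transient while still reacting to a sustained anomaly.

```python
from collections import deque


class SlidingWindow:
    """Keeps the most recent `size` samples of one measure (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("window size must be positive")
        # deque(maxlen=...) silently drops the oldest sample once full.
        self.values = deque(maxlen=size)

    def add(self, value: float) -> None:
        self.values.append(value)

    def mean(self) -> float:
        return sum(self.values) / len(self.values) if self.values else 0.0


def should_diagnose(window: SlidingWindow, threshold: float) -> bool:
    """Assumed policy: run root-cause analysis only when the windowed mean,
    rather than any single sample, exceeds the threshold."""
    return window.mean() > threshold


w = SlidingWindow(size=5)
for v in [0.1, 0.1, 0.9, 0.1, 0.1]:  # one transient spike
    w.add(v)
print(should_diagnose(w, threshold=0.5))  # False (mean is about 0.26)

for v in [0.9, 0.9, 0.9, 0.9]:  # sustained anomaly pushes old samples out
    w.add(v)
print(should_diagnose(w, threshold=0.5))  # True (mean is about 0.74)
```

    A mean is only one possible aggregate; a median or a rolling count would damp transients in a similar way.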

    METHOD AND APPARATUS FOR DETERMINING RANKED CAUSAL PATHS FOR FAULTS IN A COMPLEX MULTI-HOST SYSTEM WITH PROBABILISTIC INFERENCE IN A TIME SERIES
    2.
    Patent application
    Legal status: In force

    Publication No.: US20120005532A1

    Publication date: 2012-01-05

    Application No.: US12830116

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/07

    CPC classification: G06F11/079, G06F11/0709

    Abstract: A method and apparatus are provided for determining that problems have occurred within a complex multi-host system and for identifying for each problem, sequences of causes and effects called a fault cause path, starting with a root cause. A probabilistic model representing the cause/effect relationships among potential system problems identifies the probability that a problem occurred in the system. Such failure probabilities may be determined based on aggregating, over a recent time interval, probability of failure values determined by the probabilistic model. Each fault cause path may have an associated probability of accuracy value reflecting the expected accuracy of the fault cause path relative to other fault cause paths. When more than one fault cause path is identified, the number and order of the fault cause paths may be ranked and displayed based on their probability of accuracy value.
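    The abstract leaves open how the probability of accuracy for each fault cause path is computed. As a hypothetical sketch only, the code below assumes each path is a list of node names from root cause to observed symptom, scores a path as the product of its nodes' (already window-aggregated) failure probabilities, and normalizes the scores so each is relative to the other candidate paths before ranking. The class, function, and node names are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Optional, Tuple


@dataclass
class FaultCausePath:
    nodes: List[str]         # root cause first, observed symptom last
    node_probs: List[float]  # assumed: aggregated failure probability per node

    def score(self) -> float:
        # Assumption: a chain of causes is as plausible as the product of its links.
        return prod(self.node_probs)


def rank_paths(
    paths: List[FaultCausePath], top_k: Optional[int] = None
) -> List[Tuple[float, List[str]]]:
    """Return (relative accuracy, nodes) pairs, most accurate first.

    Scores are divided by their total so the values sum to 1 across the
    candidates, i.e. each is an accuracy relative to the other paths."""
    total = sum(p.score() for p in paths)
    ranked = sorted(
        ((p.score() / total if total else 0.0, p.nodes) for p in paths),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


paths = [
    FaultCausePath(["net_partition", "db_hang"], [0.3, 0.95]),
    FaultCausePath(["disk_full", "log_write_fail", "db_hang"], [0.8, 0.9, 0.95]),
]
for accuracy, nodes in rank_paths(paths):
    print(f"{accuracy:.2f}  " + " -> ".join(nodes))
# 0.71  disk_full -> log_write_fail -> db_hang
# 0.29  net_partition -> db_hang
```

    The `top_k` parameter corresponds to limiting the number of paths displayed; a product score penalizes longer chains, which may or may not match the intended behavior.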

    Methods and apparatus for cross-host diagnosis of complex multi-host systems in a time series with probabilistic inference
    3.
    Granted patent
    Legal status: In force

    Publication No.: US08291263B2

    Publication date: 2012-10-16

    Application No.: US12830144

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/00

    Abstract: A method and apparatus are provided for performing cross-host root cause diagnosis within a complex multi-host environment. In a multi-host environment, sometimes system failures on one host may cause problems at another host within the same environment. A probabilistic model is used to represent failures that can occur within each host in the environment. The cause and effect relationships among these failures together with measurement values are used to generate a probability that each potential failure occurred in each host. When a problem is observed on one host without detecting a corresponding root cause within the same host, a cross-host failure diagnosis is performed. The probabilistic models for other hosts in the environment are used to determine the most likely cause of the failure.
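    A hypothetical sketch of the local-then-cross-host search order described above. It assumes each host's probabilistic model has already produced a failure probability per potential root cause (`host_probs`), that the set of causes able to produce the observed symptom is known (`causes_of_symptom`), and that a local cause counts as "detected" when its probability reaches a fixed threshold. The model inference itself is out of scope, and all names and numbers are illustrative.

```python
from typing import Dict, Optional, Set, Tuple

Diagnosis = Tuple[str, str, float]  # (host, root cause, probability)


def diagnose(
    symptom_host: str,
    causes_of_symptom: Set[str],
    host_probs: Dict[str, Dict[str, float]],
    threshold: float = 0.5,
) -> Optional[Diagnosis]:
    # 1. Local diagnosis: look for a plausible cause on the host showing the symptom.
    local = [
        (symptom_host, cause, p)
        for cause, p in host_probs.get(symptom_host, {}).items()
        if cause in causes_of_symptom
    ]
    best_local = max(local, key=lambda d: d[2], default=None)
    if best_local is not None and best_local[2] >= threshold:
        return best_local

    # 2. Cross-host diagnosis: no local root cause was detected, so consult
    #    the other hosts' models and return the most likely candidate (or None).
    remote = [
        (host, cause, p)
        for host, probs in host_probs.items()
        if host != symptom_host
        for cause, p in probs.items()
        if cause in causes_of_symptom
    ]
    return max(remote, key=lambda d: d[2], default=None)


host_probs = {
    "db1": {"disk_full": 0.10, "cpu_saturated": 0.20},
    "storage1": {"disk_full": 0.05, "san_link_down": 0.92},
}
print(diagnose("db1", {"disk_full", "san_link_down"}, host_probs))
# ('storage1', 'san_link_down', 0.92)
```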

    Method and apparatus for dealing with accumulative behavior of some system observations in a time series for Bayesian inference with a static Bayesian network model
    4.
    Granted patent
    Legal status: In force

    Publication No.: US08230262B2

    Publication date: 2012-07-24

    Application No.: US12830175

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/00

    Abstract: A method and apparatus are provided for determining the probability that one or more problems have occurred within a complex multi-host system. A probabilistic model representing the cause/effect relationships among potential system problems identifies the probability that a problem occurred in the system based at least on system measure states that are input into the probabilistic model. System measure states may be determined based on an aggregation of system measurement values taken periodically. Aggregating system measurement values may be performed over system measurement values that were taken during a recent time interval. A rolling count aggregation function may be used for this purpose. A rolling count function counts the number of system measurement values taken within the recent time interval that lie within a particular range of values. A system measure state may be determined based on whether the rolling count exceeds a threshold associated with the system measure.
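    The rolling count described above can be sketched as follows. This is a hypothetical illustration, not the patented implementation: it assumes timestamped samples arrive in time order, uses an inclusive value range `[low, high]`, and keeps only samples newer than `now - interval`. The measure (CPU utilization), range, interval, and threshold are invented for the example.

```python
from collections import deque


class RollingCount:
    """Counts recent samples whose value lies in [low, high] (hypothetical helper)."""

    def __init__(self, interval: float, low: float, high: float, threshold: int) -> None:
        self.interval = interval
        self.low = low
        self.high = high
        self.threshold = threshold
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, ts: float, value: float) -> None:
        self.samples.append((ts, value))
        # Drop samples that fall outside the window (ts - interval, ts].
        while self.samples and self.samples[0][0] <= ts - self.interval:
            self.samples.popleft()

    def count(self) -> int:
        return sum(1 for _, v in self.samples if self.low <= v <= self.high)

    def state(self) -> str:
        # The measure state fed to the model flips only when the count exceeds the threshold.
        return "abnormal" if self.count() > self.threshold else "normal"


# CPU utilization (%) sampled every 10 s; "high" means 90-100 %, and more
# than 3 high samples in the last 60 s makes the measure state abnormal.
rc = RollingCount(interval=60, low=90, high=100, threshold=3)
for ts, cpu in [(0, 95), (10, 40), (20, 97), (30, 92), (40, 30), (50, 99)]:
    rc.add(ts, cpu)
print(rc.count(), rc.state())  # 4 abnormal

rc.add(60, 20)  # the sample at t=0 ages out of the window
print(rc.count(), rc.state())  # 3 normal
```

    Because the state depends on how many high readings accumulate in the window, a single high reading never flips it, which is the accumulative behavior the title refers to.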

    Method and apparatus for determining ranked causal paths for faults in a complex multi-host system with probabilistic inference in a time series
    5.
    Granted patent
    Legal status: In force

    Publication No.: US08156377B2

    Publication date: 2012-04-10

    Application No.: US12830116

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/00

    CPC classification: G06F11/079, G06F11/0709

    Abstract: A method and apparatus are provided for determining that problems have occurred within a complex multi-host system and for identifying for each problem, sequences of causes and effects called a fault cause path, starting with a root cause. A probabilistic model representing the cause/effect relationships among potential system problems identifies the probability that a problem occurred in the system. Such failure probabilities may be determined based on aggregating, over a recent time interval, probability of failure values determined by the probabilistic model. Each fault cause path may have an associated probability of accuracy value reflecting the expected accuracy of the fault cause path relative to other fault cause paths. When more than one fault cause path is identified, the number and order of the fault cause paths may be ranked and displayed based on their probability of accuracy value.

    METHOD AND APPARATUS FOR DEALING WITH ACCUMULATIVE BEHAVIOR OF SOME SYSTEM OBSERVATIONS IN A TIME SERIES FOR BAYESIAN INFERENCE WITH A STATIC BAYESIAN NETWORK MODEL
    6.
    Patent application
    Legal status: In force

    Publication No.: US20120005534A1

    Publication date: 2012-01-05

    Application No.: US12830175

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/00

    Abstract: A method and apparatus are provided for determining the probability that one or more problems have occurred within a complex multi-host system. A probabilistic model representing the cause/effect relationships among potential system problems identifies the probability that a problem occurred in the system based at least on system measure states that are input into the probabilistic model. System measure states may be determined based on an aggregation of system measurement values taken periodically. Aggregating system measurement values may be performed over system measurement values that were taken during a recent time interval. A rolling count aggregation function may be used for this purpose. A rolling count function counts the number of system measurement values taken within the recent time interval that lie within a particular range of values. A system measure state may be determined based on whether the rolling count exceeds a threshold associated with the system measure.

    Methods And Apparatus For Cross-Host Diagnosis Of Complex Multi-Host Systems In A Time Series With Probablistic Inference
    7.
    Patent application
    Legal status: In force

    Publication No.: US20120005533A1

    Publication date: 2012-01-05

    Application No.: US12830144

    Filing date: 2010-07-02

    Applicants: Fulu Li, Mohsin Beg

    Inventors: Fulu Li, Mohsin Beg

    IPC classification: G06F11/07

    Abstract: A method and apparatus are provided for performing cross-host root cause diagnosis within a complex multi-host environment. In a multi-host environment, sometimes system failures on one host may cause problems at another host within the same environment. A probabilistic model is used to represent failures that can occur within each host in the environment. The cause and effect relationships among these failures together with measurement values are used to generate a probability that each potential failure occurred in each host. When a problem is observed on one host without detecting a corresponding root cause within the same host, a cross-host failure diagnosis is performed. The probabilistic models for other hosts in the environment are used to determine the most likely cause of the failure.

    Failover and resume when using ordered sequences in a multi-instance database environment
    8.
    Granted patent

    Publication No.: US09910893B2

    Publication date: 2018-03-06

    Application No.: US13309300

    Filing date: 2011-12-01

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/3048

    Abstract: An approach is disclosed for implementing failover and resume when using ordered sequences in a multi-instance database environment. The approach commences by instantiating a first database instance initially to serve as an active instance, then instantiating a second database instance to serve as an instance of one or more passive instances. The active database establishes mastership over a sequence and then processes requests for the ‘next’ symbol by accessing a shared sequence cache only after accessing a first instance semaphore. The active instance and the passive instance perform a protocol such that upon passive database detection of a failure of the active database, one of the passive database instances takes over mastership of the sequence cache, and then proceeds to satisfy sequence value requests. The particular order is observed in spite of the failure.
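    A hypothetical, single-process sketch of the active/passive protocol described above. It assumes the next value to issue lives in state shared by all instances (`SharedSequenceState`), uses a `threading.Lock` as a stand-in for the instance semaphore, and models failure detection with a simple `alive` flag instead of a real heartbeat. What it illustrates is that the takeover resumes from the shared counter, so the order of issued values survives the failure. All class and function names are invented for the example.

```python
import threading
from typing import List, Optional


class SharedSequenceState:
    """Stand-in for memory visible to every database instance (hypothetical)."""

    def __init__(self, start: int = 1) -> None:
        self.semaphore = threading.Lock()  # serializes access to the sequence cache
        self.next_value = start
        self.master: Optional[str] = None  # instance currently holding mastership


class DatabaseInstance:
    def __init__(self, name: str, shared: SharedSequenceState) -> None:
        self.name = name
        self.shared = shared
        self.alive = True  # assumed failure signal; a real system would use heartbeats

    def take_mastership(self) -> None:
        with self.shared.semaphore:
            self.shared.master = self.name

    def next_val(self) -> int:
        # Only the master may hand out values, and only while holding the semaphore.
        with self.shared.semaphore:
            if self.shared.master != self.name:
                raise RuntimeError(f"{self.name} is not the sequence master")
            value = self.shared.next_value
            self.shared.next_value += 1
            return value


def failover(active: DatabaseInstance, passives: List[DatabaseInstance]) -> DatabaseInstance:
    """Return the instance that should serve requests, promoting a passive one if needed."""
    if active.alive:
        return active
    for candidate in passives:
        if candidate.alive:
            candidate.take_mastership()
            return candidate
    raise RuntimeError("no surviving instance can take over the sequence")


shared = SharedSequenceState()
inst1 = DatabaseInstance("inst1", shared)
inst2 = DatabaseInstance("inst2", shared)
inst1.take_mastership()

issued = [inst1.next_val() for _ in range(3)]
inst1.alive = False  # simulate failure of the active instance
server = failover(inst1, [inst2])
issued += [server.next_val() for _ in range(2)]
print(server.name, issued)  # inst2 [1, 2, 3, 4, 5]
```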

    Generating an ordered sequence in a database system using multiple interleaved caches
    9.
    Granted patent
    Legal status: In force

    Publication No.: US09189295B2

    Publication date: 2015-11-17

    Application No.: US13309356

    Filing date: 2011-12-01

    IPC classification: G06F9/52, G06F17/30

    Abstract: A method, system, and computer program product is disclosed for generating an ordered sequence from a predetermined sequence of symbols using protected interleaved caches, such as semaphore protected interleaved caches. The approach commences by dividing the predetermined sequence of symbols into two or more interleaved caches, then mapping each of the two or more interleaved caches to a particular semaphore of a group of semaphores. The group of semaphores is organized into bytes or machine words for storing the group of semaphores into a shared memory, the shared memory accessible by a plurality of session processes. Protected (serialized) access by the session processes is provided by granting access to one of the two or more interleaved caches only after one of the plurality of session processes performs a semaphore altering read-modify-write operation (e.g., a CAS) on the particular semaphore. The interleaved caches are assigned values successively from the predetermined sequence using a round-robin assignment technique.
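    The interleaving and semaphore scheme above can be sketched roughly as follows. This is a hypothetical illustration under stated assumptions, not the patented implementation: the busy flags for all caches are packed into one integer to mimic a machine word, and because Python exposes no hardware compare-and-swap, `compare_and_swap` is emulated with a lock. With `k` caches, round-robin assignment gives cache `i` the values `start + i`, `start + i + k`, `start + i + 2k`, and so on.

```python
import threading


class SemaphoreWord:
    """One busy bit per cache, packed into a single integer (hypothetical helper)."""

    def __init__(self) -> None:
        self._word = 0
        self._guard = threading.Lock()  # emulates the atomicity of a hardware CAS

    def load(self) -> int:
        return self._word

    def compare_and_swap(self, expected: int, new: int) -> bool:
        with self._guard:
            if self._word != expected:
                return False
            self._word = new
            return True


class InterleavedCaches:
    def __init__(self, start: int, num_caches: int) -> None:
        self.k = num_caches
        # Round-robin split of the sequence: cache i serves start+i, start+i+k, ...
        self.next_in_cache = [start + i for i in range(num_caches)]
        self.sem = SemaphoreWord()

    def _try_acquire(self, i: int) -> bool:
        bit = 1 << i
        old = self.sem.load()
        # Read-modify-write: succeed only if the bit was clear and nobody changed the word.
        return not (old & bit) and self.sem.compare_and_swap(old, old | bit)

    def _release(self, i: int) -> None:
        bit = 1 << i
        while True:
            old = self.sem.load()
            if self.sem.compare_and_swap(old, old & ~bit):
                return

    def next_value(self, session_id: int) -> int:
        # Each session starts probing at a different cache to spread contention,
        # and keeps retrying until it acquires some cache's semaphore bit.
        while True:
            for step in range(self.k):
                i = (session_id + step) % self.k
                if self._try_acquire(i):
                    try:
                        value = self.next_in_cache[i]
                        self.next_in_cache[i] += self.k
                        return value
                    finally:
                        self._release(i)


caches = InterleavedCaches(start=1, num_caches=4)
print([caches.next_value(s) for s in [0, 1, 2, 3, 0, 1]])  # [1, 2, 3, 4, 5, 6]
```

    Each cache is internally ordered and the caches never overlap, but values handed out by different caches may interleave in time, which is the trade-off that lets several sessions proceed without a single shared lock.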

    Reducing sequence cache latch contention in a database system
    10.
    Granted patent
    Legal status: In force

    Publication No.: US09141609B2

    Publication date: 2015-09-22

    Application No.: US13309394

    Filing date: 2011-12-01

    IPC classification: G06F17/30

    CPC classification: G06F17/30, G06F17/30348

    Abstract: In a database system having a plurality of concurrently executing session processes, the method commences by establishing a master list of sequences, the master list comprising a plurality of sequence objects which in turn define a sequence of values used for numbering and other identification within the database system. To reduce sequence cache latch access contention, multiple tiers of latches are provided. Methods of the system provide a first tier having a first tier “global” latch to serialize access to the master list. A second tier of latches is provided, the second tier having multiple second tier latches to serialize access to corresponding allocated sequences of values such that at any point in time, only one of the concurrently executing session processes is granted access to the allocated sequence.
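    A hypothetical sketch of the two-tier latching described above, using `threading.Lock` objects as latches. The first-tier (global) latch is held only while looking up or creating a sequence in the master list; the second-tier latch belongs to a single sequence, so sessions drawing from different sequences do not contend with each other. Class and method names are invented for illustration, and the sketch omits caching of allocated value ranges.

```python
import threading
from typing import Dict, List


class Sequence:
    def __init__(self, name: str, start: int = 1, increment: int = 1) -> None:
        self.name = name
        self.next_value = start
        self.increment = increment
        self.latch = threading.Lock()  # second tier: one latch per sequence


class SequenceRegistry:
    def __init__(self) -> None:
        self._master: Dict[str, Sequence] = {}
        self._global_latch = threading.Lock()  # first tier: guards the master list

    def get_or_create(self, name: str) -> Sequence:
        with self._global_latch:
            seq = self._master.get(name)
            if seq is None:
                seq = Sequence(name)
                self._master[name] = seq
            return seq

    def nextval(self, name: str) -> int:
        seq = self.get_or_create(name)  # global latch is released on return
        with seq.latch:                 # only one session at a time per sequence
            value = seq.next_value
            seq.next_value += seq.increment
            return value


registry = SequenceRegistry()
results: List[int] = []


def session(n: int) -> None:
    for _ in range(n):
        results.append(registry.nextval("order_id"))


threads = [threading.Thread(target=session, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results), sorted(results) == list(range(1, 801)))  # 800 True
```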