PROFILE-ENRICHED EXPLANATIONS OF DATA-DRIVEN MODELS

    Publication number: US20240126798A1

    Publication date: 2024-04-18

    Application number: US18203195

    Application date: 2023-05-30

    CPC classification number: G06F16/345 G06F16/335 G06F40/186

    Abstract: In an embodiment, a computer stores, in memory or storage, many explanation profiles, many log entries, and definitions of many features that log entries contain. Some features may contain a logic statement such as a database query, and these are specially aggregated based on similarity. Based on the entity specified by an explanation profile, statistics are materialized for some or all features. Statistics calculation may be based on scheduled batches of log entries or a stream of live log entries. At runtime, an inference that is based on a new log entry is received. Based on an entity specified in the new log entry, a particular explanation profile is dynamically selected. Based on the new log entry and statistics of features for the selected explanation profile, a local explanation of the inference is generated. In an embodiment, an explanation text template is used to generate the local explanation.
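
The flow in this abstract (per-entity feature statistics, profile selection at runtime, template-based local explanation) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the functions `build_profiles` and `explain`, the `TEMPLATE` wording, the choice of mean and standard deviation as the statistics, and the deviation threshold of two standard deviations.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical explanation text template (wording is an assumption).
TEMPLATE = "{feature}={value} deviates from the usual {mean:.1f} for {entity}"


def build_profiles(log_entries, entity_key, features):
    """Group historical log entries by entity and materialize per-feature statistics."""
    values = defaultdict(lambda: defaultdict(list))
    for entry in log_entries:
        for feature in features:
            values[entry[entity_key]][feature].append(entry[feature])
    return {
        entity: {f: {"mean": mean(v), "std": pstdev(v)} for f, v in feats.items()}
        for entity, feats in values.items()
    }


def explain(new_entry, profiles, entity_key, features, threshold=2.0):
    """Select the profile for the entry's entity and describe features that deviate."""
    entity = new_entry[entity_key]
    profile = profiles[entity]  # dynamic profile selection by entity
    parts = []
    for feature in features:
        stats = profile[feature]
        # Flag the feature when it lies more than `threshold` standard deviations
        # from the profile mean (an assumed rule, not the patent's).
        if abs(new_entry[feature] - stats["mean"]) > threshold * stats["std"]:
            parts.append(TEMPLATE.format(feature=feature, value=new_entry[feature],
                                         mean=stats["mean"], entity=entity))
    if not parts:
        return f"No feature deviates from the profile of {entity}."
    return "; ".join(parts)


history = [
    {"user": "alice", "rows_read": 10},
    {"user": "alice", "rows_read": 12},
    {"user": "alice", "rows_read": 11},
]
profiles = build_profiles(history, "user", ["rows_read"])
print(explain({"user": "alice", "rows_read": 500}, profiles, "user", ["rows_read"]))
```

Here the statistics are computed once from a batch; a streaming variant would update them incrementally as live log entries arrive.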

    SCORE PROPAGATION ON GRAPHS WITH DIFFERENT SUBGRAPH MAPPING STRATEGIES

    Publication number: US20240070156A1

    Publication date: 2024-02-29

    Application number: US17893519

    Application date: 2022-08-23

    CPC classification number: G06F16/24575

    Abstract: Techniques for propagating scores in subgraphs are provided. In one technique, multiple path scores are stored, each path score associated with a path (or subgraph), of multiple paths, in a graph of nodes. The path scores may be generated by a machine-learned model. For each path score, a path that is associated with that path score is identified and nodes of that path are identified. For each identified node, a node score for that node is determined or computed based on the corresponding path score and the node score is stored in association with that node. Subsequently, for each node in a subset of the graph, multiple node scores that are associated with that node are identified and aggregated to generate a propagated score for that node. In a related technique, a propagated score of a node is used to compute a score for each leaf node of the node.
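
The propagation described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `propagate` is hypothetical, each node simply inherits the score of every path it lies on, and the aggregation function defaults to `max`. The abstract leaves both the per-node computation and the aggregation open.

```python
from collections import defaultdict


def propagate(path_scores, subset=None, aggregate=max):
    """Turn per-path scores into one propagated score per node.

    path_scores: iterable of (path, score), where path is a sequence of node ids.
    subset: optional collection of node ids to restrict the result to.
    aggregate: function that reduces a list of node scores to one value (assumed: max).
    """
    node_scores = defaultdict(list)
    for path, score in path_scores:
        for node in path:
            # Assumption: the node score equals the score of the path it lies on.
            node_scores[node].append(score)
    nodes = node_scores if subset is None else [n for n in subset if n in node_scores]
    return {node: aggregate(node_scores[node]) for node in nodes}


scores = [(("a", "b"), 0.9), (("b", "c"), 0.3)]
print(propagate(scores))
```

Passing `aggregate=statistics.mean` instead of `max` would average the scores of all paths through a node, which may suit cases where a single high-scoring path should not dominate.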

    TEXTUAL EXPLANATIONS FOR ABSTRACT SYNTAX TREES WITH SCORED NODES

    Publication number: US20240061997A1

    Publication date: 2024-02-22

    Application number: US17891350

    Application date: 2022-08-19

    CPC classification number: G06F40/205 G06N20/00

    Abstract: Herein is a machine learning (ML) explainability (MLX) approach in which a natural language explanation is generated based on analysis of a parse tree such as for a suspicious database query or web browser JavaScript. In an embodiment, a computer selects, based on a respective relevance score for each non-leaf node in a parse tree of a statement, a relevant subset of non-leaf nodes. The non-leaf nodes are grouped in the parse tree into groups that represent respective portions of the statement. Based on a relevant subset of the groups that contain at least one non-leaf node in the relevant subset of non-leaf nodes, a natural language explanation of why the statement is anomalous is generated.
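
As a rough sketch of the selection and grouping steps in this abstract, the example below walks a toy parse tree, keeps the non-leaf nodes whose relevance score passes a threshold, and names the statement portions (groups) that contain them. The `Node` class, the `group` labels, the threshold value, and the sentence wording are all assumptions for illustration; the patent does not specify them here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    score: float = 0.0   # relevance score (assumed to come from an upstream model)
    group: str = ""      # statement portion this node belongs to (hypothetical labels)
    children: list = field(default_factory=list)


def non_leaf_nodes(node):
    """Yield every node that has children, in pre-order."""
    if node.children:
        yield node
        for child in node.children:
            yield from non_leaf_nodes(child)


def explain(root, threshold=0.5):
    """Build a sentence naming the groups that contain a relevant non-leaf node."""
    relevant = [n for n in non_leaf_nodes(root) if n.score >= threshold]
    groups = []
    for n in relevant:
        if n.group and n.group not in groups:
            groups.append(n.group)
    if not groups:
        return "No part of the statement stands out."
    return "The statement looks anomalous because of its " + " and ".join(groups) + "."


tree = Node("select_stmt", 0.2, "", [
    Node("select_list", 0.1, "SELECT list", [Node("col")]),
    Node("where", 0.9, "WHERE clause", [Node("cond")]),
])
print(explain(tree))
```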

    TRACE REPRESENTATION LEARNING

    Publication number: US20230376743A1

    Publication date: 2023-11-23

    Application number: US17748226

    Application date: 2022-05-19

    CPC classification number: G06N3/08 G06N3/088 G06N20/00

    Abstract: The present invention avoids overfitting in deep neural network (DNN) training by using multitask learning (MTL) and self-supervised learning (SSL) techniques when training a multi-branch DNN to encode a sequence. In an embodiment, a computer first trains the DNN to perform a first task. The DNN contains: a first encoder in a first branch, a second encoder in a second branch, and an interpreter layer that combines data from the first branch and the second branch. The DNN second trains to perform a second task. After the first and second trainings, production encoding and inferencing occur. The first encoder encodes a sparse feature vector into a dense feature vector from which an inference is inferred. In an embodiment, a sequence of log messages is encoded into an encoded trace. An anomaly detector infers whether the sequence is anomalous. In an embodiment, the log messages are database commands.

    SEMI-SUPERVISED FRAMEWORK FOR PURPOSE-ORIENTED ANOMALY DETECTION

    Publication number: US20230362180A1

    Publication date: 2023-11-09

    Application number: US17739968

    Application date: 2022-05-09

    CPC classification number: H04L63/1425 G06N20/20

    Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is input into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is input into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
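
The two-stage flow in this abstract can be sketched with toy stand-ins for both models. The names below are hypothetical and the models are deliberately trivial: `is_anomalous` is a fixed threshold in place of an unsupervised detector, `KnownKindClassifier` is a lookup table in place of a supervised classifier, and `label_fn` stands for whatever supplies the label of a new training instance (the abstract does not say where the label comes from).

```python
class KnownKindClassifier:
    """Toy supervised model: remembers a label per item kind, else reports 'unknown'."""

    def __init__(self):
        self.labels = {}

    def predict(self, item):
        return self.labels.get(item["kind"], "unknown")

    def update(self, item, label):
        self.labels[item["kind"]] = label


def is_anomalous(item, limit=1000):
    """Toy unsupervised detector: flags items whose byte count exceeds a limit."""
    return item["bytes"] > limit


def handle(item, classifier, label_fn):
    """Run the detector first; only anomalies reach the classifier."""
    if not is_anomalous(item):
        return "normal"
    label = classifier.predict(item)
    if label == "unknown":
        # Generate a training instance from the item and update the classifier.
        training_instance = (item, label_fn(item))
        classifier.update(*training_instance)
    return label


clf = KnownKindClassifier()
item = {"kind": "backup", "bytes": 5000}
print(handle(item, clf, lambda i: "benign"))  # first sight: unknown
print(handle(item, clf, lambda i: "benign"))  # after the update: benign
```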

    Datacenter level utilization prediction without operating system involvement

    Publication number: US11657256B2

    Publication date: 2023-05-23

    Application number: US17867552

    Application date: 2022-07-18

    Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter. Also, the predictions from the hierarchy of models can be used to detect anomalies of datacenter hardware behavior.

    Disk drive failure prediction with neural networks

    Publication number: US11579951B2

    Publication date: 2023-02-14

    Application number: US16144912

    Application date: 2018-09-27

    Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor reading. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.
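
One way to read the heads-up-period constraint in this abstract is as a windowing rule: for a disk that failed, training sequences end early enough that a prediction made at the last sample would still arrive a full heads-up period before the failure. The sketch below illustrates that reading only. The function name, the index-based notion of time, and the fixed sequence length are assumptions and are not taken from the patent.

```python
def failed_disk_sequence(readings, failure_index, heads_up, seq_len):
    """Return the seq_len readings ending at the last valid sample, or None.

    readings: sensor readings for one disk, one per time step (assumed evenly spaced).
    failure_index: index of the time step at which the disk failed.
    heads_up: required number of time steps of advance warning.
    seq_len: length of the sequence fed to the model.
    """
    last_valid = failure_index - heads_up  # last sample that still honors the heads-up period
    start = last_valid - seq_len + 1
    if start < 0:
        return None  # not enough history before the last valid sample
    return readings[start:last_valid + 1]


readings = list(range(20))  # stand-in for 20 consecutive sensor readings
print(failed_disk_sequence(readings, failure_index=19, heads_up=5, seq_len=4))
```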

    APPLICATION- AND INFRASTRUCTURE-AWARE ORCHESTRATION FOR CLOUD MONITORING APPLICATIONS

    Publication number: US20200259722A1

    Publication date: 2020-08-13

    Application number: US16271535

    Application date: 2019-02-08

    Abstract: Herein are computerized techniques for autonomous and artificially intelligent administration of a computer cloud health monitoring system. In an embodiment, an orchestration computer automatically detects a current state of network elements of a computer network by processing: a) a network plan that defines a topology of the computer network, and b) performance statistics of the network elements. The network elements include computers that each hosts virtual execution environment(s). Each virtual execution environment hosts analysis logic that transforms raw performance data of a network element into a portion of the performance statistics. For each computer, a configuration specification for each virtual execution environment of the computer is automatically generated based on the network plan and the current state of the computer network. At least one virtual execution environment is automatically tuned and/or re-provisioned based on a generated configuration specification.
