Textual explanations for abstract syntax trees with scored nodes

    公开(公告)号:US12260306B2

    公开(公告)日:2025-03-25

    申请号:US17891350

    申请日:2022-08-19

    Abstract: Herein is a machine learning (ML) explainability (MLX) approach in which a natural language explanation is generated based on analysis of a parse tree such as for a suspicious database query or web browser JavaScript. In an embodiment, a computer selects, based on a respective relevance score for each non-leaf node in a parse tree of a statement, a relevant subset of non-leaf nodes. The non-leaf nodes are grouped in the parse tree into groups that represent respective portions of the statement. Based on a relevant subset of the groups that contain at least one non-leaf node in the relevant subset of non-leaf nodes, a natural language explanation of why the statement is anomalous is generated.

    EFFICIENT DATA DISTRIBUTION PRESERVING TRAINING PARADIGM

    公开(公告)号:US20240419943A1

    公开(公告)日:2024-12-19

    申请号:US18209024

    申请日:2023-06-13

    Abstract: A computer performs deduplication of an original training corpus for maintaining accuracy of accelerated training of a reconstructive or other machine learning (ML) model. Distinct multidimensional points are detected in the original training corpus that contains duplicates. Based on duplicates in the original training corpus, a respective observed frequency of each distinct multidimensional point is increased. In a reconstructive embodiment and based on a particular distinct multidimensional point as input, a reconstruction of the particular distinct multidimensional point is generated by a reconstructive ML model. Based on increasing the observed frequency of the particular distinct multidimensional point, a scaled error of the reconstruction of the particular distinct multidimensional point is increased. Based on the scaled error of the reconstruction of the particular distinct multidimensional point, accuracy of the reconstructive model is increased. In an embodiment, the reconstructive ML model is an artificial neural network that is a denoising autoencoder that detects anomalous database statements.

    Extraction from trees at scale
    36.
    发明授权

    公开(公告)号:US11620118B2

    公开(公告)日:2023-04-04

    申请号:US17175250

    申请日:2021-02-12

    Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programing language, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into a same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

    Kernel subsampling for an accelerated tree similarity computation

    公开(公告)号:US11449517B2

    公开(公告)日:2022-09-20

    申请号:US17131299

    申请日:2020-12-22

    Abstract: Approaches herein relate to machine learning for detection of anomalous logic syntax. Herein is acceleration for comparison of parse trees such as suspicious database queries. In an embodiment, a computer identifies subtrees in each of many trees. A respective subset of participating subtrees is selected in each tree. A respective root node of each participating subtree should directly have a child node that is a leaf and/or should have a degree that exceeds a branching threshold such as one. For each pairing of a respective first tree with a respective second tree, based on a count of subtree matches between the participating subset of subtrees in the first tree and the participating subset of subtrees in the second tree, a respective tree similarity score is calculated. A machine learning model inferences based on the tree similarity scores of the many trees. In an embodiment, each tree similarity score is a convolution kernel.

    EXTRACTION FROM TREES AT SCALE
    38.
    发明申请

    公开(公告)号:US20220261228A1

    公开(公告)日:2022-08-18

    申请号:US17175250

    申请日:2021-02-12

    Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programing language, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into a same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

    Detecting device utilization imbalances

    公开(公告)号:US11036561B2

    公开(公告)日:2021-06-15

    申请号:US16044230

    申请日:2018-07-24

    Abstract: Embodiments monitor statistics from groups of devices and generate an alarm upon detecting a utilization imbalance that is beyond a threshold. Particular balance statistics are periodically sampled, over a timeframe, for a group of devices configured to have balanced utilization. The devices are ranked at every data collection timestamp based on the gathered device statistics. The numbers of times each device appears within each rank over the timeframe are tallied. The device/rank summations are collectively used as a probability distribution representing the probability of each device being ranked at each of the rankings in the future. Based on this probability distribution, an entropy value that represents a summary of the imbalance of the group of devices over the timeframe is derived. An imbalance alert is generated when one or more entropy values for a group of devices shows an imbalanced utilization of the devices going beyond an identified imbalance threshold.

Patent Agency Ranking