Patent search ap:("Oracle International Corporation") AND inv:"Felix Schmidt" Page 2

11.

Invention application
One-Hot Encoder Using Lazy Evaluation Of Relational Statements (Valid)

Publication (announcement) number: US20250077519A1

Publication (announcement) date: 2025-03-06

Application number: US18955689

Application date: 2024-11-21

Applicant: Oracle International Corporation

Inventor: Felix Schmidt, Matteo Casserini, Milos Vasic, Marija Nikolic

IPC: G06F16/2453, G06F16/2458

Abstract: A method and one or more non-transitory storage media are provided to train and implement a one-hot encoder. During a training phase, computation of an encoder state is performed by executing a set of relational statements to extract unique categories in a first training data set, associate each unique category with a unique index, and generate a one-hot encoding for each unique category. The set of relational statements are executed by a query optimization engine. Execution of the set of relational statements is postponed until a result of each relational statement is needed, and the query optimization engine implements one or more optimizations when executing the set of relational statements. During an encoding phase, a set of categorical features in a second training data set are encoded based on the encoder state to form a set of encoded categorical features.
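The deferred-execution idea in this abstract can be illustrated with a minimal Python sketch. It assumes SQLite (through the standard `sqlite3` module) stands in for the query optimization engine, and that categories unseen during training encode as all-zero vectors. `LazyQuery` and `LazyOneHotEncoder` are hypothetical names, not the patented implementation. The DISTINCT statement is built during training but only runs when the encoder state is first needed.

```python
import sqlite3
from typing import Dict, List, Optional, Sequence, Tuple


class LazyQuery:
    """Hypothetical helper: holds a SQL statement and runs it only on first use."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self.conn = conn
        self.sql = sql
        self.rows: Optional[List[Tuple]] = None  # None means "not executed yet"

    def result(self) -> List[Tuple]:
        if self.rows is None:
            # Execution is postponed until the result is actually needed;
            # SQLite's own query planner optimizes the statement at this point.
            self.rows = self.conn.execute(self.sql).fetchall()
        return self.rows


class LazyOneHotEncoder:
    """Hypothetical one-hot encoder whose state comes from a deferred query."""

    def __init__(self, conn: sqlite3.Connection, table: str, column: str) -> None:
        # Training phase: only describe the work. Identifiers are assumed trusted.
        self.distinct = LazyQuery(
            conn, f"SELECT DISTINCT {column} FROM {table} ORDER BY {column}"
        )
        self.state: Optional[Dict[str, int]] = None

    def category_index(self) -> Dict[str, int]:
        if self.state is None:
            # Associate each unique category with a unique index.
            categories = [row[0] for row in self.distinct.result()]
            self.state = {cat: i for i, cat in enumerate(categories)}
        return self.state

    def encode(self, values: Sequence[str]) -> List[List[int]]:
        index = self.category_index()
        encoded = []
        for value in values:
            vector = [0] * len(index)
            if value in index:  # unseen categories stay all-zero (assumption)
                vector[index[value]] = 1
            encoded.append(vector)
        return encoded
```

Constructing the encoder issues no query; the first call to `encode` triggers the single DISTINCT scan, and later calls reuse the cached state.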

12.

Invention publication
GENERALIZED PRODUCTION RULES - N-GRAM FEATURE EXTRACTION FROM ABSTRACT SYNTAX TREES (AST) FOR CODE VECTORIZATION (Pending, published)

Publication (announcement) number: US20240311660A1

Publication (announcement) date: 2024-09-19

Application number: US18671645

Application date: 2024-05-22

Applicant: Oracle International Corporation

Inventor: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal

IPC: G06N5/04, G06F8/41, G06F16/242, G06F16/2455, G06N5/025

CPC classification number: G06N5/04, G06F8/427, G06F8/43, G06F16/2433, G06F16/24564, G06N5/025

Abstract: Herein is resource-constrained feature enrichment for analysis of parse trees such as suspicious database queries. In an embodiment, a computer receives a parse tree that contains many tree nodes. Each tree node is associated with a respective production rule that was used to generate the tree node. Extracted from the parse tree are many sequences of production rules having respective sequence lengths that satisfy a length constraint that accepts at least one fixed length that is greater than two. Each extracted sequence of production rules consists of respective production rules of a sequence of tree nodes in a respective directed tree path of the parse tree having a path length that satisfies that same length constraint. Based on the extracted sequences of production rules, a machine learning model generates an inference. In a bag of rules data structure, the extracted sequences of production rules are aggregated by distinct sequence and duplicates are counted.
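A minimal Python sketch of the extraction step, assuming each parse-tree node carries its production rule as a string and reading a "directed tree path" as an ancestor-to-descendant chain of exactly n nodes. `Node` and `bag_of_rules` are hypothetical names; the abstract's length constraint corresponds to choosing n greater than two.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Parse-tree node labelled with the production rule that generated it."""
    rule: str
    children: List["Node"] = field(default_factory=list)


def bag_of_rules(root: Node, n: int) -> Counter:
    """Count every sequence of n production rules along a downward tree path."""
    if n < 1:
        raise ValueError("n must be positive")
    bag: Counter = Counter()

    def walk(node: Node, window: List[str]) -> None:
        # Keep only the last n rules on the path from the root to this node.
        path = (window + [node.rule])[-n:]
        if len(path) == n:
            bag[tuple(path)] += 1  # duplicate sequences are aggregated and counted
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return bag
```

Each node closes at most one n-gram (the one ending at it), so the walk is linear in the number of nodes times n, which fits the resource-constrained framing.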

13.

Invention publication
VALIDATION METRIC FOR ATTRIBUTION-BASED EXPLANATION METHODS FOR ANOMALY DETECTION MODELS (Pending, published)

Publication (announcement) number: US20240037383A1

Publication (announcement) date: 2024-02-01

Application number: US17873482

Application date: 2022-07-26

Applicant: Oracle International Corporation

Inventor: Kenyu Kobayashi, Arno Schneuwly, Renata Khasanova, Matteo Casserini, Felix Schmidt

IPC: G06N3/08

CPC classification number: G06N3/08

Abstract: Herein are machine learning (ML) explainability (MLX) techniques for calculating and using a novel fidelity metric for assessing and comparing explainers that are based on feature attribution. In an embodiment, a computer generates many anomalous tuples from many non-anomalous tuples. Each anomalous tuple contains a perturbed value of a respective perturbed feature. For each anomalous tuple, a respective explanation is generated that identifies a respective identified feature as a cause of the anomalous tuple being anomalous. A fidelity metric is calculated by counting correct explanations for the anomalous tuples whose identified feature is the perturbed feature. Tuples may represent entries in an activity log such as structured query language (SQL) statements in a console output log of a database server. This approach herein may gauge the quality of a set of MLX explanations for why log entries or network packets are characterized as anomalous by an intrusion detector or other anomaly detector.
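The fidelity metric can be sketched in Python under a few assumptions: tuples are dictionaries of numeric features, each anomaly is produced by adding a fixed offset to one randomly chosen feature, and a simple z-score rule stands in for a real feature-attribution explainer. All function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Explainer = Callable[[Row], str]


def make_anomalies(normal: Sequence[Row], delta: float, seed: int = 0) -> List[Tuple[Row, str]]:
    """Perturb one randomly chosen feature of each normal row by delta."""
    rng = random.Random(seed)
    anomalies = []
    for row in normal:
        feature = rng.choice(sorted(row))
        perturbed = dict(row)
        perturbed[feature] = row[feature] + delta
        anomalies.append((perturbed, feature))
    return anomalies


def fidelity(anomalies: Sequence[Tuple[Row, str]], explain: Explainer) -> float:
    """Fraction of explanations that blame the feature that was actually perturbed."""
    correct = sum(1 for row, feature in anomalies if explain(row) == feature)
    return correct / len(anomalies)


def zscore_explainer(normal: Sequence[Row]) -> Explainer:
    """Toy attribution: blame the feature with the largest absolute z-score."""
    features = sorted(normal[0])
    mean = {f: statistics.mean(r[f] for r in normal) for f in features}
    std = {f: statistics.pstdev([r[f] for r in normal]) or 1.0 for f in features}
    return lambda row: max(features, key=lambda f: abs(row[f] - mean[f]) / std[f])
```

Because the ground-truth cause of every synthetic anomaly is known, two explainers can be compared simply by their fidelity scores on the same anomaly set.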

14.

Invention grant
Machine learning-based DNS request string representation with hash replacement (Valid)

Publication (announcement) number: US11784964B2

Publication (announcement) date: 2023-10-10

Application number: US17197375

Application date: 2021-03-10

Applicant: Oracle International Corporation

Inventor: Renata Khasanova, Felix Schmidt, Stuart Wray, Craig Schelp, Nipun Agarwal, Matteo Casserini

IPC: H04L61/4511, G06N20/00, H04L41/16, G06F40/30

CPC classification number: H04L61/4511, G06N20/00, H04L41/16, G06F40/30

Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.
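A minimal Python sketch of the pre-processing and pooling steps, assuming dot-separated labels as tokens, a length-plus-entropy heuristic for hash detection (thresholds are illustrative), and token embeddings and weights that a trained vectorizing model would supply. The placeholder string and all helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

HASH_TOKEN = "<HASH>"  # placeholder token (name is an assumption)


def shannon_entropy(token: str) -> float:
    """Bits per character of the token's character distribution."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_hash(token: str, min_len: int = 16, min_entropy: float = 3.5) -> bool:
    # Heuristic: long, high-entropy labels are treated as hashes.
    return len(token) >= min_len and shannon_entropy(token) >= min_entropy


def preprocess(request: str) -> List[str]:
    """Tokenize a DNS name on dots and replace hash-like labels with HASH_TOKEN."""
    tokens = [t for t in request.lower().strip(".").split(".") if t]
    return [HASH_TOKEN if looks_like_hash(t) else t for t in tokens]


def request_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]],
                   weights: Dict[str, float]) -> List[float]:
    """Weighted sum of token embeddings; tokens without an embedding are skipped."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in tokens:
        vector = embeddings.get(token)
        if vector is None:
            continue
        w = weights.get(token, 1.0)  # default weight of 1.0 is an assumption
        for i, x in enumerate(vector):
            total[i] += w * x
    return total
```

Mapping every hash to one shared token means the model learns a single embedding for "some hash here", which is what lets the pooled request vectors treat hashes as a group.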

15.

Invention application
GENERALIZED EXPECTATION MAXIMIZATION (Valid)

Publication (announcement) number: US20220027777A1

Publication (announcement) date: 2022-01-27

Application number: US16935313

Application date: 2020-07-22

Applicant: Oracle International Corporation

Inventor: Felix Schmidt, Yasha Pushak, Stuart Wray

IPC: G06N20/00, G06F16/901, G06N5/04

Abstract: Techniques are described that extend supervised machine-learning algorithms for use with semi-supervised training. Random labels are assigned to unlabeled training data, and the data is split into k partitions. During a label-training iteration, each of these k partitions is combined with the labeled training data, and the combination is used to train a single instance of the machine-learning model. Each of these trained models is then used to predict labels for data points in the k−1 partitions of previously-unlabeled training data that were not used to train the model. Thus, every data point in the previously-unlabeled training data obtains k−1 predicted labels. For each data point, these labels are aggregated to obtain a composite label prediction for the data point. After the labels are determined via one or more label-training iterations, a machine-learning model is trained on data with the resulting composite label predictions and on the labeled data set.
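A compact Python sketch of the label-training loop, assuming 1-D numeric features, a nearest-centroid classifier as the stand-in supervised learner, and majority voting as the way the k−1 predicted labels are aggregated (the abstract does not name an aggregation rule). Function names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], int]
Fit = Callable[[List[float], List[int]], Model]


def nearest_centroid(xs: List[float], ys: List[int]) -> Model:
    """Stand-in supervised learner for 1-D data: predict the closest class mean."""
    sums: dict = {}
    counts: dict = {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def generalized_em(lab_x: Sequence[float], lab_y: Sequence[int], unl_x: Sequence[float],
                   fit: Fit, k: int = 3, iterations: int = 2,
                   seed: int = 0) -> Tuple[Model, List[int]]:
    rng = random.Random(seed)
    classes = sorted(set(lab_y))
    labels = [rng.choice(classes) for _ in unl_x]       # random initial labels
    order = list(range(len(unl_x)))
    rng.shuffle(order)
    parts = [order[i::k] for i in range(k)]             # k partitions
    for _ in range(iterations):                         # label-training iterations
        votes = [Counter() for _ in unl_x]
        for i, part in enumerate(parts):
            # Train on the labeled data plus one partition with its current labels.
            model = fit(list(lab_x) + [unl_x[j] for j in part],
                        list(lab_y) + [labels[j] for j in part])
            # Predict the k - 1 partitions this model did not see.
            for m, other in enumerate(parts):
                if m != i:
                    for j in other:
                        votes[j][model(unl_x[j])] += 1
        labels = [v.most_common(1)[0][0] for v in votes]  # composite label per point
    final = fit(list(lab_x) + list(unl_x), list(lab_y) + labels)
    return final, labels
```

Any learner with a `fit(xs, ys) -> predict` shape can be plugged in, which mirrors the abstract's claim that the scheme extends existing supervised algorithms rather than replacing them.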

16.

Invention grant
Malicious activity detection by cross-trace analysis and deep learning (Valid)

Publication (announcement) number: US11082438B2

Publication (announcement) date: 2021-08-03

Application number: US16122398

Application date: 2018-09-05

Applicant: Oracle International Corporation

Inventor: Juan Fernandez Peinador, Manel Fernandez Gomez, Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Onur Kocberber, Felix Schmidt, Craig Schelp

IPC: H04L29/06, G06N3/04, G06K9/62, G06F16/80

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.
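The patent relies on a predictive recurrent neural network. As a much simpler stand-in that shows the same sequence-prediction idea, the Python sketch below learns which log event types follow which in normal traces and scores a new trace by the fraction of transitions it never observed. This is a bigram frequency model, not an RNN, and the event names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence

Transitions = Dict[Hashable, Counter]


def train_transitions(traces: Sequence[Sequence[Hashable]]) -> Transitions:
    """Count next-event frequencies observed in normal traces."""
    counts: Transitions = defaultdict(Counter)
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            counts[prev][nxt] += 1
    return counts


def surprise(counts: Transitions, trace: Sequence[Hashable]) -> float:
    """Fraction of transitions in the trace that were never seen during training."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    unseen = sum(1 for prev, nxt in pairs if counts.get(prev, Counter())[nxt] == 0)
    return unseen / len(pairs)
```

An RNN replaces the fixed one-step context here with a learned, longer context, which is what allows the patented approach to catch anomalies that only show up across several events.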

17.

Invention grant
Application- and infrastructure-aware orchestration for cloud monitoring applications (Valid)

Publication (announcement) number: US10892961B2

Publication (announcement) date: 2021-01-12

Application number: US16271535

Application date: 2019-02-08

Applicant: Oracle International Corporation

Inventor: Onur Kocberber, Felix Schmidt, Craig Schelp, Pravin Shinde

IPC: H04L12/24, H04L12/26, G06F9/455, H04L29/08

Abstract: Herein are computerized techniques for autonomous and artificially intelligent administration of a computer cloud health monitoring system. In an embodiment, an orchestration computer automatically detects a current state of network elements of a computer network by processing: a) a network plan that defines a topology of the computer network, and b) performance statistics of the network elements. The network elements include computers that each hosts virtual execution environment(s). Each virtual execution environment hosts analysis logic that transforms raw performance data of a network element into a portion of the performance statistics. For each computer, a configuration specification for each virtual execution environment of the computer is automatically generated based on the network plan and the current state of the computer network. At least one virtual execution environment is automatically tuned and/or re-provisioned based on a generated configuration specification.

18.

Invention application
GRAPH PATH PREDICTION AND MASKED LANGUAGE MODELLING JOINT TRAINING ALGORITHM FOR LANGUAGE MODELS (Valid)

Publication (announcement) number: US20250060951A1

Publication (announcement) date: 2025-02-20

Application number: US18235461

Application date: 2023-08-18

Applicant: Oracle International Corporation

Inventor: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

IPC: G06F8/41, G06F16/901

Abstract: In an embodiment providing natural language processing (NLP), a computer generates a histogram that correctly represents a graph that represents a lexical text, and generates a token sequence encoder that is trainable and untrained. During training such as pretraining, the token sequence encoder infers an encoded sequence that incorrectly represents the lexical text, and the encoded sequence is dense and saves space. To increase the accuracy of the sequence encoder by learning, the token sequence encoder is adjusted based on, as discussed herein, an indirectly measured numeric difference between the encoded sequence that incorrectly represents the lexical text and the histogram that correctly represents the graph.

19.

Invention application
APPROXIMATE CONFUSION MATRIX FOR MULTI-LABEL CLASSIFICATION (Valid)

Publication (announcement) number: US20250036934A1

Publication (announcement) date: 2025-01-30

Application number: US18227758

Application date: 2023-07-28

Applicant: Oracle International Corporation

Inventor: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

IPC: G06N3/08

Abstract: Herein is validation of a trained classifier based on novel and accelerated estimation of a confusion matrix. In an embodiment, a computer hosts a trained classifier that infers, from many objects, an inferred frequency of each class. An upscaled magnitude of each class is generated from the inferred frequency of the class. An integer of each class is generated from the upscaled magnitude of the class. Based on those integers of the classes and a target integer for each class, counts are generated of the objects that are true positives, false positives, and false negatives of the class. Based on those counts, estimated totals of true positives, false positives, and false negatives are generated that characterize fitness of the trained classifier. In an embodiment, those counts and totals are downscaled to be fractions from zero to one.
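One possible reading of these steps, sketched in Python under explicit assumptions: frequencies are per-class fractions of objects, upscaling multiplies by an integer `scale`, the integer is obtained by rounding, and true positives are taken as the overlap `min(predicted, target)`. The abstract does not specify any of these choices, so treat `approx_confusion` as a hypothetical illustration rather than the claimed method.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def approx_confusion(inferred_freq: Dict[str, float], target_freq: Dict[str, float],
                     scale: int) -> Tuple[Dict[str, Counts], Tuple[float, float, float]]:
    per_class: Dict[str, Counts] = {}
    for cls in sorted(set(inferred_freq) | set(target_freq)):
        predicted = round(inferred_freq.get(cls, 0.0) * scale)  # upscale, then integer
        actual = round(target_freq.get(cls, 0.0) * scale)       # target integer
        tp = min(predicted, actual)  # assumed maximal overlap (optimistic estimate)
        per_class[cls] = (tp, predicted - tp, actual - tp)
    totals = [sum(c[i] for c in per_class.values()) for i in range(3)]
    # Downscale the totals back to fractions between zero and one.
    downscaled = (totals[0] / scale, totals[1] / scale, totals[2] / scale)
    return per_class, downscaled
```

Working only with per-class frequencies avoids a pass over every object-label pair, which is where an approximation like this would save time on large multi-label outputs.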

20.

Invention grant
Semi-supervised framework for purpose-oriented anomaly detection (Valid)

Publication (announcement) number: US12143408B2

Publication (announcement) date: 2024-11-12

Application number: US17739968

Application date: 2022-05-09

Applicant: Oracle International Corporation

Inventor: Milos Vasic, Saeid Allahdadian, Matteo Casserini, Felix Schmidt, Andrew Brownsword

IPC: H04L9/40, G06N20/20

Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is inputted into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is inputted into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
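A minimal Python sketch of the two-stage flow, assuming scalar data items, a z-score threshold as the unsupervised detector, a radius-limited nearest-neighbour classifier that answers "unknown" when no stored example is close enough, and a `label_fn` callback standing in for whoever labels new training instances (for example, an analyst). All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


class RadiusClassifier:
    """Nearest-neighbour classifier that returns None (unknown) beyond a radius."""

    def __init__(self, radius: float) -> None:
        self.radius = radius
        self.examples: List[Tuple[float, str]] = []

    def predict(self, x: float) -> Optional[str]:
        best: Optional[Tuple[float, str]] = None
        for example, label in self.examples:
            d = abs(example - x)
            if d <= self.radius and (best is None or d < best[0]):
                best = (d, label)
        return None if best is None else best[1]

    def update(self, x: float, label: str) -> None:
        self.examples.append((x, label))  # incremental update from a new instance


def make_detector(mean: float, std: float, k: float = 3.0) -> Callable[[float], bool]:
    """Unsupervised stage: flag values more than k standard deviations from the mean."""
    return lambda x: abs(x - mean) > k * std


def process(item: float, detector: Callable[[float], bool], clf: RadiusClassifier,
            label_fn: Callable[[float], str]) -> str:
    if not detector(item):
        return "normal"
    label = clf.predict(item)
    if label is None:
        # Unknown anomaly: turn it into a training instance and update the classifier.
        clf.update(item, label_fn(item))
        return "unknown"
    return label
```

The supervised stage only sees items the detector already flagged, so the labeled training set grows around the anomaly types that matter for the chosen purpose.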
