Patent search ap:("Oracle International Corporation") AND inv:"Milos Vasic"

1.

Invention application
One-Hot Encoder Using Lazy Evaluation Of Relational Statements (in force)

Publication (announcement) number: US20250077519A1

Publication (announcement) date: 2025-03-06

Application number: US18955689

Application date: 2024-11-21

Applicant: Oracle International Corporation

Inventor: Felix Schmidt, Matteo Casserini, Milos Vasic, Marija Nikolic

IPC: G06F16/2453, G06F16/2458

Abstract: A method and one or more non-transitory storage media are provided to train and implement a one-hot encoder. During a training phase, computation of an encoder state is performed by executing a set of relational statements to extract unique categories in a first training data set, associate each unique category with a unique index, and generate a one-hot encoding for each unique category. The set of relational statements is executed by a query optimization engine. Execution of the set of relational statements is postponed until a result of each relational statement is needed, and the query optimization engine implements one or more optimizations when executing the set of relational statements. During an encoding phase, a set of categorical features in a second training data set is encoded based on the encoder state to form a set of encoded categorical features.
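
The training and encoding phases described in this abstract can be sketched roughly as follows. This is only an illustration, not the patented implementation: it assumes SQLite as a stand-in query engine and uses views as the lazily evaluated relational statements (a view stores only its definition and is computed when it is queried). The function names, the view names, and the all-zero encoding for unseen categories are assumptions made for the example.

```python
import sqlite3


def define_encoder_state(conn, table, column):
    """Training phase: declare the statements as views; nothing runs yet.

    `table` and `column` are assumed to be trusted identifiers.
    """
    # Statement 1: extract the unique categories.
    conn.execute(
        f'CREATE VIEW categories AS '
        f'SELECT DISTINCT "{column}" AS category FROM "{table}"'
    )
    # Statement 2: give each category a unique index (its rank in sort order).
    conn.execute(
        "CREATE VIEW encoder_state AS "
        "SELECT c.category, "
        "(SELECT COUNT(*) FROM categories o WHERE o.category < c.category) AS idx "
        "FROM categories c"
    )


def encode(conn, values):
    """Encoding phase: the views are evaluated here, when a result is needed."""
    state = dict(conn.execute("SELECT category, idx FROM encoder_state"))
    width = len(state)
    encoded = []
    for value in values:
        row = [0] * width
        if value in state:  # unseen categories stay all-zero (an assumption)
            row[state[value]] = 1
        encoded.append(row)
    return encoded
```

Because the views are only definitions, the engine is free to plan both statements together when `encode` finally queries `encoder_state`.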

2.

Invention application
ENCODING LOG-SPECIFIC ATTRIBUTES WITH NLP MODELS (in force)

Publication (announcement) number: US20250021759A1

Publication (announcement) date: 2025-01-16

Application number: US18219763

Application date: 2023-07-10

Applicant: Oracle International Corporation

Inventor: Samuele Meta, Aneesh Dahiya, Felix Schmidt, Marija Nikolic, Matteo Casserini, Milos Vasic

IPC: G06F40/284, G06F11/34

Abstract: Herein is natural language processing (NLP) to detect an anomalous log entry using a language model that infers an encoding of the log entry from novel generation of numeric lexical tokens. In an embodiment, a computer extracts an original numeric lexical token from a variable sized log entry. Substitute numeric lexical token(s) that represent the original numeric lexical token are generated, such as with a numeric exponent or by trigonometry. The log entry does not contain the substitute numeric lexical token. A novel sequence of lexical tokens that represents the log entry and contains the substitute numeric lexical token is generated. The novel sequence of lexical tokens does not contain the original numeric lexical token. The computer hosts and operates a machine learning model that generates, based on the novel sequence of lexical tokens that represents the log entry, an inference that characterizes the log entry with unprecedented accuracy.
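
As a rough illustration of the token substitution described above, the sketch below replaces each numeric token in a log entry with two substitute tokens: one carrying the base-10 exponent and one carrying a rounded sine of the value. The token formats (`<EXP…>`, `<SIN…>`), the whitespace tokenizer, and the function names are assumptions for the example; the abstract does not specify them.

```python
import math
import re

# Matches plain integers and decimals such as "42", "-3.5", "+0.07".
NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def substitute_numeric(token):
    """Return substitute tokens that stand in for one numeric token."""
    value = float(token)
    # Order of magnitude of the value (0 is mapped to exponent 0).
    exponent = math.floor(math.log10(abs(value))) if value else 0
    # A coarse trigonometric feature of the value.
    phase = math.sin(value)
    return [f"<EXP{exponent}>", f"<SIN{phase:+.1f}>"]


def encode_log_entry(entry):
    """Build a new token sequence that omits the original numeric tokens."""
    tokens = []
    for token in entry.split():
        if NUMBER.fullmatch(token):
            tokens.extend(substitute_numeric(token))
        else:
            tokens.append(token)
    return tokens
```

The resulting sequence contains the substitute tokens but not the original number, so a language model sees a small, closed vocabulary for numerics.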

3.

Invention application
GENERAL PURPOSE SQL REPRESENTATION MODEL (in force)

Publication (announcement) number: US20240370429A1

Publication (announcement) date: 2024-11-07

Application number: US18143776

Application date: 2023-05-05

Applicant: Oracle International Corporation

Inventor: Aneesh Dahiya, Matteo Casserini, Marija Nikolic, Milos Vasic, Samuele Meta, Nikola Milojkovic, Felix Schmidt

IPC: G06F16/2452, G06N3/0455, G06N3/08

Abstract: In an embodiment, a computer generates sentence fingerprints that represent respective pluralities of similar database statements. Based on the sentence fingerprints, an artificial neural network is trained. After training the artificial neural network on a large corpus of fingerprinted database statements, the artificial neural network is ready to be used for zero-shot transfer learning to a downstream task in training. Database statement fingerprinting also anonymizes literal values in raw SQL statements. The trained artificial neural network can be safely reused without risk of disclosing sensitive data in the artificial neural network's vocabulary. After training, the artificial neural network infers a fixed-size encoded database statement from a new database statement. Based on the fixed-size encoded database statement, the new database statement is detected as anomalous, which increases database security and preserves database throughput by not executing the anomalous database statement.
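
The fingerprinting step can be illustrated with a minimal sketch that replaces literal values with a placeholder, so that similar statements collapse to one fingerprint and no literal reaches the model's vocabulary. The regular expressions, the `?` placeholder, and the lowercasing are assumptions for the example and are far simpler than a real SQL normalizer.

```python
import re

# Single-quoted SQL strings, with '' as an escaped quote.
_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone numeric literals (digits inside identifiers such as t1 are kept).
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def fingerprint(sql):
    """Return a literal-free, whitespace-normalized form of a SQL statement."""
    text = _STRING.sub("?", sql)    # anonymize string literals first
    text = _NUMBER.sub("?", text)   # then numeric literals
    text = _SPACE.sub(" ", text).strip()
    return text.lower()
```

Statements that differ only in literal values or spacing map to the same fingerprint, which is what lets one training example represent many similar statements.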

4.

Granted invention patent
One-hot encoder using lazy evaluation of relational statements (in force)

Publication (announcement) number: US12182122B2

Publication (announcement) date: 2024-12-31

Application number: US17964084

Application date: 2022-10-12

Applicant: Oracle International Corporation

Inventor: Felix Schmidt, Matteo Casserini, Milos Vasic, Marija Nikolic

IPC: G06F16/00, G06F16/2453, G06F16/2458

Abstract: A method and one or more non-transitory storage media are provided to train and implement a one-hot encoder. During a training phase, computation of an encoder state is performed by executing a set of relational statements to extract unique categories in a first training data set, associate each unique category with a unique index, and generate a one-hot encoding for each unique category. The set of relational statements is executed by a query optimization engine. Execution of the set of relational statements is postponed until a result of each relational statement is needed, and the query optimization engine implements one or more optimizations when executing the set of relational statements. During an encoding phase, a set of categorical features in a second training data set is encoded based on the encoder state to form a set of encoded categorical features.

5.

Invention publication
ANOMALY SCORE NORMALISATION BASED ON EXTREME VALUE THEORY (under examination, published)

Publication (announcement) number: US20230368054A1

Publication (announcement) date: 2023-11-16

Application number: US17745103

Application date: 2022-05-16

Applicant: Oracle International Corporation

Inventor: Marija Nikolic, Matteo Casserini, Arno Schneuwly, Nikola Milojkovic, Milos Vasic, Renata Khasanova, Felix Schmidt

IPC: G06N7/00, G06N20/00

CPC classification number: G06N7/005, G06N20/00

Abstract: The present invention relates to threshold estimation and calibration for anomaly detection. Herein are machine learning (ML) and extreme value theory (EVT) techniques for normalizing and thresholding anomaly scores without presuming a values distribution. In an embodiment, a computer receives many unnormalized anomaly scores and, according to peak over threshold (POT), selects a highest subset of the unnormalized anomaly scores that exceed a tail threshold. Based on the highest subset of the unnormalized anomaly scores, parameters of a probability density function are trained according to EVT. After training and in a production environment, a normalized anomaly score is generated based on an unnormalized anomaly score and the trained parameters of the probability density function. Anomaly detection compares the normalized anomaly score to an optimized anomaly threshold.
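
A minimal sketch of the peak-over-threshold idea follows. It assumes the tail is modelled with a generalized Pareto distribution fitted by the method of moments, and it uses the fitted cumulative distribution as the normalized score; the abstract names neither choice, so both are assumptions, as are the function names and the 0.9 tail quantile. The sketch also assumes at least two distinct scores lie above the tail threshold.

```python
import math
import statistics


def fit_tail(scores, tail_quantile=0.9):
    """Select the scores above a tail threshold and fit GPD parameters."""
    ordered = sorted(scores)
    threshold = ordered[int(tail_quantile * len(ordered))]
    excesses = [s - threshold for s in ordered if s > threshold]
    mean = statistics.mean(excesses)
    variance = statistics.variance(excesses)
    ratio = mean * mean / variance
    # Method-of-moments estimates for the generalized Pareto distribution.
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return threshold, shape, scale


def normalized_score(score, threshold, shape, scale):
    """Map a raw score to [0, 1] with the fitted tail distribution's CDF."""
    excess = score - threshold
    if excess <= 0:
        return 0.0  # below the tail threshold
    if abs(shape) < 1e-9:
        return 1.0 - math.exp(-excess / scale)  # exponential limit case
    base = 1.0 + shape * excess / scale
    if base <= 0:
        return 1.0  # beyond the fitted upper endpoint (negative shape)
    return 1.0 - base ** (-1.0 / shape)
```

An anomaly decision would then compare `normalized_score(...)` with a fixed threshold on the normalized scale, independent of the raw score range of the underlying model.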

6.

Invention application
SEPARATION MAXIMIZATION TECHNIQUE FOR ANOMALY SCORES TO COMPARE ANOMALY DETECTION MODELS (in force)

Publication (announcement) number: US20220138504A1

Publication (announcement) date: 2022-05-05

Application number: US17083536

Application date: 2020-10-29

Applicant: Oracle International Corporation

Inventor: Hesam Fathi Moghadam, Anatoly Yakovlev, Sandeep Agrawal, Venkatanathan Varadarajan, Robert Hopkins, Matteo Casserini, Milos Vasic, Sanjay Jinturkar, Nipun Agarwal

IPC: G06K9/62, G06N20/20

Abstract: In an embodiment based on computer(s), an ML model is trained to detect outliers. The ML model calculates anomaly scores that include a respective anomaly score for each item in a validation dataset. The anomaly scores are automatically organized by sorting and/or clustering. Based on the organized anomaly scores, a separation is measured that indicates fitness of the ML model. In an embodiment, a computer performs two-clustering of anomaly scores into a first organization that consists of a first normal cluster of anomaly scores and a first anomaly cluster of anomaly scores. The computer performs three-clustering of the same anomaly scores into a second organization that consists of a second normal cluster of anomaly scores, a second anomaly cluster of anomaly scores, and a middle cluster of anomaly scores. A distribution difference between the first organization and the second organization is measured. An ML model is processed based on the distribution difference.

7.

Granted invention patent
Sparse ensembling of unsupervised models (in force)

Publication (announcement) number: US12020131B2

Publication (announcement) date: 2024-06-25

Application number: US17221212

Application date: 2021-04-02

Applicant: Oracle International Corporation

Inventor: Saeid Allahdadian, Amin Suzani, Milos Vasic, Matteo Casserini, Andrew Brownsword, Felix Schmidt, Nipun Agarwal

IPC: G06N20/20, G06N3/04, G06N3/0442, G06N3/045, G06N3/0495, G06N3/08, G06N3/088, G06N20/00

CPC classification number: G06N20/20, G06N3/04, G06N3/0495, G06N3/08, G06N3/088, G06N3/0442, G06N3/045, G06N20/00

Abstract: Techniques are provided for sparse ensembling of unsupervised machine learning models. In an embodiment, the proposed architecture is composed of multiple unsupervised machine learning models that each produce a score as output and a gating network that analyzes the inputs and outputs of the unsupervised machine learning models to select an optimal ensemble of unsupervised machine learning models. The gating network is trained to choose a minimal number of the multiple unsupervised machine learning models whose scores are combined to create a final score that matches or closely resembles a final score that is computed using all the scores of the multiple unsupervised machine learning models.

8.

Granted invention patent
Multi-stage feature extraction for effective ML-based anomaly detection on structured log data (in force)

Publication (announcement) number: US11704386B2

Publication (announcement) date: 2023-07-18

Application number: US17199563

Application date: 2021-03-12

Applicant: Oracle International Corporation

Inventor: Amin Suzani, Saeid Allahdadian, Milos Vasic, Matteo Casserini, Hamed Ahmadi, Felix Schmidt, Andrew Brownsword, Nipun Agarwal

IPC: G06F18/214, G06N20/00, G06V10/75, G06F18/23

CPC classification number: G06F18/214, G06F18/23, G06N20/00, G06V10/758

Abstract: Herein are feature extraction mechanisms that receive parsed log messages as inputs and transform them into numerical feature vectors for machine learning models (MLMs). In an embodiment, a computer extracts fields from a log message. Each field specifies a name, a text value, and a type. For each field, a field transformer for the field is dynamically selected based on the field's name and/or the field's type. The field transformer converts the field's text value into a value of the field's type. A feature encoder for the value of the field's type is dynamically selected based on the field's type and/or a range of the field's values that occur in a training corpus of an MLM. From the feature encoder, an encoding of the value of the field's type is stored into a feature vector. Based on the MLM and the feature vector, the log message is detected as anomalous.
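
The two selection stages in this abstract (a field transformer chosen by type, then a feature encoder chosen by type and observed value range) can be sketched as a pair of dispatch steps. Everything concrete here is an assumption for illustration: the type names, the min-max and log encoders, the 1000-wide range cutoff, and the shape of `training_stats`.

```python
import math

# Stage 1: field transformers, selected by the field's declared type.
FIELD_TRANSFORMERS = {"int": int, "float": float, "string": str}


def encode_minmax(value, value_range):
    """Scale a number linearly into [0, 1] over the training range."""
    low, high = value_range
    return [(value - low) / (high - low)]


def encode_log(value, value_range):
    """Scale a number logarithmically, for fields with a very wide range."""
    low, high = value_range
    return [math.log1p(value - low) / math.log1p(high - low)]


def encode_category(value, vocabulary):
    """One-hot encode a string against the training vocabulary."""
    return [1.0 if value == known else 0.0 for known in vocabulary]


def select_encoder(field_type, stats):
    """Stage 2: pick an encoder from the type and the training-corpus range."""
    if field_type in ("int", "float"):
        low, high = stats
        return encode_log if high - low > 1000 else encode_minmax
    return encode_category


def extract_features(fields, training_stats):
    """Turn (name, text, type) fields of one log message into a vector."""
    vector = []
    for name, text, field_type in fields:
        typed_value = FIELD_TRANSFORMERS[field_type](text)
        encoder = select_encoder(field_type, training_stats[name])
        vector.extend(encoder(typed_value, training_stats[name]))
    return vector
```

The resulting vector would then be passed to whatever anomaly detection model is in use; that step is outside this sketch.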

9.

Invention application
AUTOMATICALLY CHANGE ANOMALY DETECTION THRESHOLD BASED ON PROBABILISTIC DISTRIBUTION OF ANOMALY SCORES (in force)

Publication (announcement) number: US20220188694A1

Publication (announcement) date: 2022-06-16

Application number: US17122401

Application date: 2020-12-15

Applicant: Oracle International Corporation

Inventor: Amin Suzani, Matteo Casserini, Milos Vasic, Saeid Allahdadian, Andrew Brownsword, Hamed Ahmadi, Felix Schmidt, Nipun Agarwal

IPC: G06N20/00, G06N7/00, G06F17/18

Abstract: Approaches herein relate to model decay of an anomaly detector due to concept drift. Herein are machine learning techniques for dynamically self-tuning an anomaly score threshold. In an embodiment in a production environment, a computer receives an item in a stream of items. A machine learning (ML) model hosted by the computer infers by calculation an anomaly score for the item. Whether the item is anomalous or not is decided based on the anomaly score and an adaptive anomaly threshold that dynamically fluctuates. A moving standard deviation of anomaly scores is adjusted based on a moving average of anomaly scores. The moving average of anomaly scores is then adjusted based on the anomaly score. The adaptive anomaly threshold is then adjusted based on the moving average of anomaly scores and the moving standard deviation of anomaly scores.
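
The update order in this abstract (moving standard deviation first, then moving average, then threshold) can be sketched as below. The exponential weighting with factor `alpha`, the `mean + k * std` threshold form, and the class and parameter names are assumptions; the abstract only fixes the order of the three adjustments.

```python
import math


class AdaptiveThreshold:
    """Self-tuning anomaly threshold driven by a stream of anomaly scores."""

    def __init__(self, mean, std, alpha=0.05, k=3.0):
        self.mean = mean            # moving average of anomaly scores
        self.variance = std * std   # square of the moving standard deviation
        self.alpha = alpha          # weight given to the newest score
        self.k = k                  # threshold distance in standard deviations
        self.threshold = mean + k * std

    def update(self, score):
        """Classify one score, then adapt the statistics and the threshold."""
        is_anomalous = score > self.threshold
        # 1. Adjust the moving deviation using the current moving average.
        deviation = score - self.mean
        self.variance = (1 - self.alpha) * self.variance + self.alpha * deviation ** 2
        # 2. Then adjust the moving average using the new score.
        self.mean = (1 - self.alpha) * self.mean + self.alpha * score
        # 3. Then adjust the threshold from both moving statistics.
        self.threshold = self.mean + self.k * math.sqrt(self.variance)
        return is_anomalous
```

A design question the sketch leaves open is whether scores judged anomalous should update the statistics at all; here they do, so a burst of anomalies raises the threshold.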

10.

Granted invention patent
Semi-supervised framework for purpose-oriented anomaly detection (in force)

Publication (announcement) number: US12143408B2

Publication (announcement) date: 2024-11-12

Application number: US17739968

Application date: 2022-05-09

Applicant: Oracle International Corporation

Inventor: Milos Vasic, Saeid Allahdadian, Matteo Casserini, Felix Schmidt, Andrew Brownsword

IPC: H04L9/40, G06N20/20

Abstract: Techniques for implementing a semi-supervised framework for purpose-oriented anomaly detection are provided. In one technique, a data item is inputted into an unsupervised anomaly detection model, which generates first output. Based on the first output, it is determined whether the data item represents an anomaly. In response to determining that the data item represents an anomaly, the data item is inputted into a supervised classification model, which generates second output that indicates whether the data item is unknown. In response to determining that the data item is unknown, a training instance is generated based on the data item. The supervised classification model is updated based on the training instance.
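
The control flow in this abstract can be sketched with toy stand-ins for both models. The lookup-table classifier, the `ask_label` callback that supplies a label for an unknown item, and all names below are hypothetical; the abstract does not say how the training instance obtains its label or what models are used.

```python
UNKNOWN = "unknown"


class LookupClassifier:
    """Toy supervised model: remembers the label of every item it was taught."""

    def __init__(self):
        self.labels = {}

    def predict(self, item):
        return self.labels.get(item, UNKNOWN)

    def update(self, item, label):
        self.labels[item] = label


def handle_item(item, is_anomaly, classifier, ask_label):
    """Run one data item through the two-stage pipeline."""
    # Stage 1: the unsupervised detector decides whether the item is anomalous.
    if not is_anomaly(item):
        return "normal"
    # Stage 2: the supervised classifier reports a known class or UNKNOWN.
    label = classifier.predict(item)
    if label == UNKNOWN:
        # Build a training instance from the item and update the classifier.
        label = ask_label(item)
        classifier.update(item, label)
    return label
```

Once an unknown anomaly has been labelled, later occurrences of the same item are classified directly and no longer trigger a new training instance.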
