ENCODING LOG-SPECIFIC ATTRIBUTES WITH NLP MODELS

    Publication Number: US20250021759A1

    Publication Date: 2025-01-16

    Application Number: US18219763

    Application Date: 2023-07-10

    Abstract: Herein is natural language processing (NLP) to detect an anomalous log entry using a language model that infers an encoding of the log entry from novel generation of numeric lexical tokens. In an embodiment, a computer extracts an original numeric lexical token from a variable sized log entry. Substitute numeric lexical token(s) that represent the original numeric lexical token are generated, such as with a numeric exponent or by trigonometry. The log entry does not contain the substitute numeric lexical token. A novel sequence of lexical tokens that represents the log entry and contains the substitute numeric lexical token is generated. The novel sequence of lexical tokens does not contain the original numeric lexical token. The computer hosts and operates a machine learning model that generates, based on the novel sequence of lexical tokens that represents the log entry, an inference that characterizes the log entry with unprecedented accuracy.
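The token substitution this abstract describes can be sketched as follows. This is a minimal illustration, not the patented method: the `<num>` placeholder, the exponent/mantissa split, and the whitespace tokenizing heuristic are all assumptions, since the abstract only states that substitutes may be generated "with a numeric exponent or by trigonometry".

```python
import math

def substitute_numeric_token(token: str) -> list[str]:
    """Replace an original numeric token with substitute tokens that
    encode its sign-free magnitude (a base-10 exponent) and a coarse
    mantissa. The substitutes never reproduce the original token."""
    value = float(token)
    if value == 0.0:
        return ["<num>", "e0", "0.0"]
    exponent = math.floor(math.log10(abs(value)))
    mantissa = value / (10 ** exponent)
    return ["<num>", f"e{exponent}", f"{mantissa:.1f}"]

def encode_log_entry(entry: str) -> list[str]:
    """Produce a novel token sequence for a variable-sized log entry in
    which every numeric token is replaced by its substitutes."""
    tokens = []
    for tok in entry.split():
        if tok.replace(".", "", 1).lstrip("-").isdigit():
            tokens.extend(substitute_numeric_token(tok))
        else:
            tokens.append(tok)
    return tokens
```

The resulting sequence could then be fed to a language model for anomaly scoring; the model never sees the raw numeric literal, only its magnitude-aware substitutes.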

    GENERAL PURPOSE SQL REPRESENTATION MODEL

    Publication Number: US20240370429A1

    Publication Date: 2024-11-07

    Application Number: US18143776

    Application Date: 2023-05-05

    Abstract: In an embodiment, a computer generates sentence fingerprints that represent respective pluralities of similar database statements. Based on the sentence fingerprints, an artificial neural network is trained. After training the artificial neural network on a large corpus of fingerprinted database statements, the artificial neural network is ready for zero-shot transfer learning to a downstream task. Database statement fingerprinting also anonymizes literal values in raw SQL statements. The trained artificial neural network can be safely reused without risk of disclosing sensitive data in the artificial neural network's vocabulary. After training, the artificial neural network infers a fixed-size encoded database statement from a new database statement. Based on the fixed-size encoded database statement, the new database statement is detected as anomalous, which increases database security and preserves database throughput by not executing the anomalous database statement.
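The fingerprinting step the abstract describes can be sketched with simple literal replacement. This is an assumed, regex-based approximation, not the patented fingerprint function; the `?` placeholder and the whitespace/case normalization are illustrative choices.

```python
import re

def fingerprint_sql(statement: str) -> str:
    """Collapse a raw SQL statement to a fingerprint: string and
    numeric literals become placeholders, so similar statements map
    to one fingerprint and no sensitive literal value survives into
    a model's vocabulary."""
    fp = re.sub(r"'[^']*'", "?", statement)    # anonymize string literals
    fp = re.sub(r"\b\d+(\.\d+)?\b", "?", fp)   # anonymize numeric literals
    return re.sub(r"\s+", " ", fp).strip().lower()
```

Two statements that differ only in literal values then share a fingerprint, which is what lets a plurality of similar statements be represented once during training.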

    TRAINING SYNTAX-AWARE LANGUAGE MODELS WITH AST PATH PREDICTION

    Publication Number: US20240345815A1

    Publication Date: 2024-10-17

    Application Number: US18202564

    Application Date: 2023-05-26

    CPC classification number: G06F8/427

    Abstract: In an embodiment, a computer stores and operates a logic encoder that is an artificial neural network that infers a fixed-size encoded logic from textual or tokenized source logic. Without machine learning, a special parser generates a parse tree that represents the source logic and a fixed-size correctly encoded tree that represents the parse tree. For finetuning the logic encoder, an encoded tree generator is an artificial neural network that accepts the fixed-size encoded logic as input and responsively infers a fixed-size incorrectly encoded tree that represents the parse tree. The neural weights of the logic encoder (and optionally of the encoded tree generator) are adjusted based on backpropagation of error (i.e. loss) as a numerically measured difference between the fixed-size incorrectly encoded tree and the fixed-size correctly encoded tree.
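The non-ML stage of the pipeline (parse the source logic, then reduce the parse tree to a fixed-size "correctly encoded tree") can be sketched as below. Both choices here are assumptions for illustration only: Python's own `ast` module stands in for the special parser, and a hashed count of node types stands in for the tree encoding, which the abstract does not specify.

```python
import ast

def encode_tree(source: str, size: int = 16) -> list[int]:
    """Parse source logic into a tree without machine learning, then
    reduce it to a fixed-size vector: a hashed bag of AST node types.
    A finetuned encoder/generator pair could be trained so its
    'incorrectly encoded tree' output converges toward this target."""
    vector = [0] * size
    for node in ast.walk(ast.parse(source)):
        bucket = hash(type(node).__name__) % size
        vector[bucket] += 1                  # tally this node type
    return vector
```

The training loss the abstract describes would then be a numeric distance (for example, squared error) between this target vector and the generator's inferred vector.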

    SUPER-FEATURES FOR EXPLAINABILITY WITH PERTURBATION-BASED APPROACHES

    Publication Number: US20230334343A1

    Publication Date: 2023-10-19

    Application Number: US17719617

    Application Date: 2022-04-13

    CPC classification number: G06N5/04

    Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. The features are grouped into predefined super-features that each contain a disjoint (i.e. nonintersecting, mutually exclusive) subset of features. For each super-feature, the computer: a) randomly selects many permuted values from original values of the super-feature in original tuples, b) generates permuted tuples that are based on the particular tuple and a respective permuted value, and c) causes the ML model to infer a respective permuted inference for each permuted tuple. A surrogate model is trained based on the permuted inferences. For each super-feature, a respective importance of the super-feature is calculated based on the surrogate model. Super-feature importances may be used to rank super-features by influence and/or generate a local ML explainability (MLX) explanation.
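The permutation loop in steps (a) through (c) can be sketched as follows. This simplification scores each super-feature by the mean absolute change in the model's inference rather than by fitting a surrogate model as the abstract describes; the function names and the scalar-output assumption are illustrative.

```python
import random

def super_feature_importance(model, tuples, particular, super_features,
                             n_perm=50, seed=0):
    """For each super-feature (a disjoint group of feature indices),
    repeatedly draw that group's values jointly from a random original
    tuple, re-run the model on the permuted tuple, and score the
    super-feature by the mean absolute change in the inference."""
    rng = random.Random(seed)
    base = model(particular)
    importances = {}
    for name, idxs in super_features.items():
        total = 0.0
        for _ in range(n_perm):
            donor = rng.choice(tuples)       # source of the permuted value
            permuted = list(particular)
            for i in idxs:                   # swap the whole group at once
                permuted[i] = donor[i]
            total += abs(model(permuted) - base)
        importances[name] = total / n_perm
    return importances
```

Ranking the returned importances then yields the influence ordering used in a local MLX explanation.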

    CODE DICTIONARY GENERATION BASED ON NON-BLOCKING OPERATIONS

    Publication Number: US20210390089A1

    Publication Date: 2021-12-16

    Application Number: US17459447

    Application Date: 2021-08-27

    Abstract: Techniques related to code dictionary generation based on non-blocking operations are disclosed. In some embodiments, a column of tokens includes a first token and a second token that are stored in separate rows. The column of tokens is correlated with a set of row identifiers including a first row identifier and a second row identifier that is different from the first row identifier. Correlating the column of tokens with the set of row identifiers involves: storing a correlation between the first token and the first row identifier, storing a correlation between the second token and the second row identifier if the first token and the second token have different values, and storing a correlation between the second token and the first row identifier if the first token and the second token have identical values. After correlating the column of tokens with the set of row identifiers, duplicate correlations are removed.
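The correlation rule in this abstract can be sketched sequentially. Note this loses the point of the patent's non-blocking formulation, which is that each row's decision depends only on its neighbor and so can run in parallel; the sketch only demonstrates the identifier-reuse and deduplication logic.

```python
def correlate_tokens(column):
    """Correlate a column of tokens with row identifiers: a row keeps
    its own row identifier unless its token equals the previous row's
    token, in which case it reuses that row's identifier. Duplicate
    correlations are then removed to form the code dictionary."""
    correlations = []
    for row_id, token in enumerate(column):
        if row_id > 0 and token == column[row_id - 1]:
            correlations.append((token, correlations[-1][1]))  # reuse id
        else:
            correlations.append((token, row_id))               # own id
    return sorted(set(correlations), key=lambda c: c[1])
```

Runs of identical adjacent tokens thus collapse to a single (token, identifier) entry after deduplication.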

    Code dictionary generation based on non-blocking operations

    Publication Number: US11126611B2

    Publication Date: 2021-09-21

    Application Number: US15897375

    Application Date: 2018-02-15

    Abstract: Techniques related to code dictionary generation based on non-blocking operations are disclosed. In some embodiments, a column of tokens includes a first token and a second token that are stored in separate rows. The column of tokens is correlated with a set of row identifiers including a first row identifier and a second row identifier that is different from the first row identifier. Correlating the column of tokens with the set of row identifiers involves: storing a correlation between the first token and the first row identifier, storing a correlation between the second token and the second row identifier if the first token and the second token have different values, and storing a correlation between the second token and the first row identifier if the first token and the second token have identical values. After correlating the column of tokens with the set of row identifiers, duplicate correlations are removed.

    DETECTING DEVICE UTILIZATION IMBALANCES

    Publication Number: US20200034208A1

    Publication Date: 2020-01-30

    Application Number: US16044230

    Application Date: 2018-07-24

    Abstract: Embodiments monitor statistics from groups of devices and generate an alarm upon detecting a utilization imbalance that is beyond a threshold. Particular balance statistics are periodically sampled, over a timeframe, for a group of devices configured to have balanced utilization. The devices are ranked at every data collection timestamp based on the gathered device statistics. The numbers of times each device appears within each rank over the timeframe are tallied. The device/rank summations are collectively used as a probability distribution representing the probability of each device being ranked at each of the rankings in the future. Based on this probability distribution, an entropy value that represents a summary of the imbalance of the group of devices over the timeframe is derived. An imbalance alert is generated when one or more entropy values for a group of devices shows an imbalanced utilization of the devices going beyond an identified imbalance threshold.
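The entropy derivation the abstract describes can be sketched from the device/rank tallies. The Shannon-entropy formulation and base-2 logarithm are assumptions; the abstract only says an entropy value summarizes the imbalance.

```python
import math

def rank_entropy(rank_counts):
    """rank_counts[device][rank] = number of timestamps at which the
    device held that rank over the timeframe. Normalize the tallies
    into a probability distribution and return its Shannon entropy:
    maximal when utilization is perfectly balanced, lower when some
    devices dominate particular ranks."""
    total = sum(sum(c.values()) for c in rank_counts.values())
    entropy = 0.0
    for counts in rank_counts.values():
        for n in counts.values():
            p = n / total
            if p > 0:
                entropy -= p * math.log2(p)
    return entropy

def is_imbalanced(rank_counts, threshold):
    """Raise an imbalance alert when entropy falls below the threshold."""
    return rank_entropy(rank_counts) < threshold
```

With two devices and two ranks, a perfectly balanced group yields the maximal entropy of 2 bits, while a group whose devices are pinned to fixed ranks yields only 1 bit.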

    PARTIAL GRAPH PATH PREDICTION AND NEXT TOKEN PREDICTION JOINT TRAINING ALGORITHM FOR GENERATIVE LANGUAGE MODELS

    Publication Number: US20250165852A1

    Publication Date: 2025-05-22

    Application Number: US18514391

    Application Date: 2023-11-20

    Abstract: During pretraining, a computer generates three untrained machine learning models that are a token sequence encoder, a token predictor, and a decoder that infers a frequency distribution of graph traversal paths. A sequence of lexical tokens is generated that represents a lexical text in a training corpus. A graph is generated that represents the lexical text. In the graph, multiple traversal paths are selected that collectively represent a sliding subsequence of the sequence of lexical tokens. From the subsequence, the token sequence encoder infers an encoded sequence that represents the subsequence of the sequence of lexical tokens. The decoder and token predictor accept the encoded sequence as input for respective inferencing for which respective training losses are measured. Both training losses are combined into a combined loss that is used to increase the accuracy of the three machine learning models by, for example, backpropagation of the combined loss.
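The final step, combining the two training losses into one scalar, can be sketched in one line. The weighted-sum form and the `alpha` hyperparameter are assumptions; the abstract states only that both losses are combined and the combined loss is backpropagated.

```python
def combined_loss(path_loss: float, token_loss: float, alpha: float = 0.5) -> float:
    """Combine the graph-path decoder's loss and the next-token
    predictor's loss into one joint loss. Backpropagating this scalar
    updates the shared token-sequence encoder and both heads together."""
    return alpha * path_loss + (1.0 - alpha) * token_loss
```

Because both heads read the same encoded sequence, the joint gradient pushes the encoder toward representations useful for both syntax-level (graph path) and token-level prediction.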

    CONTEXTUAL RE-RANKING BASED ON CURSOR POSITION FOR DOCUMENTATION RECOMMENDER SYSTEMS

    Publication Number: US20250110961A1

    Publication Date: 2025-04-03

    Application Number: US18374209

    Application Date: 2023-09-28

    Abstract: Herein is dynamic and contextual ranking of reference documentation based on an interactively selected position in new source logic. A computer receives a vocabulary of lexical tokens, a sequence of references that contains a first reference to a first reference document before a second reference to a second reference document, respective subsets of the vocabulary that occur in the first and second reference documents, a new source logic that contains a sequence of lexical tokens, respective measurements of semantic distance between the new source logic and the first and second reference documents, and a selected position in the sequence of lexical tokens. Based on the selected position, the measurements of semantic distance are selectively increased. Based on those increased measurements of semantic distance, the relative ordering of the first and second references is reversed to generate and display a reordered sequence of references.
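The selective distance increase and reversal can be sketched as follows. The window-around-the-cursor criterion, the additive penalty, and all parameter names are illustrative assumptions; the abstract says only that distances are selectively increased based on the selected position.

```python
def rerank_references(references, source_tokens, cursor, window=3, penalty=1.0):
    """references: list of (doc_id, distance, vocab_subset) tuples,
    initially ordered by semantic distance to the whole source logic.
    Increase the distance of any reference whose vocabulary subset
    does not occur within `window` tokens of the cursor, then re-sort
    ascending, so a locally relevant document can overtake a globally
    closer one."""
    lo, hi = max(0, cursor - window), cursor + window + 1
    context = set(source_tokens[lo:hi])
    adjusted = []
    for doc_id, distance, vocab in references:
        if not (vocab & context):      # no overlap near the cursor
            distance += penalty        # selectively increase distance
        adjusted.append((doc_id, distance))
    return [doc_id for doc_id, _ in sorted(adjusted, key=lambda r: r[1])]
```

Moving the cursor changes the context window, which can flip the displayed ordering of two references without recomputing the underlying semantic distances.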
