Patent search ap:("Oracle International Corporation") AND inv:"Mark Edward Johnson" Page 2

11.

Invention application
FUSION OF WORD EMBEDDINGS AND WORD SCORES FOR TEXT CLASSIFICATION (in force)

Publication number: US20230100508A1

Publication date: 2023-03-30

Application number: US17936679

Application date: 2022-09-29

Applicant: Oracle International Corporation

Inventor: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu

IPC: G06F40/295, G06F40/205, G06F40/263

Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
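The pipeline in this abstract (per-word embeddings plus per-word scores, fused into one vector, passed through a feed-forward layer, then classified) can be illustrated with a minimal sketch. The abstract does not say how the fusion is computed, so the score-weighted average below is an assumption, and the names `fuse_embeddings`, `feed_forward`, and `classify` are hypothetical.

```python
from typing import List, Sequence


def fuse_embeddings(word_vectors: Sequence[Sequence[float]],
                    word_scores: Sequence[float]) -> List[float]:
    """Fuse word vectors into one vector for the textual unit.

    Assumption: fusion is a score-weighted average. The abstract only
    states that the unit vector is based on both inputs.
    """
    if not word_vectors or len(word_vectors) != len(word_scores):
        raise ValueError("need exactly one score per word vector")
    total = sum(word_scores)
    if total == 0:
        # Assumption: fall back to a plain average when all scores are zero.
        weights = [1.0 / len(word_scores)] * len(word_scores)
    else:
        weights = [s / total for s in word_scores]
    dim = len(word_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, word_vectors))
            for i in range(dim)]


def feed_forward(x: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 biases: Sequence[float]) -> List[float]:
    """One dense layer with ReLU activation (illustrative parameters)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def classify(layer_output: Sequence[float]) -> int:
    """Return the index of the largest output as the predicted class."""
    return max(range(len(layer_output)), key=layer_output.__getitem__)


# Two words with 2-dimensional embeddings; the first word scores higher.
unit_vector = fuse_embeddings([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
hidden = feed_forward(unit_vector, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
predicted_class = classify(hidden)
```

In a real system the word scores might come from something like TF-IDF and the layer parameters would be learned; neither detail is stated in the abstract.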

12.

Invention grant
Noise data augmentation for natural language processing (in force)

Publication number: US11538457B2

Publication date: 2022-12-27

Application number: US17016117

Application date: 2020-09-09

Applicant: Oracle International Corporation

Inventor: Elias Luqman Jalaluddin, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Yu-Heng Hong, Balakota Srinivas Vinnakota

IPC: G10L15/22, G10L15/06, G10L15/05, G10L15/18, G10L15/26

Abstract: Techniques for noise data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes receiving a training set of utterances for training an intent classifier to identify one or more intents for one or more utterances; augmenting the training set of utterances with noise text to generate an augmented training set of utterances; and training the intent classifier using the augmented training set of utterances. The augmenting includes: obtaining the noise text from a list of words, a text corpus, a publication, a dictionary, or any combination thereof, irrelevant to the original text within the utterances of the training set of utterances, and incorporating the noise text within the utterances relative to the original text in the utterances of the training set of utterances at a predefined augmentation ratio to generate augmented utterances.
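The augmentation step described in this abstract can be sketched roughly as follows. This is an illustration only, not the patented method: the abstract does not define how the augmentation ratio is applied, so the sketch assumes it is the fraction of training utterances that receive one noisy copy. The function name `augment_with_noise` and the parameters `noise_len` and `seed` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Utterance = Tuple[str, str]  # (text, intent label)


def augment_with_noise(utterances: Sequence[Utterance],
                       noise_words: Sequence[str],
                       ratio: float,
                       noise_len: int = 2,
                       seed: int = 0) -> List[Utterance]:
    """Return the original utterances plus noisy copies.

    Assumption: `ratio` is the fraction of utterances that get one
    augmented copy; each copy keeps the intent label of its source.
    """
    rng = random.Random(seed)
    n_aug = min(len(utterances), round(len(utterances) * ratio))
    chosen = rng.sample(list(utterances), n_aug)
    augmented = []
    for text, intent in chosen:
        tokens = text.split()
        # Noise is drawn from a word list unrelated to the utterance.
        noise = [rng.choice(noise_words) for _ in range(noise_len)]
        # Insert the noise at a random position, keeping original word order.
        pos = rng.randint(0, len(tokens))
        noisy_text = " ".join(tokens[:pos] + noise + tokens[pos:])
        augmented.append((noisy_text, intent))
    return list(utterances) + augmented


training_set = [
    ("order a pizza", "order_food"),
    ("cancel my order", "cancel_order"),
    ("where is my delivery", "track_order"),
    ("pay with card", "payment"),
]
noise_vocabulary = ["lorem", "ipsum", "dolor", "amet"]
augmented_set = augment_with_noise(training_set, noise_vocabulary, ratio=0.5)
```

The augmented set would then be passed to whatever intent classifier is being trained; that training step is omitted here.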

13.

Invention grant
Using backpropagation to train a dialog system (in force)

Publication number: US11508359B2

Publication date: 2022-11-22

Application number: US17002229

Application date: 2020-08-25

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson

IPC: G10L15/16, G10L15/06, G06N3/08, G06N20/00, G10L15/22

Abstract: Techniques described herein use backpropagation to train one or more machine learning (ML) models of a dialog system. For instance, a method includes accessing seed data that includes training tuples, where each training tuple comprises a respective logical form. The method includes converting the logical form of a training tuple to a converted logical form, by applying to the logical form a text-to-speech (TTS) subsystem, an automatic speech recognition (ASR) subsystem, and a semantic parser of a dialog system. The method includes determining a training signal by using an objective function to compare the converted logical form to the logical form. The method further includes training the TTS subsystem, the ASR subsystem, and the semantic parser via backpropagation based on the training signal. As a result of the training by backpropagation, the machine learning models are tuned to work effectively together within a pipeline of the dialog system.

14.

Invention grant
Implementing a correction model to reduce propagation of automatic speech recognition errors (in force)

Publication number: US11462208B2

Publication date: 2022-10-04

Application number: US16992291

Application date: 2020-08-13

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson

IPC: G10L15/01, G10L15/16, G10L15/22

Abstract: Some techniques described herein determine a correction model for a dialog system, such that the correction model corrects output from an automatic speech recognition (ASR) subsystem in the dialog system. A method described herein includes accessing training data. A first tuple of the training data includes an utterance, where the utterance is a textual representation of speech. The method further includes using an ASR subsystem of a dialog system to convert the utterance to an output utterance. The method further includes storing the output utterance in corrective training data that is based on the training data. The method further includes training a correction model based on the corrective training data, such that the correction model is configured to correct output from the ASR subsystem during operation of the dialog system.

15.

Invention application
MULTI-FEATURE BALANCING FOR NATURAL LANGUAGE PROCESSORS (in force)

Publication number: US20220229991A1

Publication date: 2022-07-21

Application number: US17580535

Application date: 2022-01-20

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Tuyen Quang Pham, Cong Duy Vu Hoang, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Aashna Devang Kanuga, Zikai Li, Yuanxu Wu

IPC: G06F40/289, G06F40/166, G06N3/08

Abstract: Techniques are disclosed for systems including techniques for multi-feature balancing for natural language processors. In an embodiment, a method includes receiving a natural language query to be processed by a machine learning model, the machine learning model utilizing a dataset of natural language phrases for processing natural language queries, determining, based on the machine learning model and the natural language query, a feature dropout value, generating, based on the natural language query, one or more contextual features and one or more expressional features that may be input to the machine learning model, modifying at least one of the one or more contextual features and the one or more expressional features based on the feature dropout value to generate a set of input features for the machine learning model, and processing the set of input features to cause generating an output dataset corresponding to the natural language query.

16.

Invention application
DISTANCE-BASED LOGIT VALUE FOR NATURAL LANGUAGE PROCESSING (in force)

Publication number: US20220171947A1

Publication date: 2022-06-02

Application number: US17456916

Application date: 2021-11-30

Applicant: Oracle International Corporation

Inventor: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/35, H04L51/02, G06N20/00

Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.

17.

Invention application
BATCHING TECHNIQUES FOR HANDLING UNBALANCED TRAINING DATA FOR A CHATBOT (in force)

Publication number: US20210304075A1

Publication date: 2021-09-30

Application number: US17217623

Application date: 2021-03-30

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Balakota Srinivas Vinnakota, Yu-Heng Hong, Elias Luqman Jalaluddin

IPC: G06N20/00, G06F40/30, G10L15/18, G10L15/06, G10L15/22, G10L15/197

Abstract: The present disclosure relates to chatbot systems, and more particularly, to batching techniques for handling unbalanced training data when training a model such that bias is removed from the trained machine learning model when performing inference. In an embodiment, a plurality of raw utterances is obtained. A bias-reducing distribution is determined and a subset of the plurality of raw utterances is batched according to the bias-reducing distribution. The resulting unbiased training data may be input into a prediction model for training the prediction model. The trained prediction model may be obtained and utilized to predict unbiased results from new inputs received by the trained prediction model.
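As a rough illustration of batching according to a bias-reducing distribution, the sketch below draws each batch item by first choosing an intent and then choosing an utterance within that intent. The abstract does not specify the distribution, so a uniform distribution over intents is assumed here; the function name `balanced_batch` and its parameters are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Utterance = Tuple[str, str]  # (text, intent label)


def balanced_batch(utterances: Sequence[Utterance],
                   batch_size: int,
                   seed: int = 0) -> List[Utterance]:
    """Sample one training batch with intents drawn uniformly.

    Assumption: the bias-reducing distribution is uniform over intents,
    so rare intents appear about as often as frequent ones.
    """
    rng = random.Random(seed)
    by_intent: Dict[str, List[str]] = defaultdict(list)
    for text, intent in utterances:
        by_intent[intent].append(text)
    intents = sorted(by_intent)  # sorted for reproducible sampling
    batch = []
    for _ in range(batch_size):
        intent = rng.choice(intents)         # pick an intent uniformly
        text = rng.choice(by_intent[intent])  # then an utterance within it
        batch.append((text, intent))
    return batch


# Nine utterances for one intent and a single utterance for another.
raw_utterances = [(f"check balance {i}", "balance") for i in range(9)]
raw_utterances.append(("close my account", "close_account"))
batch = balanced_batch(raw_utterances, batch_size=8)
```

Sampling with replacement means the single rare utterance repeats across the batch; a production system might combine this with augmentation or a softer distribution.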

18.

Invention application
TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION (in force)

Publication number: US20210303798A1

Publication date: 2021-09-30

Application number: US17217909

Application date: 2021-03-30

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/289, H04L12/58, G06N20/00

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

19.

Invention application
TASK-ORIENTED DIALOG SUITABLE FOR A STANDALONE DEVICE (in force)

Publication number: US20210065709A1

Publication date: 2021-03-04

Application number: US17005847

Application date: 2020-08-28

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew David Bleeker, Serge Le Huitouze

IPC: G10L15/22

Abstract: Described herein are dialog systems, and techniques for providing such dialog systems, that are suitable for use on standalone computing devices. In some embodiments, a dialog system includes a dialog manager, which takes as input an input logical form, which may be a representation of user input. The dialog manager may include a dialog state tracker, an execution subsystem, a dialog policy subsystem, and a context stack. The dialog state tracker may generate an intermediate logical form from the input logical form combined with a context from the context stack. The context stack may maintain a history of a current dialog, and thus, the intermediate logical form may include contextual information potentially missing from the input logical form. The execution subsystem may execute the intermediate logical form to produce an execution result, and the dialog policy subsystem may generate an output logical form based on the execution result.

20.

Invention application
DISTANCE-BASED LOGIT VALUES FOR NATURAL LANGUAGE PROCESSING (in force)

Publication number: US20250117591A1

Publication date: 2025-04-10

Application number: US18988114

Application date: 2024-12-19

Applicant: Oracle International Corporation

Inventor: Ying XU, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/35, G06F40/205, G06F40/253, G06N20/00, H04L51/02

Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
