Patent search: ap:("Oracle International Corporation") AND inv:"Cong Duy Vu Hoang"

1.

Invention publication
TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION [Under examination, published]

Publication (announcement) number: US20240289555A1

Publication (announcement) date: 2024-08-29

Application number: US18659606

Application date: 2024-05-09

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/205, G06F40/289, G06N20/00, H04L51/02

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
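
The two-score scheme described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: cosine similarity stands in for the metric learning model, an exponential decay over the mean distance to the k nearest cluster representations stands in for the outlier detection model, the two probabilities are combined by simple averaging, and 0.5 is an arbitrary decision threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def metric_probability(embedding, cluster_reps):
    # Assumed stand-in for the metric learning model: best cosine
    # similarity to any cluster, rescaled from [-1, 1] to [0, 1].
    best = max(cosine(embedding, c) for c in cluster_reps)
    return (best + 1.0) / 2.0

def outlier_probability(embedding, cluster_reps, k=2, scale=1.0):
    # Assumed stand-in for the outlier detection model: mean distance
    # to the k nearest cluster representations, mapped to (0, 1].
    nearest = sorted(euclidean(embedding, c) for c in cluster_reps)[:k]
    return math.exp(-(sum(nearest) / len(nearest)) / scale)

def classify(embedding, cluster_reps, threshold=0.5):
    # Assumed combination rule: average the two probabilities.
    p1 = metric_probability(embedding, cluster_reps)
    p2 = outlier_probability(embedding, cluster_reps)
    final = (p1 + p2) / 2.0
    label = "in-domain" if final >= threshold else "out-of-domain"
    return label, final

# Hypothetical 2-D cluster representations for one target domain.
cluster_reps = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3]]
near_label, near_p = classify([0.95, 0.05], cluster_reps)
far_label, far_p = classify([-1.0, -1.0], cluster_reps)
```

In practice the sentence embeddings would come from a trained encoder and the combination rule would be tuned; the sketch only shows how two independent in-domain probabilities can feed one final decision.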

2.

Invention application
FUSION OF WORD EMBEDDINGS AND WORD SCORES FOR TEXT CLASSIFICATION [Valid]

Publication (announcement) number: US20230100508A1

Publication (announcement) date: 2023-03-30

Application number: US17936679

Application date: 2022-09-29

Applicant: Oracle International Corporation

Inventor: Ahmed Ataallah Ataallah Abobakr, Mark Edward Johnson, Thanh Long Duong, Vladislav Blinov, Yu-Heng Hong, Cong Duy Vu Hoang, Duy Vu

IPC: G06F40/295, G06F40/205, G06F40/263

Abstract: Techniques disclosed herein relate generally to text classification and include techniques for fusing word embeddings with word scores for text classification. In one particular aspect, a method for text classification is provided that includes obtaining an embedding vector for a textual unit, based on a plurality of word embedding vectors and a plurality of word scores. The plurality of word embedding vectors includes a corresponding word embedding vector for each of a plurality of words of the textual unit, and the plurality of word scores includes a corresponding word score for each of the plurality of words of the textual unit. The method also includes passing the embedding vector for the textual unit through at least one feed-forward layer to obtain a final layer output, and performing a classification on the final layer output.
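
One way to picture the fusion step in this abstract is a score-weighted average of the word vectors, followed by a single feed-forward layer and an argmax over its output. This is a sketch under assumptions: the weighted average, the ReLU layer, the hand-set weights, and the label names are all hypothetical and are not specified by the abstract.

```python
def fuse(word_vectors, word_scores):
    """Score-weighted average of word vectors (assumed fusion rule)."""
    total = sum(word_scores)
    dim = len(word_vectors[0])
    return [
        sum(score * vec[i] for vec, score in zip(word_vectors, word_scores)) / total
        for i in range(dim)
    ]

def feed_forward(x, weights, bias):
    """One dense layer with ReLU; each row of `weights` is one output unit."""
    raw = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, value) for value in raw]

def classify(final_output, labels):
    """Pick the label whose output unit is largest."""
    return labels[final_output.index(max(final_output))]

# Hypothetical three-word textual unit with 2-D word embeddings.
word_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
word_scores = [1.0, 0.0, 3.0]          # e.g. importance scores per word
unit_vector = fuse(word_vectors, word_scores)

weights = [[1.0, 0.0], [0.0, 1.0]]     # identity weights, for illustration only
bias = [0.0, 0.0]
final_output = feed_forward(unit_vector, weights, bias)
predicted = classify(final_output, ["greeting", "travel"])
```

With these toy values the third word dominates the fused vector because it carries the largest score, which is the effect the word scores are meant to have.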

3.

Invention application
MULTI-FEATURE BALANCING FOR NATURAL LANGUAGE PROCESSORS [Valid]

Publication (announcement) number: US20220229991A1

Publication (announcement) date: 2022-07-21

Application number: US17580535

Application date: 2022-01-20

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Tuyen Quang Pham, Cong Duy Vu Hoang, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Aashna Devang Kanuga, Zikai Li, Yuanxu Wu

IPC: G06F40/289, G06F40/166, G06N3/08

Abstract: Techniques are disclosed for multi-feature balancing for natural language processors. In an embodiment, a method includes receiving a natural language query to be processed by a machine learning model, the machine learning model utilizing a dataset of natural language phrases for processing natural language queries; determining, based on the machine learning model and the natural language query, a feature dropout value; generating, based on the natural language query, one or more contextual features and one or more expressional features that may be input to the machine learning model; modifying at least one of the one or more contextual features and the one or more expressional features based on the feature dropout value to generate a set of input features for the machine learning model; and processing the set of input features to generate an output dataset corresponding to the natural language query.
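
The feature-modification step can be sketched as follows. The choice to drop contextual features (rather than expressional ones), the per-feature random zeroing, and the function name `apply_feature_dropout` are assumptions for illustration; the abstract only says that at least one of the two feature groups is modified based on the dropout value.

```python
import random

def apply_feature_dropout(contextual, expressional, dropout, rng):
    """Zero each contextual feature with probability `dropout`.

    Expressional features pass through unchanged in this sketch, so a
    higher dropout value shifts the balance toward them.
    """
    kept = [0.0 if rng.random() < dropout else value for value in contextual]
    return kept + list(expressional)

rng = random.Random(0)                     # seeded for repeatability
contextual = [0.4, 0.7, 0.1]               # hypothetical contextual features
expressional = [1.0, 0.0]                  # hypothetical expressional features

unchanged = apply_feature_dropout(contextual, expressional, 0.0, rng)
all_dropped = apply_feature_dropout(contextual, expressional, 1.0, rng)
```

A dropout value of 0.0 leaves the input features untouched and a value of 1.0 removes every contextual feature, so intermediate values trade off how much the model can lean on each group.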

4.

Invention application
DISTANCE-BASED LOGIT VALUE FOR NATURAL LANGUAGE PROCESSING [Valid]

Publication (announcement) number: US20220171947A1

Publication (announcement) date: 2022-06-02

Application number: US17456916

Application date: 2021-11-30

Applicant: Oracle International Corporation

Inventor: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/35, H04L51/02, G06N20/00

Abstract: Techniques are disclosed for using logit values to classify utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
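
A rough sketch of the idea of a distance-based logit: each binary classifier owns a class centroid, and its logit shrinks as the utterance embedding moves away from that centroid. The specific form `bias - distance`, the plain sigmoid used in place of the "enhanced activation function", the 0.5 threshold, and the class names are all assumptions; the abstract does not define the modified logit function or the activation.

```python
import math

def distance_logit(embedding, centroid, bias=2.0):
    """Assumed logit: a bias minus the Euclidean distance to the class centroid."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(embedding, centroid)))
    return bias - distance

def sigmoid(z):
    """Plain logistic function, used here as a stand-in activation."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(embedding, centroids, threshold=0.5):
    """Run one binary classifier per class; return the best class or None."""
    probs = {
        label: sigmoid(distance_logit(embedding, centroid))
        for label, centroid in centroids.items()
    }
    best = max(probs, key=probs.get)
    return (best if probs[best] >= threshold else None), probs

# Hypothetical class centroids in a 2-D embedding space.
centroids = {"order_pizza": [1.0, 0.0], "check_balance": [0.0, 1.0]}
close_label, close_probs = classify([0.9, 0.1], centroids)
distant_label, distant_probs = classify([10.0, 10.0], centroids)
```

Because every logit falls with distance, an utterance far from all centroids gets a low probability from every binary classifier and is left unassigned rather than forced into the nearest class.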

5.

Invention application
TECHNIQUES FOR OUT-OF-DOMAIN (OOD) DETECTION [Valid]

Publication (announcement) number: US20210303798A1

Publication (announcement) date: 2021-09-30

Application number: US17217909

Application date: 2021-03-30

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/289, H04L12/58, G06N20/00

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

6.

Invention grant
Framework for focused training of language models and techniques for end-to-end hypertuning of the framework [Valid]

Publication (announcement) number: US12288550B2

Publication (announcement) date: 2025-04-29

Application number: US17952116

Application date: 2022-09-23

Applicant: Oracle International Corporation

Inventor: Poorya Zaremoodi, Cong Duy Vu Hoang, Duy Vu, Dai Hoang Tran, Budhaditya Saha, Nagaraj N. Bhat, Thanh Tien Vu, Tuyen Quang Pham, Adam Craig Pocock, Katherine Silverstein, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong

IPC: G10L15/06, G10L15/183

Abstract: Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

7.

Invention application
MANAGING AMBIGUOUS DATE MENTIONS IN TRANSFORMING NATURAL LANGUAGE TO A LOGICAL FORM [Valid]

Publication (announcement) number: US20250095635A1

Publication (announcement) date: 2025-03-20

Application number: US18656274

Application date: 2024-05-06

Applicant: Oracle International Corporation

Inventor: Gioacchino Tangari, Cong Duy Vu Hoang, Stephen Andrew McRitchie, Steve Wai-Chun Siu, Dalu Guo, Christopher Mark Broadbent, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Kenneth Khiaw Hong Eng, Chandan Basavaraju

IPC: G10L15/06

Abstract: Techniques are disclosed herein for managing ambiguous date mentions in natural language utterances in transforming natural language utterances to logical forms by encoding the uncertainties of the ambiguous date mentions and including the encoded uncertainties in the logical forms. In a training phase, training examples including natural language utterances, logical forms, and database schema information are automatically augmented and used to train a machine learning model to convert natural language utterances to logical form. In an inference phase, input database schema information is augmented and used by the trained machine learning model to convert an input natural language utterance to logical form.
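
To make "encoding the uncertainties" concrete, here is a small sketch for one kind of ambiguity, assumed for illustration: a numeric date such as 03/04/2024 that can be read as month/day or day/month. The dictionary layout and the function names are hypothetical; the abstract does not say which ambiguities are handled or how they are represented in the logical form.

```python
from datetime import date

def candidate_dates(mention):
    """Return every valid reading of an 'a/b/yyyy' mention, in order."""
    first, second, year = (int(part) for part in mention.split("/"))
    candidates = []
    for month, day in ((first, second), (second, first)):
        try:
            reading = date(year, month, day)
        except ValueError:
            continue                      # e.g. month 13 is not a valid reading
        if reading not in candidates:
            candidates.append(reading)
    return candidates

def encode_mention(mention):
    """Hypothetical encoding that keeps all readings instead of guessing one."""
    readings = candidate_dates(mention)
    return {
        "mention": mention,
        "ambiguous": len(readings) > 1,
        "candidates": [reading.isoformat() for reading in readings],
    }

ambiguous = encode_mention("03/04/2024")
unambiguous = encode_mention("13/04/2024")
```

Keeping both candidates in the encoded value lets a downstream step, or the user, resolve the date later rather than committing to one reading at parse time.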

8.

Invention application
MANAGING DATE-TIME INTERVALS IN TRANSFORMING NATURAL LANGUAGE TO A LOGICAL FORM [Valid]

Publication (announcement) number: US20250094737A1

Publication (announcement) date: 2025-03-20

Application number: US18794986

Application date: 2024-08-05

Applicant: Oracle International Corporation

Inventor: Gioacchino Tangari, Cong Duy Vu Hoang, Dalu Guo, Steve Wai-Chun Siu, Stephen Andrew McRitchie, Christopher Mark Broadbent, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Chandan Basavaraju, Kenneth Khiaw Hong Eng

IPC: G06F40/58, G06F40/166, G06F40/253, G06F40/295

Abstract: Techniques are disclosed herein for managing date-time intervals in transforming natural language utterances to logical forms by providing an enhanced grammar, a natural language utterance comprising a date-time interval, and database schema information to a machine learning model that has been trained to convert natural language utterances to logical forms; and using the machine learning model to convert the natural language utterance to an output logical form, wherein the output logical form comprises at least one of the date-time interval and an extraction function for extracting date-time information corresponding to the date-time interval from at least one date-time attribute of the database schema information.

9.

Invention application
TECHNIQUES FOR MANUFACTURING TRAINING DATA TO TRANSFORM NATURAL LANGUAGE INTO A VISUALIZATION REPRESENTATION [Valid]

Publication (announcement) number: US20250068626A1

Publication (announcement) date: 2025-02-27

Application number: US18593316

Application date: 2024-03-01

Applicant: Oracle International Corporation

Inventor: Gioacchino Tangari, Steve Wai-Chun Siu, Dalu Guo, Cong Duy Vu Hoang, Berk Sarioz, Chang Xu, Stephen Andrew McRitchie, Mark Edward Johnson, Christopher Mark Broadbent, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Vishal Vishnoi, Chandan Basavaraju, Kenneth Khiaw Hong Eng

IPC: G06F16/2452, G06F16/28

Abstract: The present disclosure relates to manufacturing training data by leveraging an automated pipeline that manufactures visualization training datasets to train a machine learning model to convert a natural language utterance into meaning representation language logical form that includes one or more visualization actions. Aspects are directed towards accessing an original training dataset, a visualization query dataset, an incremental visualization dataset, a manipulation visualization dataset, or any combination thereof. One or more visualization training datasets are generated by: (i) modifying examples in the original training dataset, the visualization query dataset, or both to include visualization actions, (ii) generating examples, using the incremental visualization dataset, the manipulation visualization dataset, or both, that include visualization actions, or (iii) both (i) and (ii). An augmented training dataset is generated by adding the one or more visualization training datasets to the original training dataset and then used to train the machine learning model.
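
Option (i) in this abstract, modifying existing examples to include visualization actions and adding them to the original dataset, can be sketched as below. The `VISUALIZE(...)` wrapper, the utterance template, and the example logical form are invented for illustration; the abstract does not give the syntax of the meaning representation language.

```python
def add_visualization(example, chart_type):
    """Return a modified copy of an example that requests a chart."""
    return {
        "utterance": f"{example['utterance']} as a {chart_type} chart",
        # Hypothetical logical-form syntax for a visualization action.
        "logical_form": f"VISUALIZE({chart_type}, {example['logical_form']})",
    }

def augment(original, chart_types):
    """Original dataset plus one visualization variant per chart type."""
    generated = [
        add_visualization(example, chart_type)
        for example in original
        for chart_type in chart_types
    ]
    return original + generated

original = [
    {"utterance": "show sales by region",
     "logical_form": "SELECT region, SUM(sales) FROM orders GROUP BY region"},
]
augmented = augment(original, ["bar", "pie"])
```

The original examples stay in the augmented set, so a model trained on it still sees plain queries alongside the queries that carry visualization actions.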

10.

Invention grant
Distance-based logit value for natural language processing [Valid]

Publication (announcement) number: US12019994B2

Publication (announcement) date: 2024-06-25

Application number: US17456916

Application date: 2021-11-30

Applicant: Oracle International Corporation

Inventor: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson

IPC: G06F17/18, G06F40/35, G06N20/00, H04L51/02, G06F40/205, G06F40/253

CPC classification number: G06F40/35, G06N20/00, H04L51/02, G06F40/205, G06F40/253

Abstract: Techniques are disclosed for using logit values to classify utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
