Patent search ap:("Oracle International Corporation") AND inv:"Cong Duy Vu Hoang" Page 5

41.

Granted invention patent
System and techniques for handling long text for pre-trained language models (in force)

Publication No.: US12210830B2

Publication date: 2025-01-28

Application No.: US17750240

Application date: 2022-05-20

Applicant: Oracle International Corporation

Inventor: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong, Ying Xu, Poorya Zaremoodi, Omid Mohamad Nezami, Budhaditya Saha, Cong Duy Vu Hoang

IPC: G06F40/30, G06F40/169, G06F40/284, G06F40/295

Abstract: In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and, when the length of the utterance exceeds a pre-determined threshold of token pieces: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score for each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on the merging of the two confidence scores; and store the final annotated label in a memory.
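The chunk-and-merge idea in this abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the two confidence scores are merged, so the sketch assumes the simplest rule (keep the higher-scoring label for a token covered by two chunks). The names `split_into_chunks`, `tag_long_utterance`, `max_len`, `overlap`, and the `tagger` callable are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Prediction = Tuple[str, float]  # (label, confidence score) for one token piece
Tagger = Callable[[List[str]], List[Prediction]]  # hypothetical NER model


def split_into_chunks(tokens: List[str], max_len: int, overlap: int):
    """Return (start_index, chunk) pairs; consecutive chunks share `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += step
    return chunks


def tag_long_utterance(tokens: List[str], tagger: Tagger,
                       max_len: int = 4, overlap: int = 2) -> List[Prediction]:
    """Tag an utterance, chunking it only when it exceeds `max_len` token pieces."""
    if len(tokens) <= max_len:
        return tagger(tokens)
    merged: List[Optional[Prediction]] = [None] * len(tokens)
    for start, chunk in split_into_chunks(tokens, max_len, overlap):
        for offset, pred in enumerate(tagger(chunk)):
            i = start + offset
            # Assumed merge rule: where chunks overlap, keep the prediction
            # with the higher confidence score.
            if merged[i] is None or pred[1] > merged[i][1]:
                merged[i] = pred
    return merged  # every position is covered by at least one chunk
```

A real system would set `max_len` to the token limit of the pre-trained model and might merge scores differently (for example by averaging); those choices are outside what the abstract states.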

42.

Granted invention patent
Multi-feature balancing for natural language processors (in force)

Publication No.: US12153885B2

Publication date: 2024-11-26

Application No.: US17580535

Application date: 2022-01-20

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Vishal Vishnoi, Mark Edward Johnson, Elias Luqman Jalaluddin, Tuyen Quang Pham, Cong Duy Vu Hoang, Poorya Zaremoodi, Srinivasa Phani Kumar Gadde, Aashna Devang Kanuga, Zikai Li, Yuanxu Wu

IPC: G06F40/289, G06F40/166, G06F40/205, G06F40/263, G06F40/279, G06F40/295, G06N3/08, H04L51/02

Abstract: Techniques are disclosed for multi-feature balancing for natural language processors. In an embodiment, a method includes receiving a natural language query to be processed by a machine learning model, the machine learning model utilizing a dataset of natural language phrases for processing natural language queries; determining, based on the machine learning model and the natural language query, a feature dropout value; generating, based on the natural language query, one or more contextual features and one or more expressional features that may be input to the machine learning model; modifying at least one of the one or more contextual features and the one or more expressional features based on the feature dropout value to generate a set of input features for the machine learning model; and processing the set of input features to generate an output dataset corresponding to the natural language query.

43.

Published invention application
ENHANCED LOGITS FOR NATURAL LANGUAGE PROCESSING (under examination, published)

Publication No.: US20240232541A1

Publication date: 2024-07-11

Application No.: US18611039

Application date: 2024-03-20

Applicant: Oracle International Corporation

Inventor: Ying Xu, Poorya Zaremoodi, Thanh Tien Vu, Cong Duy Vu Hoang, Vladislav Blinov, Yu-Heng Hong, Yakupitiyage Don Thanuja Samodhye Dharmasiri, Vishal Vishnoi, Elias Luqman Jalaluddin, Manish Parekh, Thanh Long Duong, Mark Edward Johnson

IPC: G06F40/35, G06F40/205, G06F40/253, G06N3/08, H04L51/02

CPC classification number: G06F40/35, G06N3/08, H04L51/02, G06F40/205, G06F40/253

Abstract: Techniques are disclosed for using enhanced logit values to classify utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system and inputting the utterance into a machine-learning model including a series of network layers. A final network layer of the series of network layers can include a logit function. The machine-learning model can map a first probability for a resolvable class to a first logit value using the logit function. The machine-learning model can map a second probability for an unresolvable class to an enhanced logit value. The method can also include the chatbot system classifying the utterance as the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.
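For readers unfamiliar with the term, the standard logit function maps a probability p to its log-odds, log(p / (1 - p)). The Python sketch below shows that mapping and a toy comparison between two logit values. The abstract does not define how the enhanced logit value is computed, so `enhanced_logit` here is purely a hypothetical stand-in (a scaled and shifted logit), and `classify`, `scale`, and `bias` are illustrative names, not anything taken from the application.

```python
import math


def logit(p: float) -> float:
    """Standard logit: the log-odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def enhanced_logit(p: float, scale: float = 1.0, bias: float = 0.0) -> float:
    """Hypothetical placeholder for the 'enhanced' value: scale * logit(p) + bias.

    This formula is an assumption made for illustration only.
    """
    return scale * logit(p) + bias


def classify(p_resolvable: float, p_unresolvable: float,
             scale: float = 1.0, bias: float = 0.0) -> str:
    """Pick the class whose (enhanced) logit value is larger."""
    first = logit(p_resolvable)
    enhanced = enhanced_logit(p_unresolvable, scale, bias)
    return "resolvable" if first >= enhanced else "unresolvable"
```

With the default parameters the comparison reduces to comparing the two probabilities; a positive `bias` makes the toy classifier more willing to call an utterance unresolvable.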

44.

Granted invention patent
Techniques for out-of-domain (OOD) detection (in force)

Publication No.: US12014146B2

Publication date: 2024-06-18

Application No.: US18364298

Application date: 2023-08-02

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06F40/205, G06F40/289, G06N20/00, H04L51/02

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.
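The two-signal structure described in this abstract (a similarity-based probability and a distance-based probability, combined into a final score) can be sketched in Python as follows. Everything concrete here is an assumption: the patent uses a trained metric learning model and an outlier detection model, whereas this sketch substitutes simple closed-form stand-ins (cosine similarity to the nearest cluster centroid, and an exponential decay of Euclidean distance), and combines them by plain averaging. The function names, `scale`, and `threshold` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def metric_probability(embedding: Vector, centroids: List[Vector]) -> float:
    """Stand-in for the metric learning model: best cosine similarity mapped to [0, 1]."""
    best = max(cosine(embedding, c) for c in centroids)
    return (1.0 + best) / 2.0


def outlier_probability(embedding: Vector, centroids: List[Vector],
                        scale: float = 1.0) -> float:
    """Stand-in for the outlier model: decays with distance to the nearest centroid."""
    nearest = min(math.dist(embedding, c) for c in centroids)
    return math.exp(-nearest / scale)


def is_in_domain(embedding: Vector, centroids: List[Vector],
                 threshold: float = 0.5) -> bool:
    """Average the two probabilities (an assumed rule) and apply a threshold."""
    final = 0.5 * (metric_probability(embedding, centroids)
                   + outlier_probability(embedding, centroids))
    return final >= threshold
```

In practice the sentence embedding would come from an encoder and the centroids from clustering in-domain training utterances; both are taken as given inputs here.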

45.

Published invention application
ADDRESSING CATASTROPHIC FORGETTING AND OVER-GENERALIZATION WHILE TRAINING A NATURAL LANGUAGE TO A MEANING REPRESENTATION LANGUAGE SYSTEM (under examination, published)

Publication No.: US20240062044A1

Publication date: 2024-02-22

Application No.: US18451995

Application date: 2023-08-18

Applicant: Oracle International Corporation

Inventor: Shivashankar Subramanian, Dalu Guo, Gioacchino Tangari, Nitika Mathur, Cong Duy Vu Hoang, Mark Edward Johnson, Thanh Long Duong

IPC: G06N3/0455, G06F40/58, G06N3/006, G06N3/084

CPC classification number: G06N3/0455, G06F40/58, G06N3/006, G06N3/084

Abstract: Techniques are disclosed herein for addressing catastrophic forgetting and over-generalization while training a model to transform natural language to a logical form such as a meaning representation language. The techniques include accessing training data comprising natural language examples, augmenting the training data to generate expanded training data, training a machine learning model on the expanded training data, and providing the trained machine learning model. The augmenting includes (i) generating contrastive examples by revising natural language of examples identified to have caused regression during training of a machine learning model with the training data, (ii) generating alternative examples by modifying operators of examples identified within the training data that belong to a concept that exhibits bias, or (iii) a combination of (i) and (ii).

46.

Published invention application
TECHNIQUES FOR USING NAMED ENTITY RECOGNITION TO RESOLVE ENTITY EXPRESSION IN TRANSFORMING NATURAL LANGUAGE TO A MEANING REPRESENTATION LANGUAGE (under examination, published)

Publication No.: US20240062011A1

Publication date: 2024-02-22

Application No.: US18351680

Application date: 2023-07-13

Applicant: Oracle International Corporation

Inventor: Aashna Devang Kanuga, Cong Duy Vu Hoang, Mark Edward Johnson, Vasisht Raghavendra, Yuanxu Wu, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Shubham Pawankumar Shah, Vanshika Sridharan, Zikai Li, Diego Andres Cornejo Barra, Stephen Andrew McRitchie, Christopher Mark Broadbent, Vishal Vishnoi, Srinivasa Phani Kumar Gadde, Poorya Zaremoodi, Thanh Long Duong, Bhagya Gayathri Hettige, Tuyen Quang Pham, Arash Shamaei, Thanh Tien Vu, Yakupitiyage Don Thanuja Samodhye Dharmasiri

IPC: G06F40/295, G06F40/284, G06F40/211, G06F40/35

CPC classification number: G06F40/295, G06F40/284, G06F40/211, G06F40/35

Abstract: Techniques are disclosed herein for using named entity recognition to resolve entity expression while transforming natural language to a meaning representation language. In one aspect, a method includes accessing natural language text, predicting, by a first machine learning model, a class label for a token in the natural language text, predicting, by a second machine-learning model, operators for a meaning representation language and a value or value span for each attribute of the operators, in response to determining that the value or value span for a particular attribute matches the class label, converting a portion of the natural language text for the value or value span into a resolved format, and outputting syntax for the meaning representation language. The syntax comprises the operators with the portion of the natural language text for the value or value span in the resolved format.

47.

Granted invention patent
Techniques for out-of-domain (OOD) detection (in force)

Publication No.: US11763092B2

Publication date: 2023-09-19

Application No.: US17217909

Application date: 2021-03-30

Applicant: Oracle International Corporation

Inventor: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota

IPC: G06F40/30, G06N20/00, G06F40/289, H04L51/02, G06F40/205

CPC classification number: G06F40/30, G06F40/289, G06N20/00, H04L51/02, G06F40/205

Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances. One particular technique includes receiving an utterance and a target domain of a chatbot, generating a sentence embedding for the utterance, obtaining an embedding representation for each cluster of in-domain utterances associated with the target domain, predicting, using a metric learning model, a first probability that the utterance belongs to the target domain based on a similarity or difference between the sentence embedding and each embedding representation for each cluster, predicting, using an outlier detection model, a second probability that the utterance belongs to the target domain based on a determined distance or density deviation between the sentence embedding and embedding representations for neighboring clusters, evaluating the first probability and the second probability to determine a final probability, and classifying the utterance as in-domain or out-of-domain for the chatbot based on the final probability.

48.

Published invention application
GAZETTEER INTEGRATION FOR NEURAL NAMED ENTITY RECOGNITION (under examination, published)

Publication No.: US20230205999A1

Publication date: 2023-06-29

Application No.: US18087629

Application date: 2022-12-22

Applicant: Oracle International Corporation

Inventor: Tuyen Quang Pham, Cong Duy Vu Hoang, Mark Edward Johnson, Thanh Long Duong

IPC: G06F40/295, G06F40/284, G06F40/205

CPC classification number: G06F40/295, G06F40/284, G06F40/205

Abstract: Techniques are provided for named entity recognition using a gazetteer incorporated with a neural network. An utterance is received from a user. The utterance is input into a neural network comprising model parameters learned for named entity recognition. The neural network generates a first representation of one or more named entities based on the utterance. A gazetteer is searched based on the input utterance to generate a second representation of one or more named entities identified in the utterance. The first named entity representation is combined with the second named entity representation to generate a combined named entity representation. The combined named entity representation is output for facilitating a response to the user.
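One common way to combine a neural representation with gazetteer evidence is to build per-token match features from the gazetteer and concatenate them with the network's per-token vectors. The Python sketch below illustrates that pattern only; the abstract does not specify the form of either representation or how they are combined, so the exact-phrase matching, the one-hot feature layout, the concatenation, and the names `gazetteer_features` and `combine` are all assumptions.

```python
from typing import Dict, List


def gazetteer_features(tokens: List[str], gazetteer: Dict[str, str],
                       entity_types: List[str]) -> List[List[float]]:
    """Per-token indicator vector: 1.0 at the slot of each entity type whose
    gazetteer phrase (case-insensitive, exact match) covers the token."""
    feats = [[0.0] * len(entity_types) for _ in tokens]
    lowered = [t.lower() for t in tokens]
    for phrase, etype in gazetteer.items():
        words = phrase.lower().split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == words:
                for j in range(i, i + n):
                    feats[j][entity_types.index(etype)] = 1.0
    return feats


def combine(neural: List[List[float]],
            gazetteer_feats: List[List[float]]) -> List[List[float]]:
    """Assumed combination: concatenate the two vectors for each token."""
    return [n + g for n, g in zip(neural, gazetteer_feats)]
```

For example, with the gazetteer `{"new york": "CITY"}` and tokens `["fly", "to", "New", "York"]`, the last two tokens receive a CITY indicator that a downstream tagging layer could use alongside the neural features.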

49.

Published invention application
DATA MANUFACTURING FRAMEWORKS FOR SYNTHESIZING SYNTHETIC TRAINING DATA TO FACILITATE TRAINING A NATURAL LANGUAGE TO LOGICAL FORM MODEL (under examination, published)

Publication No.: US20230185834A1

Publication date: 2023-06-15

Application No.: US18065434

Application date: 2022-12-13

Applicant: Oracle International Corporation

Inventor: Philip Arthur, Vishal Vishnoi, Mark Edward Johnson, Thanh Long Duong, Srinivasa Phani Kumar Gadde, Balakota Srinivas Vinnakota, Cong Duy Vu Hoang, Steve Wai-Chun Siu, Nitika Mathur, Gioacchino Tangari, Aashna Devang Kanuga

IPC: G06F16/332, G06N20/00, G06F40/47, G06F40/211, G06F40/237, G06F40/284

CPC classification number: G06F16/3329, G06N20/00, G06F40/47, G06F40/211, G06F40/237, G06F40/284, G06F40/35

Abstract: Techniques are disclosed herein for synthesizing synthetic training data to facilitate training a natural language to logical form model. In one aspect, training data can be synthesized from original training data under a framework based on templates and a synchronous context-free grammar. In one aspect, training data can be synthesized under a framework based on a probabilistic context-free grammar and a translator. In one aspect, training data can be synthesized under a framework based on tree-to-string translation. In one aspect, the synthetic training data can be combined with original training data in order to train a machine learning model to translate an utterance to a logical form.

50.

Published invention application
SYSTEM AND TECHNIQUES FOR HANDLING LONG TEXT FOR PRE-TRAINED LANGUAGE MODELS (under examination, published)

Publication No.: US20230161963A1

Publication date: 2023-05-25

Application No.: US17750240

Application date: 2022-05-20

Applicant: Oracle International Corporation

Inventor: Thanh Tien Vu, Tuyen Quang Pham, Mark Edward Johnson, Thanh Long Duong, Ying Xu, Poorya Zaremoodi, Omid Mohamad Nezami, Budhaditya Saha, Cong Duy Vu Hoang

IPC: G06F40/295, G06F40/284, G06F40/169

CPC classification number: G06F40/295, G06F40/284, G06F40/169

Abstract: In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and, when the length of the utterance exceeds a pre-determined threshold of token pieces: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score for each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on the merging of the two confidence scores; and store the final annotated label in a memory.
