-
Publication No.: US11972220B2
Publication date: 2024-04-30
Application No.: US17456687
Filing date: 2021-11-29
Applicant: Oracle International Corporation
Inventor: Ying Xu , Poorya Zaremoodi , Thanh Tien Vu , Cong Duy Vu Hoang , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson
IPC: G06F40/35 , G06F40/205 , G06F40/253 , G06N3/08 , H04L51/02
CPC classification number: G06F40/35 , G06N3/08 , H04L51/02 , G06F40/205 , G06F40/253
Abstract: Techniques for using enhanced logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system and inputting the utterance into a machine-learning model including a series of network layers. A final network layer of the series of network layers can include a logit function. The machine-learning model can map a first probability for a resolvable class to a first logit value using the logit function. The machine-learning model can map a second probability for an unresolvable class to an enhanced logit value. The method can also include the chatbot system classifying the utterance as the resolvable class or the unresolvable class based on the first logit value and the enhanced logit value.
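Illustrative sketch (not part of the patent record): the abstract compares a logit for the resolvable class against an "enhanced" logit for the unresolvable class. The snippet below is a minimal toy version of that comparison. It assumes the enhanced logit is simply a fixed constant; the abstract does not say how the value is derived, and the names `logit` and `classify` are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Standard logit (log-odds) of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def classify(p_resolvable: float, enhanced_logit: float = 0.0) -> str:
    """Compare the resolvable-class logit with the enhanced logit.

    Assumption: the enhanced logit for the unresolvable class is a fixed
    constant supplied by the caller, standing in for whatever value the
    real system computes.
    """
    first_logit = logit(p_resolvable)
    return "resolvable" if first_logit > enhanced_logit else "unresolvable"
```

With the default constant of 0.0, any resolvable probability above 0.5 wins; raising the constant makes the classifier more willing to reject an utterance as unresolvable.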
-
Publication No.: US20240061833A1
Publication date: 2024-02-22
Application No.: US18218385
Filing date: 2023-07-05
Applicant: Oracle International Corporation
Inventor: Gioacchino Tangari , Nitika Mathur , Philip Arthur , Cong Duy Vu Hoang , Aashna Devang Kanuga , Steve Wai-Chun Siu , Syed Najam Abbas Zaidi , Poorya Zaremoodi , Thanh Long Duong , Mark Edward Johnson
IPC: G06F16/2452 , G06F16/242 , G06F40/247 , G06F40/284
CPC classification number: G06F16/24522 , G06F16/243 , G06F40/247 , G06F40/284
Abstract: Techniques are disclosed for augmenting training data for training a machine learning model to generate database queries. Training data comprising a first training example comprising a first natural language utterance, a logical form for the first natural language utterance, and associated first metadata is obtained. From the first training example, a template utterance is generated. A second natural language utterance is generated by filling slots in the template utterance based on a database schema and database values. Updated metadata is produced based on the first metadata and the second natural language utterance. A second training example is generated, comprising the second natural language utterance, the logical form for the first natural language utterance, and the updated metadata. The training data is augmented by adding the second training example. A machine learning model is trained to generate a database query comprising the database operation using the augmented training data set.
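Illustrative sketch (not part of the patent record): the augmentation steps in the abstract (template generation, slot filling from database values, metadata update, reuse of the original logical form) can be mimicked on a toy example. The dictionary layout, the slot-to-value metadata format, and the function names are all assumptions made for illustration.

```python
import random


def make_template(utterance: str, slot_values: dict) -> str:
    """Replace each known value in the utterance with a {slot} placeholder."""
    template = utterance
    for slot, value in slot_values.items():
        template = template.replace(value, "{" + slot + "}")
    return template


def augment(example: dict, db_values: dict, rng: random.Random) -> dict:
    """Build a second training example from the first one.

    Assumption: metadata maps slot names to the values that appear in the
    utterance, and db_values maps slot names to candidate database values.
    """
    template = make_template(example["utterance"], example["metadata"])
    new_metadata = {slot: rng.choice(db_values[slot]) for slot in example["metadata"]}
    return {
        "utterance": template.format(**new_metadata),
        # As in the abstract, the logical form of the first utterance is kept.
        "logical_form": example["logical_form"],
        "metadata": new_metadata,
    }
```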
-
Publication No.: US20230206125A1
Publication date: 2023-06-29
Application No.: US18087647
Filing date: 2022-12-22
Applicant: Oracle International Corporation
Inventor: Tuyen Quang Pham , Cong Duy Vu Hoang , Thanh Tien Vu , Mark Edward Johnson , Thanh Long Duong
IPC: G06N20/00 , G06F40/35 , G06F40/284 , G06F40/295 , G06F40/253
CPC classification number: G06N20/00 , G06F40/35 , G06F40/284 , G06F40/295 , G06F40/253 , G06F40/205
Abstract: Techniques are provided for improved training of a machine learning model using lexical dropout. A machine learning model and a training data set are accessed. The training data set can include sample utterances and corresponding labels. A dropout parameter is identified. The dropout parameter can indicate a likelihood for dropping out one or more feature vectors for tokens associated with respective entities during training of the machine learning model. The dropout parameter is applied to feature vectors for tokens associated with respective entities. The machine learning model is trained using the training data set and the dropout parameter to generate a trained machine learning model. The use of the trained machine learning model is facilitated.
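Illustrative sketch (not part of the patent record): one simple reading of "dropping out feature vectors for tokens associated with entities" is to zero those vectors with probability `p` during training. The function name, the boolean entity mask, and the choice to zero the vector (rather than, say, replace it) are assumptions.

```python
import random


def lexical_dropout(features, entity_mask, p, rng):
    """Zero the feature vector of each entity token with probability p.

    features: list of per-token feature vectors (lists of floats).
    entity_mask: list of booleans, True where the token belongs to an entity.
    Non-entity tokens are always left untouched.
    """
    dropped = []
    for vector, is_entity in zip(features, entity_mask):
        if is_entity and rng.random() < p:
            dropped.append([0.0] * len(vector))
        else:
            dropped.append(list(vector))
    return dropped
```

Applied only at training time, this would push a model to rely on context around an entity rather than on the entity's surface form.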
-
Publication No.: US20230186025A1
Publication date: 2023-06-15
Application No.: US18065387
Filing date: 2022-12-13
Applicant: Oracle International Corporation
Inventor: Jae Min John , Vishal Vishnoi , Mark Edward Johnson , Thanh Long Duong , Srinivasa Phani Kumar Gadde , Balakota Srinivas Vinnakota , Shivashankar Subramanian , Cong Duy Vu Hoang , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Nitika Mathur , Aashna Devang Kanuga , Philip Arthur , Gioacchino Tangari , Steve Wai-Chun Siu
IPC: G06F40/284 , G06F40/295 , G06F40/42
CPC classification number: G06F40/284 , G06F40/295 , G06F40/42
Abstract: Techniques for preprocessing data assets to be used in a natural language to logical form model based on scalable search and content-based schema linking. In one particular aspect, a method includes accessing an utterance, classifying named entities within the utterance into predefined classes, searching value lists within the database schema using tokens from the utterance to identify and output value matches including: (i) any value within the value lists that matches a token from the utterance and (ii) any attribute associated with a matching value, generating a data structure by organizing and storing: (i) each of the named entities and an assigned class for each of the named entities, (ii) each of the value matches and the token matching each of the value matches, and (iii) the utterance, in a predefined format for the data structure, and outputting the data structure.
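Illustrative sketch (not part of the patent record): the value-matching and data-structure steps in the abstract can be approximated with an exact, case-insensitive token lookup. The real system's search strategy and output format are not given in the abstract, so the whitespace tokenizer, the dictionary layout, and the name `link_values` are assumptions.

```python
def link_values(utterance, value_lists, named_entities):
    """Match utterance tokens against schema value lists and bundle the result.

    value_lists: maps an attribute name to the values stored under it.
    named_entities: list of (entity_text, class) pairs from an upstream tagger.
    Assumption: matching is exact and case-insensitive on whitespace tokens.
    """
    tokens = utterance.lower().split()
    value_matches = []
    for attribute, values in value_lists.items():
        for value in values:
            if value.lower() in tokens:
                value_matches.append(
                    {"token": value.lower(), "value": value, "attribute": attribute}
                )
    return {
        "utterance": utterance,
        "named_entities": named_entities,
        "value_matches": value_matches,
    }
```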
-
Publication No.: US20210304074A1
Publication date: 2021-09-30
Application No.: US17216498
Filing date: 2021-03-29
Applicant: Oracle International Corporation
Inventor: Poorya Zaremoodi , Ying Xu , Thanh Tien Vu , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson , Xin Xu , Cong Duy Vu Hoang
IPC: G06N20/00
Abstract: Techniques are disclosed for tuning hyperparameters of a machine-learning model. A plurality of metrics are selected for which hyperparameters of the machine-learning model are to be tuned. Each metric is associated with a plurality of specification parameters including a target score, a penalty factor, and a bonus factor. The plurality of specification parameters are configured for each metric in accordance with a first criterion. The machine-learning model is evaluated using one or more validation datasets to obtain a metric score. A weighted loss function is formulated based on a difference between the metric score and the target score of each metric, the penalty factor or the bonus factor. The hyperparameters associated with the machine-learning model are tuned in order to optimize the weighted loss function. In response to the weighted loss function being optimized, the machine-learning model is provided as a validated machine-learning model.
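Illustrative sketch (not part of the patent record): the abstract describes a weighted loss built from the gap between each metric score and its target, scaled by a penalty or bonus factor. One plausible form, assumed here, penalizes shortfalls with the penalty factor and rewards surpluses with the bonus factor. The exact formula is not stated in the abstract, and the names below are hypothetical.

```python
def weighted_loss(scores, specs):
    """Sum per-metric losses; lower is better.

    scores: maps metric name to the score measured on validation data.
    specs: maps metric name to {"target", "penalty", "bonus"}.
    Assumption: a score below target adds penalty * gap, and a score at or
    above target subtracts bonus * surplus.
    """
    total = 0.0
    for name, score in scores.items():
        spec = specs[name]
        gap = spec["target"] - score
        if gap > 0:
            total += spec["penalty"] * gap  # missed the target
        else:
            total += spec["bonus"] * gap  # gap <= 0, so this lowers the loss
    return total
```

A hyperparameter search would then evaluate each candidate configuration and keep the one with the smallest `weighted_loss`.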
-
Publication No.: US20250117591A1
Publication date: 2025-04-10
Application No.: US18988114
Filing date: 2024-12-19
Applicant: Oracle International Corporation
Inventor: Ying Xu , Poorya Zaremoodi , Thanh Tien Vu , Cong Duy Vu Hoang , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson
IPC: G06F40/35 , G06F40/205 , G06F40/253 , G06N20/00 , H04L51/02
Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with the particular class.
-
Publication No.: US20250117585A1
Publication date: 2025-04-10
Application No.: US18987825
Filing date: 2024-12-19
Applicant: Oracle International Corporation
Inventor: Thanh Tien Vu , Tuyen Quang Pham , Mark Edward Johnson , Thanh Long Duong , Ying Xu , Poorya Zaremoodi , Omid Mohamad Nezami , Budhaditya Saha , Cong Duy Vu Hoang
IPC: G06F40/295 , G06F40/284 , H04L51/02
Abstract: In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and, when the length of the utterance exceeds a pre-determined threshold of token pieces: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score for each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on merging the two confidence scores; and store the final annotated label in a memory.
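Illustrative sketch (not part of the patent record): the chunk-and-merge flow in the abstract can be shown with a fixed-size sliding window. The merge rule used here (keep the higher-confidence label where chunks overlap) is an assumption, since the abstract only says that two confidence scores are merged. Function names are hypothetical.

```python
def split_overlapping(tokens, chunk_size, overlap):
    """Split tokens into (start_index, chunk) pairs that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


def merge_predictions(n_tokens, chunk_predictions):
    """Merge per-chunk (label, confidence) predictions into one label per token.

    chunk_predictions: list of (start_index, [(label, confidence), ...]).
    Assumption: where chunks overlap, the higher-confidence label wins.
    """
    best = [None] * n_tokens
    for start, predictions in chunk_predictions:
        for offset, (label, confidence) in enumerate(predictions):
            position = start + offset
            if best[position] is None or confidence > best[position][1]:
                best[position] = (label, confidence)
    return best
```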
-
Publication No.: US20240419910A1
Publication date: 2024-12-19
Application No.: US18819441
Filing date: 2024-08-29
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Vishal Vishnoi , Mark Edward Johnson , Elias Luqman Jalaluddin , Tuyen Quang Pham , Cong Duy Vu Hoang , Poorya Zaremoodi , Srinivasa Phani Kumar Gadde , Aashna Devang Kanuga , Zikai Li , Yuanxu Wu
IPC: G06F40/289 , G06F40/166 , G06F40/205 , G06F40/263 , G06F40/279 , G06F40/295 , G06N3/08 , H04L51/02
Abstract: A method includes receiving an indication of a first coverage value corresponding to a desired overlap between a dataset of natural language phrases and a training dataset for training a machine learning model; determining a second coverage value corresponding to a measured overlap between the dataset of natural language phrases and the training dataset; determining a coverage delta value based on a comparison between the first coverage value and the second coverage value; modifying, based on the coverage delta value, the dataset of natural language phrases; and processing, utilizing a machine learning model including the modified dataset of natural language phrases, an input dataset including a set of input features. The machine learning model processes the input dataset based at least in part on the dataset of natural language phrases to generate an output dataset.
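Illustrative sketch (not part of the patent record): the coverage comparison in the abstract can be expressed as a desired-minus-measured difference. How overlap is actually measured is not stated, so the substring test below, along with the function names, is an assumption.

```python
def measured_coverage(phrases, training_texts):
    """Fraction of phrases that occur in at least one training text.

    Assumption: a phrase is "covered" if it appears as a case-insensitive
    substring of any training text.
    """
    if not phrases:
        return 0.0
    lowered = [text.lower() for text in training_texts]
    hits = sum(any(phrase.lower() in text for text in lowered) for phrase in phrases)
    return hits / len(phrases)


def coverage_delta(desired, phrases, training_texts):
    """Positive when measured coverage falls short of the desired coverage."""
    return desired - measured_coverage(phrases, training_texts)
```

A positive delta would suggest trimming or resampling the phrase dataset until the measured overlap approaches the requested value.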
-
Publication No.: US20240232187A9
Publication date: 2024-07-11
Application No.: US18321144
Filing date: 2023-05-22
Applicant: Oracle International Corporation
Inventor: Chang Xu , Poorya Zaremoodi , Cong Duy Vu Hoang , Nitika Mathur , Philip Arthur , Steve Wai-Chun Siu , Aashna Devang Kanuga , Gioacchino Tangari , Mark Edward Johnson , Thanh Long Duong , Vishal Vishnoi , Stephen Andrew McRitchie , Christopher Mark Broadbent
IPC: G06F16/2452 , G06F40/211 , G06F40/30
CPC classification number: G06F16/24522 , G06F40/211 , G06F40/30
Abstract: The present disclosure is related to techniques for converting a natural language utterance to a logical form query and deriving a natural language interpretation of the logical form query. The techniques include accessing a Meaning Resource Language (MRL) query and converting the MRL query into an MRL structure including logical form statements. The converting includes extracting operations and associated attributes from the MRL query and generating the logical form statements from the operations and associated attributes. The techniques further include translating each of the logical form statements into a natural language expression based on a grammar data structure that includes a set of rules for translating logical form statements into corresponding natural language expressions, combining the natural language expressions into a single natural language expression, and providing the single natural language expression as an interpretation of the natural language utterance.
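Illustrative sketch (not part of the patent record): the translate-then-combine step in the abstract can be mimicked with a small rule table that maps each operation to a sentence template. The operations, templates, and statement format below are invented for illustration and do not reflect actual MRL syntax.

```python
# Hypothetical grammar: one natural language template per operation.
GRAMMAR = {
    "select": "show {attrs}",
    "from": "for each {entity}",
    "where": "where {condition}",
}


def interpret(statements):
    """Translate (operation, attributes) statements and join them into one sentence."""
    parts = [GRAMMAR[operation].format(**attributes) for operation, attributes in statements]
    return " ".join(parts)
```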
-
Publication No.: US20240126999A1
Publication date: 2024-04-18
Application No.: US18545621
Filing date: 2023-12-19
Applicant: Oracle International Corporation
Inventor: Ying Xu , Poorya Zaremoodi , Thanh Tien Vu , Cong Duy Vu Hoang , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson
CPC classification number: G06F40/35 , G06N20/00 , H04L51/02 , G06F40/253
Abstract: Techniques for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with the particular class.
-