-
Publication No.: US20250117585A1
Publication Date: 2025-04-10
Application No.: US18987825
Filing Date: 2024-12-19
Applicant: Oracle International Corporation
Inventor: Thanh Tien Vu , Tuyen Quang Pham , Mark Edward Johnson , Thanh Long Duong , Ying Xu , Poorya Zaremoodi , Omid Mohamad Nezami , Budhaditya Saha , Cong Duy Vu Hoang
IPC: G06F40/295 , G06F40/284 , H04L51/02
Abstract: In some aspects, a computing device may receive, at a data processing system, a set of utterances for training or inferencing with a named entity recognizer to assign a label to each token piece from the set of utterances. The computing device may determine a length of each utterance in the set and, when the length of an utterance exceeds a pre-determined threshold of token pieces, may: divide the utterance into a plurality of overlapping chunks of token pieces; assign a label together with a confidence score for each token piece in a chunk; determine a final label and an associated confidence score for each chunk of token pieces by merging two confidence scores; determine a final annotated label for the utterance based at least on the merging of the two confidence scores; and store the final annotated label in a memory.
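The chunk-and-merge flow in this abstract can be illustrated with a minimal sketch. It is not the patented method: the function names, the fixed `overlap` parameter, and the merge rule (keep the higher-confidence prediction for a token that appears in two chunks) are all assumptions made for illustration, and `tagger` stands in for any named entity recognizer that returns one (label, confidence) pair per token piece.

```python
from typing import Callable, List, Optional, Tuple

Labeled = Tuple[str, float]  # (label, confidence score)


def chunk_spans(n_tokens: int, max_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return [start, end) spans of at most max_len tokens.

    Consecutive spans share `overlap` tokens (hypothetical parameter).
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if n_tokens <= max_len:
        return [(0, n_tokens)]  # short utterance: no chunking needed
    step = max_len - overlap
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += step
    return spans


def label_long_utterance(
    tokens: List[str],
    tagger: Callable[[List[str]], List[Labeled]],
    max_len: int,
    overlap: int,
) -> List[Optional[Labeled]]:
    """Tag each chunk, then merge predictions for tokens seen in two chunks.

    Assumed merge rule: keep the prediction with the higher confidence.
    """
    best: List[Optional[Labeled]] = [None] * len(tokens)
    for start, end in chunk_spans(len(tokens), max_len, overlap):
        predictions = tagger(tokens[start:end])
        for offset, (label, score) in enumerate(predictions):
            i = start + offset
            current = best[i]
            if current is None or score > current[1]:
                best[i] = (label, score)
    return best
```

A tagger tends to be less reliable near chunk boundaries, where context is truncated, so taking the higher score in the overlap is one plausible way to prefer the chunk that saw more context.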
-
Publication No.: US12217497B2
Publication Date: 2025-02-04
Application No.: US17888300
Filing Date: 2022-08-15
Applicant: Oracle International Corporation
Inventor: Yakupitiyage Don Thanuja Samodhye Dharmasiri , Xu Zhong , Ahmed Ataallah Ataallah Abobakr , Hongtao Yang , Budhaditya Saha , Shaoke Xu , Shashi Prasad Suravarapu , Mark Edward Johnson , Thanh Long Duong
IPC: G06V10/82 , G06V30/148 , G06V30/412
Abstract: Techniques for extracting key information from a document using machine-learning models in a chatbot system are disclosed herein. In one particular aspect, a method is provided that includes receiving a set of data, which includes key fields, within a document at a data processing system that includes a table detection module, a key information extraction module, and a table extraction module. Text information and corresponding location data are extracted via optical character recognition. The table detection module detects whether one or more tables are present in the document and, if applicable, a location of each of the tables. The key information extraction module extracts text from the key fields. The table extraction module extracts each of the tables based on input from the optical character recognition and the table detection module. Extraction results that include the text from the key fields and each of the tables can be output.
-
Publication No.: US12210973B2
Publication Date: 2025-01-28
Application No.: US16938098
Filing Date: 2020-07-24
Applicant: Oracle International Corporation
Inventor: Mark Edward Johnson
IPC: G06N3/082 , G06F40/205 , G06F40/295 , G06F40/30 , G06N3/045
Abstract: A model for a natural language understanding task is generated based on labeled data generated by a labeling model. The model for the natural language understanding task is smaller than the labeling model (i.e., with lower computational and memory requirements than the combined model), but with substantially the same performance as the labeling model. In some cases, the labeling model may be generated based on a large pre-trained model.
-
Publication No.: US20240419910A1
Publication Date: 2024-12-19
Application No.: US18819441
Filing Date: 2024-08-29
Applicant: Oracle International Corporation
Inventor: Thanh Long Duong , Vishal Vishnoi , Mark Edward Johnson , Elias Luqman Jalaluddin , Tuyen Quang Pham , Cong Duy Vu Hoang , Poorya Zaremoodi , Srinivasa Phani Kumar Gadde , Aashna Devang Kanuga , Zikai Li , Yuanxu Wu
IPC: G06F40/289 , G06F40/166 , G06F40/205 , G06F40/263 , G06F40/279 , G06F40/295 , G06N3/08 , H04L51/02
Abstract: A method includes receiving an indication of a first coverage value corresponding to a desired overlap between a dataset of natural language phrases and a training dataset for training a machine learning model; determining a second coverage value corresponding to a measured overlap between the dataset of natural language phrases and the training dataset; determining a coverage delta value based on a comparison between the first coverage value and the second coverage value; modifying, based on the coverage delta value, the dataset of natural language phrases; and processing, utilizing a machine learning model including the modified dataset of natural language phrases, an input dataset including a set of input features. The machine learning model processes the input dataset based at least in part on the dataset of natural language phrases to generate an output dataset.
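The coverage comparison in this abstract can be sketched as follows. The abstract does not define how overlap is measured or how the phrase dataset is modified, so everything here is an assumption for illustration: coverage is taken as the fraction of phrases that also occur in the training data, the delta is desired minus measured, and the only modification shown is dropping uncovered phrases until the target is met.

```python
from typing import Set


def measured_coverage(phrases: Set[str], training_phrases: Set[str]) -> float:
    """Fraction of phrases that also occur in the training data (assumed metric)."""
    if not phrases:
        return 0.0
    return len(phrases & training_phrases) / len(phrases)


def coverage_delta(desired: float, measured: float) -> float:
    """Positive when the measured overlap falls short of the desired overlap."""
    return desired - measured


def raise_coverage(phrases: Set[str], training_phrases: Set[str], desired: float) -> Set[str]:
    """Drop uncovered phrases until coverage reaches `desired` or none remain.

    One hypothetical modification strategy; it returns a new set and leaves
    the input untouched.
    """
    kept = set(phrases)
    uncovered = sorted(kept - training_phrases)  # sorted for determinism
    while uncovered and coverage_delta(desired, measured_coverage(kept, training_phrases)) > 0:
        kept.remove(uncovered.pop())
    return kept
```

For example, with phrases `{"a", "b", "c", "d"}` and training phrases `{"a", "b", "x"}`, the measured coverage is 0.5; asking for 0.6 removes one uncovered phrase, which lifts coverage to two thirds.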
-
Publication No.: US20240232187A9
Publication Date: 2024-07-11
Application No.: US18321144
Filing Date: 2023-05-22
Applicant: Oracle International Corporation
Inventor: Chang Xu , Poorya Zaremoodi , Cong Duy Vu Hoang , Nitika Mathur , Philip Arthur , Steve Wai-Chun Siu , Aashna Devang Kanuga , Gioacchino Tangari , Mark Edward Johnson , Thanh Long Duong , Vishal Vishnoi , Stephen Andrew McRitchie , Christopher Mark Broadbent
IPC: G06F16/2452 , G06F40/211 , G06F40/30
CPC classification number: G06F16/24522 , G06F40/211 , G06F40/30
Abstract: The present disclosure is related to techniques for converting a natural language utterance to a logical form query and deriving a natural language interpretation of the logical form query. The techniques include accessing a Meaning Resource Language (MRL) query and converting the MRL query into an MRL structure including logical form statements. The converting includes extracting operations and associated attributes from the MRL query and generating the logical form statements from the operations and associated attributes. The techniques further include translating each of the logical form statements into a natural language expression based on a grammar data structure that includes a set of rules for translating logical form statements into corresponding natural language expressions, combining the natural language expressions into a single natural language expression, and providing the single natural language expression as an interpretation of the natural language utterance.
-
Publication No.: US20240143934A1
Publication Date: 2024-05-02
Application No.: US18485700
Filing Date: 2023-10-12
Applicant: Oracle International Corporation
Inventor: Poorya Zaremoodi , Duy Vu , Nagaraj N. Bhat , Srijon Sarkar , Varsha Kuppur Rajendra , Thanh Long Duong , Mark Edward Johnson , Pramir Sarkar , Shahid Reza
IPC: G06F40/30 , G06F40/284 , G06F40/289
CPC classification number: G06F40/30 , G06F40/284 , G06F40/289
Abstract: A method includes accessing a document including sentences, the document being associated with a configuration flag indicating whether ABSA, SLSA, or both are to be performed; inputting the document into a language model that generates chunks of token embeddings for the document; and, based on the configuration flag, performing at least one from among the ABSA and the SLSA by inputting the chunks of token embeddings into a multi-task model. When performing the SLSA, a part of the token embeddings in each of the chunks is masked, and the masked token embeddings do not belong to a particular sentence on which the SLSA is performed.
-
Publication No.: US20240126999A1
Publication Date: 2024-04-18
Application No.: US18545621
Filing Date: 2023-12-19
Applicant: Oracle International Corporation
Inventor: Ying Xu , Poorya Zaremoodi , Thanh Tien Vu , Cong Duy Vu Hoang , Vladislav Blinov , Yu-Heng Hong , Yakupitiyage Don Thanuja Samodhye Dharmasiri , Vishal Vishnoi , Elias Luqman Jalaluddin , Manish Parekh , Thanh Long Duong , Mark Edward Johnson
CPC classification number: G06F40/35 , G06N20/00 , H04L51/02 , G06F40/253
Abstract: Techniques are disclosed for using logit values for classifying utterances and messages input to chatbot systems in natural language processing. A method can include a chatbot system receiving an utterance generated by a user interacting with the chatbot system. The chatbot system can input the utterance into a machine-learning model including a set of binary classifiers. Each binary classifier of the set of binary classifiers can be associated with a modified logit function. The method can also include the machine-learning model using the modified logit function to generate a set of distance-based logit values for the utterance. The method can also include the machine-learning model applying an enhanced activation function to the set of distance-based logit values to generate a predicted output. The method can also include the chatbot system classifying, based on the predicted output, the utterance as being associated with a particular class.
-
Publication No.: US20240095584A1
Publication Date: 2024-03-21
Application No.: US18197224
Filing Date: 2023-05-15
Applicant: Oracle International Corporation
Inventor: Ying Xu , Vladislav Blinov , Ahmed Ataallah Ataallah Abobakr , Thanh Long Duong , Mark Edward Johnson , Elias Luqman Jalaluddin , Xin Xu , Srinivasa Phani Kumar Gadde , Vishal Vishnoi , Poorya Zaremoodi , Umanga Bista
IPC: G06N20/00
CPC classification number: G06N20/00
Abstract: Techniques are disclosed herein for objective function optimization in target based hyperparameter tuning. In one aspect, a computer-implemented method is provided that includes initializing a machine learning algorithm with a set of hyperparameter values and obtaining a hyperparameter objective function that comprises a domain score for each domain that is calculated based on a number of instances within an evaluation dataset that are correctly or incorrectly predicted by the machine learning algorithm during a given trial. For each trial of a hyperparameter tuning process, the method includes training the machine learning algorithm to generate a machine learning model, running the machine learning model in different domains using the set of hyperparameter values, evaluating the machine learning model for each domain, and, once the machine learning model has reached convergence, outputting at least one machine learning model.
-
Publication No.: US20240095454A1
Publication Date: 2024-03-21
Application No.: US18521805
Filing Date: 2023-11-28
Applicant: Oracle International Corporation
Inventor: Duy Vu , Tuyen Quang Pham , Cong Duy Vu Hoang , Srinivasa Phani Kumar Gadde , Thanh Long Duong , Mark Edward Johnson , Vishal Vishnoi
IPC: G06F40/295 , G06F40/205 , G06F40/279 , G06F40/35 , G06F40/40 , G06V30/19
CPC classification number: G06F40/295 , G06F40/205 , G06F40/279 , G06F40/35 , G06F40/40 , G06V30/19147
Abstract: Techniques are provided for using context tags in named-entity recognition (NER) models. In one particular aspect, a method is provided that includes receiving an utterance, generating embeddings for words of the utterance, generating a regular expression and gazetteer feature vector for the utterance, generating a context tag distribution feature vector for the utterance, concatenating or interpolating the embeddings with the regular expression and gazetteer feature vector and the context tag distribution feature vector to generate a set of feature vectors, generating an encoded form of the utterance based on the set of feature vectors, generating log-probabilities based on the encoded form of the utterance, and identifying one or more constraints for the utterance.
-
Publication No.: US20240013780A1
Publication Date: 2024-01-11
Application No.: US18471491
Filing Date: 2023-09-21
Applicant: Oracle International Corporation
Inventor: Srinivasa Phani Kumar Gadde , Yuanxu Wu , Aashna Devang Kanuga , Elias Luqman Jalaluddin , Vishal Vishnoi , Mark Edward Johnson
IPC: G10L15/197 , G10L15/06 , G10L15/26 , H04L51/02 , H04L51/52 , G06F40/186 , G06F40/295 , G06F40/30 , G06N20/00 , G06F40/35
CPC classification number: G10L15/197 , G10L15/063 , G10L15/26 , H04L51/02 , H04L51/52 , G06F40/186 , G06F40/295 , G06F40/30 , G06N20/00 , G06F40/35 , G10L2015/0631 , G06N3/044
Abstract: Techniques are disclosed for data augmentation for training chatbot systems in natural language processing. In one particular aspect, a method is provided that includes generating a list of values to cover for an entity, selecting utterances from a set of data that have context for the entity, converting the utterances into templates, where each template of the templates comprises a slot that maps to the list of values for the entity, selecting a template from the templates, selecting a value from the list of values based on the mapping between the slot within the selected template and the list of values for the entity, and creating an artificial utterance based on the selected template and the selected value, where the creating of the artificial utterance comprises inserting the selected value into the slot of the selected template that maps to the list of values for the entity.
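The template-and-slot procedure in this abstract can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the patented implementation: utterances "have context for the entity" here simply when they contain one of the known values as a substring, the slot syntax `{entity}` is invented for the example, and template and value selection are uniformly random.

```python
import random
from typing import List, Optional


def to_template(utterance: str, entity: str, values: List[str]) -> Optional[str]:
    """Replace the first known value found in the utterance with a slot.

    Longer values are tried first so "New York City" wins over "New York".
    Returns None when the utterance mentions none of the values.
    """
    for value in sorted(values, key=len, reverse=True):
        if value in utterance:
            return utterance.replace(value, "{" + entity + "}", 1)
    return None


def augment(
    utterances: List[str], entity: str, values: List[str], n: int, seed: int = 0
) -> List[str]:
    """Create n artificial utterances by filling template slots with values."""
    rng = random.Random(seed)  # seeded for reproducibility
    templates = [t for u in utterances if (t := to_template(u, entity, values))]
    if not templates:
        return []
    slot = "{" + entity + "}"
    return [
        rng.choice(templates).replace(slot, rng.choice(values))
        for _ in range(n)
    ]
```

Calling `augment(["book a flight to Paris"], "city", ["Paris", "Hanoi", "Sydney"], 5)` yields five utterances of the form "book a flight to <city>", each using a value drawn from the list.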