Patent search ap:("Oracle International Corporation") AND inv:"Jun Qian" Page 1

1.

Invention application
ENSURING THAT LANGUAGE MODELS FOLLOW INSTRUCTIONS INDICATED IN PROMPTS (In force)

Publication (announcement) number: US20250094865A1

Publication (announcement) date: 2025-03-20

Application number: US18629917

Application date: 2024-04-08

Applicant: Oracle International Corporation

Inventor: Zheng Wang, Yazhe Hu, Mengqing Guo, Tao Sheng, Jun Qian, Vinod M. Mamtani

IPC: G06N20/00, G06F40/40

Abstract: Techniques for ensuring that language models follow instructions indicated in prompts are provided. In one technique, a first language model generates a response based on a prompt. A set of instructions in the prompt is identified. For each instruction in the set, a second language model determines whether the response indicates that the first language model followed the instruction. In another technique, for each prompt of a plurality of prompts: (1) a first language model generates a response based on the prompt; (2) multiple instructions are identified based on the prompt; (3) a second language model generates, based on the plurality of instructions, an output that indicates that the first language model followed each instruction; and (4) the prompt, the response, and the multiple instructions are stored in a training instance. The first language model is finetuned based on the training instances.
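The per-instruction check described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate` and `judge` are hypothetical callables standing in for the first and second language models, and keeping only responses that pass every check when building training instances is an assumed reading, not something the abstract spells out.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: `generate` plays the first language model,
# `judge` plays the second one and answers "was this instruction followed?".
Generate = Callable[[str], str]
Judge = Callable[[str, str], bool]


def check_instructions(
    prompt: str, instructions: List[str], generate: Generate, judge: Judge
) -> Dict[str, object]:
    """Generate a response, then ask the judge about each instruction in turn."""
    response = generate(prompt)
    verdicts = {inst: judge(inst, response) for inst in instructions}
    return {
        "prompt": prompt,
        "response": response,
        "verdicts": verdicts,
        "all_followed": all(verdicts.values()),
    }


def build_training_instances(
    items: List[Tuple[str, List[str]]], generate: Generate, judge: Judge
) -> List[Dict[str, object]]:
    """Store (prompt, response, instructions) when every instruction was followed.

    Filtering on `all_followed` is an assumption made for this sketch.
    """
    instances = []
    for prompt, instructions in items:
        result = check_instructions(prompt, instructions, generate, judge)
        if result["all_followed"]:
            instances.append(
                {
                    "prompt": prompt,
                    "response": result["response"],
                    "instructions": list(instructions),
                }
            )
    return instances
```

The resulting list would then be the input to whatever finetuning procedure is used for the first model; that step is outside this sketch.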

2.

Invention application
LANGUAGE MODEL SUMMARIZATION USING SEMANTICAL CLUSTERING (In force)

Publication (announcement) number: US20250094716A1

Publication (announcement) date: 2025-03-20

Application number: US18657308

Application date: 2024-05-07

Applicant: Oracle International Corporation

Inventor: Zheng Wang, Yazhe Hu, Mengqing Guo, Tao Sheng, Jun Qian, Vinod M. Mamtani

IPC: G06F40/289, G06F40/216

Abstract: Techniques for language model (LM) summarization using semantical clustering are provided. In one technique, a plurality of concepts reflected in text data is identified. A plurality of concept clusters is generated based on similarity among the plurality of concepts. Thus, some concept clusters may include multiple concepts. For each concept cluster of the plurality of concept clusters, an LM generates a summary of the text corresponding to that concept cluster. A summary response of the text data is generated by aggregating the summary of each concept cluster of the plurality of concept clusters. In another technique, an LM generates a summary based on text data. A first set of concepts reflected in the summary is identified and a second set of concepts reflected in the text data is identified. A difference between the two sets may indicate that the summary is missing one or more concepts.
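Both techniques in this abstract reduce to small operations once concepts and clusters are available. The sketch below assumes that concept extraction and clustering have already happened elsewhere; `summarize` is a hypothetical callable standing in for the LM, and the cluster labels are illustrative.

```python
from typing import Callable, Dict, List, Set


def summarize_by_cluster(
    clusters: Dict[str, List[str]], summarize: Callable[[str], str]
) -> str:
    """Summarize the text of each concept cluster, then join the per-cluster summaries."""
    parts = [summarize(" ".join(passages)) for passages in clusters.values()]
    return "\n".join(parts)


def missing_concepts(text_concepts: Set[str], summary_concepts: Set[str]) -> Set[str]:
    """Concepts found in the source text but absent from the summary."""
    return text_concepts - summary_concepts
```

A non-empty result from `missing_concepts` would signal that the summary may need to be regenerated or extended.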

3.

Invention application
APPLICATION PERFORMANCE MONITORING FOR MONOLITHIC APPLICATIONS AND DISTRIBUTED SYSTEMS (In force)

Publication (announcement) number: US20240403197A1

Publication (announcement) date: 2024-12-05

Application number: US18806468

Application date: 2024-08-15

Applicant: Oracle International Corporation

Inventor: Fuheng Wu, Ivan Dimitrov Davchev, Jun Qian

IPC: G06F11/36

Abstract: A computing device may access a target code for implementing an application. The device may identify addresses for one or more functions or one or more variables associated with the target code. The device may generate an interval tree comprising a root node and one or more function nodes. In response to the target code invoking a function or variable, the device may generate an intercept function configured to intercept communication between the target code and a call address for the at least one of the one or more functions or the one or more variables invoked by the target code. The device may intercept data communicated between the target code and the call address. The device may store the intercepted data as a function node in the interval tree. The device may transmit the interval tree to a user device.
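As a loose analogy only, the idea of intercepting calls and recording them as nodes under a root can be sketched with a Python decorator. The abstract describes address-level interception of target code; the `Tracer` and `CallNode` names below are hypothetical, and the sketch records each call as a node with a start and end time nested under its caller.

```python
import functools
import time
from typing import Callable, List, Optional


class CallNode:
    """One intercepted call: a name, a [start, end] time interval, and nested calls."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.start: Optional[float] = None
        self.end: Optional[float] = None
        self.children: List["CallNode"] = []


class Tracer:
    """Hypothetical tracer that builds a tree of call intervals under a root node."""

    def __init__(self) -> None:
        self.root = CallNode("root")
        self._stack = [self.root]

    def intercept(self, fn: Callable) -> Callable:
        """Wrap `fn` so every invocation is recorded under the currently active call."""

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            node = CallNode(fn.__name__)
            self._stack[-1].children.append(node)
            self._stack.append(node)
            node.start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                node.end = time.perf_counter()
                self._stack.pop()

        return wrapper
```

Because a child call starts after and ends before its parent, each child interval is contained in the parent interval, which is what makes the tree useful for attributing time to functions.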

4.

Invention publication
SYNTHETIC DATA FINE-TUNED OPTICAL CHARACTER RECOGNITION ENGINE FOR EXTENSIBLE MARKUP LANGUAGE DOCUMENT RECONSTRUCTION (Under examination, published)

Publication (announcement) number: US20240338958A1

Publication (announcement) date: 2024-10-10

Application number: US18131744

Application date: 2023-04-06

Applicant: Oracle International Corporation

Inventor: Liyu Gong, Yuying Wang, Mengqing Guo, Tao Sheng, Jun Qian

IPC: G06V30/19, G06F40/143, G06V10/70

CPC classification number: G06V30/19147, G06F40/143, G06V10/70

Abstract: Techniques are disclosed for optical character recognition of extensible markup language content. A method can include a system generating a first training data comprising extensible markup language (XML) content, the first training data comprising a first plurality of training instances, each training instance including a respective image comprising XML content and annotation information for the respective image. The system can train a plurality of machine learning models using the first training data to generate a plurality of trained machine learning models, to perform image-based XML content extraction. The system can generate a plurality of trained machine learning models based at least in part on the training.

5.

Invention publication
SYNTHETIC TABLE GENERATION PIPELINE FOR TRAINING DEEP TABLE EXTRACTION MODELS (Under examination, published)

Publication (announcement) number: US20240273789A1

Publication (announcement) date: 2024-08-15

Application number: US18467291

Application date: 2023-09-14

Applicant: Oracle International Corporation

Inventor: Mohammadhossein Chaghazardi, Wenjing Yang, Tao Sheng, Jun Qian

IPC: G06T11/20, G06F40/109, G06T7/13, G06T7/90, G06V10/25, G06V20/70

CPC classification number: G06T11/206, G06F40/109, G06T7/13, G06T7/90, G06V10/25, G06V20/70, G06T2207/10024

Abstract: Techniques are described for HTML-based image generation. An example method can include generating hypertext markup language (HTML) code for a table comprising a table structure of a set of rows and columns. The method can further include generating HTML code for a text to populate a cell of the table. The method can further include generating a rendered image of the table using the HTML code. The method can further include detecting a first pixel of the rendered image comprising a first color, and a second pixel of the rendered image comprising a second color. The method can further include detecting the text on the rendered image. The method can further include generating a bounding box surrounding the detected text. The method can further include generating an annotation comprising a bounding box parameter and a text parameter.
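The first two steps of this pipeline (HTML for the table structure and for the cell text) can be sketched with the standard library alone. Rendering the HTML to an image and detecting pixels need a browser or imaging library and are left out, so the `bbox` field below is a placeholder; the function names are hypothetical.

```python
from html import escape
from typing import Dict, List


def table_html(cells: List[List[str]]) -> str:
    """Build HTML for a table whose rows and columns follow the nested list."""
    rows = []
    for row in cells:
        tds = "".join(f"<td>{escape(text)}</td>" for text in row)
        rows.append(f"<tr>{tds}</tr>")
    return "<table>" + "".join(rows) + "</table>"


def cell_annotations(cells: List[List[str]]) -> List[Dict[str, object]]:
    """One annotation per cell; `bbox` stays None until a rendering step fills it in."""
    return [
        {"row": r, "col": c, "text": text, "bbox": None}
        for r, row in enumerate(cells)
        for c, text in enumerate(row)
    ]
```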

6.

Invention publication
TABLE EXTRACTION FROM IMAGE-BASED DOCUMENTS (Under examination, published)

Publication (announcement) number: US20230260309A1

Publication (announcement) date: 2023-08-17

Application number: US17835813

Application date: 2022-06-08

Applicant: Oracle International Corporation

Inventor: Yazdan Jamshidikhezeli, Iman Zadeh, Jun Qian

IPC: G06V30/414, G06V30/10, G06V30/19

CPC classification number: G06V30/414, G06V30/10, G06V30/19107

Abstract: Techniques are described for extracting tables and associated content from image-based documents and generating a machine-readable representation of a table. A system is described that executes an end-to-end pipeline for extracting one or more tables from an image-based document and generating a machine-readable and editable table representation based upon the extracted contents. The processing may include using OCR techniques to extract text portions from an image-based document, identifying a region (table region) in the image-based document containing a table, identifying a subset of text portions that are located inside the table region, determining a number of rows and columns in the table to be generated, aligning the text portions and assigning row and column indices to the text portions, and generating a machine-readable table representation based upon the text portions.
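The later pipeline stages (selecting text inside the table region, then assigning row and column indices) can be sketched as below. OCR and table-region detection are assumed to have run already; grouping box centers with a fixed pixel tolerance `tol` is a simple heuristic chosen for illustration, not necessarily the alignment method the application uses.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def _center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def _inside(box: Box, region: Box) -> bool:
    """True if the center of `box` lies within `region`."""
    cx, cy = _center(box)
    return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]


def _group(values: List[float], tol: float) -> Dict[float, int]:
    """Map each value to a group index; a value within `tol` of the previous
    sorted value joins that value's group."""
    index: Dict[float, int] = {}
    group = -1
    prev = None
    for v in sorted(set(values)):
        if prev is None or v - prev > tol:
            group += 1
        index[v] = group
        prev = v
    return index


def build_table(
    portions: List[Tuple[str, Box]], region: Box, tol: float = 10.0
) -> List[List[str]]:
    """Keep text inside the table region and place it on a row/column grid."""
    kept = [(text, _center(box)) for text, box in portions if _inside(box, region)]
    if not kept:
        return []
    cols = _group([c[0] for _, c in kept], tol)
    rows = _group([c[1] for _, c in kept], tol)
    grid = [[""] * (max(cols.values()) + 1) for _ in range(max(rows.values()) + 1)]
    for text, (cx, cy) in kept:
        grid[rows[cy]][cols[cx]] = text
    return grid
```

The nested list returned by `build_table` is one possible machine-readable representation; it could be serialized to CSV or HTML from there.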

7.

Invention publication
TECHNIQUES FOR GRAPH DATA STRUCTURE AUGMENTATION (Under examination, published)

Publication (announcement) number: US20230146501A1

Publication (announcement) date: 2023-05-11

Application number: US17524157

Application date: 2021-11-11

Applicant: Oracle International Corporation

Inventor: Amit Agarwal, Kulbhushan Pachauri, Iman Zadeh, Jun Qian

IPC: G06K9/00, G06N20/00

CPC classification number: G06K9/00442, G06N20/00, G06K2209/01

Abstract: A computing device may receive a set of user documents. Data may be extracted from the documents to generate a first graph data structure with one or more initial graphs containing key-value pairs. A model may be trained on the first graph data structure to classify the pairs. Until a set of evaluation metrics for the model exceeds a set of deployment thresholds, the following may be repeated. A set of evaluation metrics may be generated for the model. The set of evaluation metrics may be compared to the set of deployment thresholds. In response to a determination that the set of evaluation metrics is below the set of deployment thresholds, one or more new graphs may be generated from the one or more initial graphs in the first graph data structure to produce a second graph data structure. The first and second graph data structures can be used to train the model.
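The evaluate, compare, augment, and retrain loop in this abstract can be sketched as follows. `train`, `evaluate`, and `augment` are hypothetical callables supplied by the caller; the `max_rounds` cap and treating a metric equal to its threshold as passing are assumptions added for the sketch.

```python
from typing import Callable, Dict, List, Tuple


def train_until_deployable(
    initial_graphs: List[object],
    train: Callable[[List[object]], object],
    evaluate: Callable[[object], Dict[str, float]],
    augment: Callable[[List[object]], List[object]],
    thresholds: Dict[str, float],
    max_rounds: int = 10,
) -> Tuple[object, Dict[str, float], int]:
    """Retrain on initial plus augmented graphs until every metric meets its threshold.

    Returns (model, metrics, number of augmentation rounds performed).
    """
    graphs = list(initial_graphs)
    model = train(graphs)
    for round_no in range(max_rounds):
        metrics = evaluate(model)
        # Assumption: a metric equal to its threshold counts as passing.
        if all(metrics[name] >= limit for name, limit in thresholds.items()):
            return model, metrics, round_no
        # New graphs are derived from the initial graphs, as in the abstract.
        graphs = graphs + augment(initial_graphs)
        model = train(graphs)
    return model, evaluate(model), max_rounds
```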

8.

Invention application
OFFICER-IN-THE-LOOP CRIME REPORT GENERATION USING LARGE LANGUAGE MODELS AND PROMPT ENGINEERING (In force)

Publication (announcement) number: US20250095096A1

Publication (announcement) date: 2025-03-20

Application number: US18885192

Application date: 2024-09-13

Applicant: Oracle International Corporation

Inventor: Iman Zadeh, Christophe J. Gerard, Qiu Qin, Ziqun Ye, Aditya Banerjee, Jun Qian, Nicole E. Hess

IPC: G06Q50/26, G06F40/197, G06F40/40

Abstract: The present disclosure relates to utilizing large language models (LLMs) to facilitate generation of incident reports or similar documents. One or more initial inputs may be received from a user, and one or more example incident reports may be identified. The one or more example incident reports and the one or more initial inputs may be sent to an LLM. A reviewable version of an incident report may be accessed that is based on output that the LLM generated based on the example incident reports and the one or more initial inputs. The reviewable version of the incident report may be presented in a human readable format via a graphical user interface (GUI). A modification corresponding to the reviewable version of the incident report may be received via the GUI. The modification and the reviewable version of the incident report may be sent to the LLM to cause the LLM to generate an updated version of the incident report.

9.

Invention application
LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING (In force)

Publication (announcement) number: US20240420496A1

Publication (announcement) date: 2024-12-19

Application number: US18210498

Application date: 2023-06-15

Applicant: Oracle International Corporation

Inventor: Zheng Wang, Tao Sheng, Yazhe Hu, Mengqing Guo, Liyu Gong, Jun Qian, Katharine D'Orazio

IPC: G06V30/413, G06V30/19, G06V30/412, G06V30/416

Abstract: Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words that were extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, instead of the one or more table data representations, one or more layout data representations that were generated based on a set of layout features, of the document, that was determined based on the image are identified and input into the machine-learned model.

10.

Invention publication
GENERATING SYNTHETIC TRAINING DATA INCLUDING DOCUMENT IMAGES WITH KEY-VALUE PAIRS (Under examination, published)

Publication (announcement) number: US20240177511A1

Publication (announcement) date: 2024-05-30

Application number: US18058982

Application date: 2022-11-28

Applicant: Oracle International Corporation

Inventor: Yazhe Hu, Tao Sheng, Jun Qian

IPC: G06V30/19, G06V30/148, G06V30/41

CPC classification number: G06V30/19147, G06V30/153, G06V30/41

Abstract: Automated techniques are described for generating a large volume of diverse training data that can be used for training machine learning models to extract key-value (KV) pairs from document images. Given a single input document image and associated annotation data, a large number of diverse synthetic training datapoints are automatically generated by a synthetic data generation system, each datapoint including a synthetic document image and associated annotation data. The generated synthetic training datapoints can be used to train and improve the performance of ML models for extracting KV pairs from document images. In certain implementations, multiple synthetic datapoints are generated by varying the values associated with a key for a content item within the input document image.
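The value-variation step mentioned at the end of this abstract can be sketched at the annotation level. Producing the matching synthetic document image is omitted, and representing the annotation as a flat key-to-value dictionary is an assumption made for illustration.

```python
from typing import Dict, List


def vary_key_values(
    annotation: Dict[str, str], key: str, candidates: List[str]
) -> List[Dict[str, str]]:
    """Return one annotation copy per candidate value, changing only `key`."""
    if key not in annotation:
        raise KeyError(f"key {key!r} is not present in the annotation")
    variants = []
    for value in candidates:
        variant = dict(annotation)  # shallow copy so the input stays unchanged
        variant[key] = value
        variants.append(variant)
    return variants
```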
