ENSURING THAT LANGUAGE MODELS FOLLOW INSTRUCTIONS INDICATED IN PROMPTS

    Publication number: US20250094865A1

    Publication date: 2025-03-20

    Application number: US18629917

    Application date: 2024-04-08

    Abstract: Techniques for ensuring that language models follow instructions indicated in prompts are provided. In one technique, a first language model generates a response based on a prompt. A set of instructions in the prompt is identified. For each instruction in the set, a second language model determines whether the response indicates that the first language model followed the instruction. In another technique, for each prompt of a plurality of prompts: (1) a first language model generates a response based on the prompt; (2) multiple instructions are identified based on the prompt; (3) a second language model generates, based on the plurality of instructions, an output that indicates that the first language model followed each instruction; and (4) the prompt, the response, and the multiple instructions are stored in a training instance. The first language model is finetuned based on the training instances.
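The per-instruction check and the filtering of training instances described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `judge` callable stands in for the second language model, and `toy_judge`, `check_instructions`, and `keep_for_finetuning` are hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature for the second language model: (response, instruction) -> followed?
Judge = Callable[[str, str], bool]


def check_instructions(response: str, instructions: List[str], judge: Judge) -> Dict[str, bool]:
    """Ask the judge, once per instruction, whether the response followed it."""
    return {inst: judge(response, inst) for inst in instructions}


def keep_for_finetuning(prompt: str, response: str, instructions: List[str],
                        judge: Judge) -> Optional[dict]:
    """Return a training instance only if every instruction was followed."""
    verdicts = check_instructions(response, instructions, judge)
    if all(verdicts.values()):
        return {"prompt": prompt, "response": response, "instructions": instructions}
    return None


def toy_judge(response: str, instruction: str) -> bool:
    """Rule-based stand-in for a real judge model (assumption for illustration only)."""
    if instruction == "answer in one sentence":
        return response.count(".") <= 1
    if instruction == "mention the word 'patent'":
        return "patent" in response.lower()
    return False


instructions = ["answer in one sentence", "mention the word 'patent'"]
good = keep_for_finetuning("What is a patent?", "A patent is a legal right.", instructions, toy_judge)
bad = keep_for_finetuning("What is a patent?", "It is a right. It is granted.", instructions, toy_judge)
```

In a real system the judge would be a call to a second model, and the kept instances would feed a finetuning job for the first model.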

    LANGUAGE MODEL SUMMARIZATION USING SEMANTICAL CLUSTERING

    Publication number: US20250094716A1

    Publication date: 2025-03-20

    Application number: US18657308

    Application date: 2024-05-07

    Abstract: Techniques for language model (LM) summarization using semantical clustering are provided. In one technique, a plurality of concepts reflected in text data is identified. A plurality of concept clusters is generated based on similarity among the plurality of concepts. Thus, some concept clusters may include multiple concepts. For each concept cluster of the plurality of concept clusters, an LM generates a summary of the text corresponding to that concept cluster. A summary response of the text data is generated by aggregating the summary of each concept cluster of the plurality of concept clusters. In another technique, an LM generates a summary based on text data. A first set of concepts reflected in the summary is identified and a second set of concepts reflected in the text data is identified. A difference between the two sets may indicate that the summary is missing one or more concepts.
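A rough sketch of the two ideas in this abstract, clustering similar concepts and comparing concept sets to spot omissions, might look like the following. The abstract does not say how similarity is measured; the word-overlap (Jaccard) score, the greedy clustering, and all function names here are assumptions chosen for brevity.

```python
from typing import Callable, Iterable, List, Set


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two concept strings (assumed metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_concepts(concepts: Iterable[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed concept is similar enough."""
    clusters: List[List[str]] = []
    for concept in concepts:
        for cluster in clusters:
            if jaccard(concept, cluster[0]) >= threshold:
                cluster.append(concept)
                break
        else:
            clusters.append([concept])
    return clusters


def summarize_by_cluster(clusters: List[List[str]],
                         summarize: Callable[[List[str]], str]) -> str:
    """Aggregate one summary per cluster; `summarize` stands in for the LM call."""
    return " ".join(summarize(cluster) for cluster in clusters)


def missing_concepts(text_concepts: Iterable[str], summary_concepts: Iterable[str]) -> Set[str]:
    """Concepts found in the source text but absent from the summary."""
    return set(text_concepts) - set(summary_concepts)


clusters = cluster_concepts(["battery life", "battery life span", "screen glare"])
missing = missing_concepts({"battery life", "screen glare"}, {"battery life"})
```

A production system would more likely compare embedding vectors than word overlap; the set difference at the end is the part that maps most directly onto the abstract.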

    APPLICATION PERFORMANCE MONITORING FOR MONOLITHIC APPLICATIONS AND DISTRIBUTED SYSTEMS

    Publication number: US20240403197A1

    Publication date: 2024-12-05

    Application number: US18806468

    Application date: 2024-08-15

    Abstract: A computing device may access target code for implementing an application. The device may identify addresses for one or more functions or one or more variables associated with the target code. The device may generate an interval tree comprising a root node and one or more function nodes. In response to the target code invoking at least one of the functions or variables, the device may generate an intercept function configured to intercept communication between the target code and a call address for the invoked function or variable. The device may intercept data communicated between the target code and the call address. The device may store the intercepted data as a function node in the interval tree. The device may transmit the interval tree to a user device.
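The interception step can be illustrated with a simple wrapper that records each call as a child node under a root. This is only a loose analogy under stated assumptions: it wraps Python functions rather than patching call addresses, it uses a plain tree rather than an interval tree keyed on address ranges, and `Node` and `intercept` are hypothetical names.

```python
import functools


class Node:
    """A tree node holding intercepted data (simplified; not a true interval tree)."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data
        self.children = []


def intercept(root):
    """Build a decorator that records every call to the wrapped function under `root`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Store the intercepted arguments and return value as a function node.
            root.children.append(
                Node(func.__name__, {"args": args, "kwargs": kwargs, "result": result})
            )
            return result
        return wrapper
    return decorator


root = Node("root")


@intercept(root)
def add(a, b):
    return a + b


total = add(2, 3)
```

After the call, `root.children` holds one node named `add` carrying the arguments and result, which is the kind of record that could then be serialized and sent to a user device.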

    TABLE EXTRACTION FROM IMAGE-BASED DOCUMENTS
    Type: Invention publication

    Publication number: US20230260309A1

    Publication date: 2023-08-17

    Application number: US17835813

    Application date: 2022-06-08

    CPC classification number: G06V30/414 G06V30/10 G06V30/19107

    Abstract: Techniques are described for extracting tables and associated content from image-based documents and generating a machine-readable representation of a table. A system is described that executes an end-to-end pipeline for extracting one or more tables from an image-based document and generating a machine-readable and editable table representation based upon the extracted contents. The processing may include using OCR techniques to extract text portions from an image-based document, identifying a region (table region) in the image-based document containing a table, identifying a subset of text portions that are located inside the table region, determining a number of rows and columns in the table to be generated, aligning the text portions and assigning row and column indices to the text portions, and generating a machine-readable table representation based upon the text portions.
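The step of assigning row and column indices to text portions could be sketched as below. The abstract does not specify the alignment method; grouping coordinates that lie within a pixel tolerance of each other is an assumption, as are the `(text, x, y)` input shape and the function names. The sketch expects a non-empty list of portions that are already known to lie inside the table region.

```python
from typing import Dict, List, Tuple

Portion = Tuple[str, float, float]  # (text, x_center, y_center), an assumed OCR output shape


def assign_indices(values: List[float], tol: float) -> Dict[float, int]:
    """Map each coordinate to a 0-based group index.

    Sorted coordinates that differ from their predecessor by at most `tol`
    share a group; a larger gap starts a new row or column.
    """
    index: Dict[float, int] = {}
    group = -1
    prev = None
    for value in sorted(set(values)):
        if prev is None or value - prev > tol:
            group += 1
        index[value] = group
        prev = value
    return index


def build_table(portions: List[Portion], tol: float = 10) -> List[List[str]]:
    """Place each text portion into a grid using its row and column index."""
    rows = assign_indices([y for _, _, y in portions], tol)
    cols = assign_indices([x for _, x, _ in portions], tol)
    n_rows = max(rows.values()) + 1
    n_cols = max(cols.values()) + 1
    table = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for text, x, y in portions:
        table[rows[y]][cols[x]] = text
    return table


portions = [("Name", 10, 5), ("Qty", 100, 6), ("Bolt", 12, 40), ("7", 101, 41)]
table = build_table(portions)
```

The resulting nested list is one possible machine-readable representation; it could be written out as CSV or HTML from here.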

    TECHNIQUES FOR GRAPH DATA STRUCTURE AUGMENTATION

    Publication number: US20230146501A1

    Publication date: 2023-05-11

    Application number: US17524157

    Application date: 2021-11-11

    CPC classification number: G06K9/00442 G06N20/00 G06K2209/01

    Abstract: A computing device may receive a set of user documents. Data may be extracted from the documents to generate a first graph data structure with one or more initial graphs containing key-value pairs. A model may be trained on the first graph data structure to classify the pairs. Until a set of evaluation metrics for the model exceeds a set of deployment thresholds, the following may be repeated. A set of evaluation metrics may be generated for the model. The set of evaluation metrics may be compared to the set of deployment thresholds. In response to a determination that the set of evaluation metrics is below the set of deployment thresholds, one or more new graphs may be generated from the one or more initial graphs in the first graph data structure to produce a second graph data structure. The first and second graph data structures can be used to train the model.
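The train, evaluate, and augment loop in this abstract can be sketched as follows. The callables `train`, `evaluate`, and `augment` are hypothetical placeholders for the real model training, metric computation, and graph generation steps, and the `max_rounds` cap is an added safeguard that the abstract does not mention.

```python
from typing import Callable, Dict, List, Tuple


def train_until_deployable(
    initial_graphs: List[dict],
    train: Callable[[List[dict]], object],
    evaluate: Callable[[object], Dict[str, float]],
    augment: Callable[[List[dict]], List[dict]],
    thresholds: Dict[str, float],
    max_rounds: int = 10,
) -> Tuple[object, Dict[str, float], int]:
    """Retrain on a growing set of graphs until every metric exceeds its threshold."""
    data = list(initial_graphs)
    for round_no in range(max_rounds):
        model = train(data)
        metrics = evaluate(model)
        if all(metrics[name] > limit for name, limit in thresholds.items()):
            return model, metrics, round_no
        # Metrics fell short: derive new graphs from the initial ones and add them.
        data = data + augment(initial_graphs)
    raise RuntimeError("deployment thresholds not reached within max_rounds")


# Toy stand-ins (assumptions): the "model" is just the training set size,
# and the score grows with the amount of data.
graphs = [{"invoice_no": "A1"}, {"invoice_no": "B2"}]
model, metrics, rounds = train_until_deployable(
    graphs,
    train=lambda data: len(data),
    evaluate=lambda m: {"f1": min(1.0, m / 10)},
    augment=lambda gs: [dict(g) for g in gs],
    thresholds={"f1": 0.5},
)
```

With these toy stand-ins the loop trains on 2, then 4, then 6 graphs and stops on the third pass, once the score exceeds the threshold.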

    OFFICER-IN-THE-LOOP CRIME REPORT GENERATION USING LARGE LANGUAGE MODELS AND PROMPT ENGINEERING

    Publication number: US20250095096A1

    Publication date: 2025-03-20

    Application number: US18885192

    Application date: 2024-09-13

    Abstract: The present disclosure relates to utilizing large language models (LLMs) to facilitate generation of incident reports or similar documents. One or more initial inputs may be received from a user, and one or more example incident reports may be identified. The one or more example incident reports and the one or more initial inputs may be sent to an LLM. A reviewable version of an incident report may be accessed that is based on output that the LLM generated based on the example incident reports and the one or more initial inputs. The reviewable version of the incident report may be presented in a human readable format via a graphical user interface (GUI). A modification corresponding to the reviewable version of the incident report may be received via the GUI. The modification and the reviewable version of the incident report may be sent to the LLM to cause the LLM to generate an updated version of the incident report.

    LAYOUT AWARE MULTI-MODAL NETWORKS FOR DOCUMENT UNDERSTANDING

    Publication number: US20240420496A1

    Publication date: 2024-12-19

    Application number: US18210498

    Application date: 2023-06-15

    Abstract: Techniques for layout-aware multi-modal networks for document understanding are provided. In one technique, word data representations that were generated based on words that were extracted from an image of a document are identified. Based on the image, table features of one or more tables in the document are determined. One or more table data representations that were generated based on the table features are identified. The word data representations and the one or more table data representations are input into a machine-learned model to generate a document data representation for the document. A task is performed based on the document data representation. In a related technique, instead of the one or more table data representations, one or more layout data representations that were generated based on a set of layout features, of the document, that was determined based on the image are identified and input into the machine-learned model.

    GENERATING SYNTHETIC TRAINING DATA INCLUDING DOCUMENT IMAGES WITH KEY-VALUE PAIRS

    Publication number: US20240177511A1

    Publication date: 2024-05-30

    Application number: US18058982

    Application date: 2022-11-28

    CPC classification number: G06V30/19147 G06V30/153 G06V30/41

    Abstract: Automated techniques are described for generating a large volume of diverse training data that can be used for training machine learning models to extract key-value (KV) pairs from document images. Given a single input document image and associated annotation data, a large number of diverse synthetic training datapoints are automatically generated by a synthetic data generation system, each datapoint including a synthetic document image and associated annotation data. The generated synthetic training datapoints can be used to train and improve the performance of ML models for extracting KV pairs from document images. In certain implementations, multiple synthetic datapoints are generated by varying the values associated with a key for a content item within the input document image.
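Varying the value associated with a key to produce many datapoints from one annotated document might be sketched as below. This covers only the annotation side; rendering the new values back into a synthetic document image is out of scope here. The annotation format, the `value_pool` of candidate values, and the function name are assumptions for illustration.

```python
import random
from typing import Dict, List


def synthesize_annotations(
    annotation: Dict[str, str],
    value_pool: Dict[str, List[str]],
    n: int,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Create n annotation variants by swapping in new values for selected keys."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    variants = []
    for _ in range(n):
        variant = dict(annotation)
        for key, candidates in value_pool.items():
            if key in variant:
                variant[key] = rng.choice(candidates)
        variants.append(variant)
    return variants


source = {"Invoice No": "1001", "Date": "2022-11-28", "Vendor": "Acme"}
pool = {"Invoice No": ["2002", "3003", "4004"], "Date": ["2023-01-05", "2023-02-10"]}
variants = synthesize_annotations(source, pool, n=5)
```

Keys that have no entry in the pool, such as `Vendor` here, keep their original value, so each variant stays structurally identical to the source annotation.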
