VISION-BASED GENERATION OF NAVIGATION WORKFLOW FOR AUTOMATICALLY FILLING APPLICATION FORMS USING LARGE LANGUAGE MODELS

    Publication No.: US20250131185A1

    Publication Date: 2025-04-24

    Application No.: US18883765

    Filing Date: 2024-09-12

    Abstract: Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than on visual understanding of screen elements. The present disclosure provides systems and methods that implement large language models (LLMs) coupled with deep-learning-based image understanding, which adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. The system of the present disclosure uses computer vision and natural language processing to perceive visible elements on a graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by the LLMs to generate one or more navigation workflows, each comprising a sequence of actions that is executed by a scripting engine/code to complete an assigned task from a task request.
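
The pipeline described in this abstract (GUI elements converted to text, then a workflow of actions run by a scripting engine) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patented method: the `UIElement` fields, the text format produced by `describe_screen`, and the `type`/`click` action vocabulary are hypothetical, the vision step is replaced by a hand-written element list, and the LLM step is replaced by a hard-coded workflow.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """A detected GUI element (hypothetical schema, not from the patent)."""
    kind: str   # e.g. "textbox", "button"
    label: str  # visible text associated with the element
    x: int      # screen coordinates of the element
    y: int


def describe_screen(elements):
    """Turn detected elements into a textual representation for an LLM prompt."""
    return "\n".join(
        f"[{idx}] {el.kind} '{el.label}' at ({el.x}, {el.y})"
        for idx, el in enumerate(elements)
    )


def run_workflow(workflow, elements, form_state):
    """Execute a sequence of actions; stands in for the scripting engine.

    Instead of driving a real GUI, typing is simulated by writing into
    form_state, and every step is recorded in the returned log.
    """
    log = []
    for step in workflow:
        target = elements[step["target"]]
        if step["action"] == "type":
            form_state[target.label] = step["value"]
            log.append(f"type '{step['value']}' into {target.label}")
        elif step["action"] == "click":
            log.append(f"click {target.label}")
        else:
            raise ValueError(f"unsupported action: {step['action']}")
    return log


# Hand-written stand-ins for the vision output and the LLM-generated workflow.
elements = [
    UIElement("textbox", "Name", 100, 40),
    UIElement("button", "Submit", 100, 120),
]
workflow = [
    {"action": "type", "target": 0, "value": "Alice"},
    {"action": "click", "target": 1},
]
state = {}
log = run_workflow(workflow, elements, state)
```

In a real system the element list would come from a detector and OCR, and the workflow would be parsed from the LLM's response to a prompt containing `describe_screen(elements)` and the task request.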

    METHOD AND SYSTEM FOR TABLE STRUCTURE RECOGNITION VIA DEEP SPATIAL ASSOCIATION OF WORDS

    Publication No.: US20230055391A1

    Publication Date: 2023-02-23

    Application No.: US17807215

    Filing Date: 2022-06-16

    Abstract: State-of-the-art techniques that utilize spatial-association-based Table Structure Recognition (TSR) are limited in selecting the minimal but most informative word pairs to generate a digital table representation. Embodiments herein provide a method and system for TSR from a table image via deep spatial association of words, using an optimal number of word pairs analyzed by a single classifier to determine word association. The optimal number of word pairs is identified by an immediate-left-neighbor and immediate-top-neighbor approach followed by redundant word pair elimination, thus enabling accurate capture of the structural features of even complex table images via minimal word pairs. The reduced number of word pairs, in combination with the single classifier trained to sort the word associations into the classes same cell, same row, same column, and unrelated, provides a TSR pipeline with reduced computational complexity that consumes fewer resources while still generating a more accurate digital representation of complex tables.
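
The neighbor-based pair selection named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's procedure: the `Word` bounding-box type, the overlap test used to decide which word counts as the "immediate" left or top neighbor, and the use of a set of unordered index pairs as a stand-in for redundant word pair elimination are all hypothetical choices, since the abstract does not give the exact criteria. The classifier that labels each pair is omitted.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A word and its bounding box (hypothetical schema); y grows downward."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlaps(a0, a1, b0, b1):
    """True if the 1-D intervals [a0, a1] and [b0, b1] share positive length."""
    return min(a1, b1) - max(a0, b0) > 0


def immediate_left(i, words):
    """Index of the closest word lying fully to the left on the same line."""
    w = words[i]
    cands = [
        j for j, o in enumerate(words)
        if j != i and o.x1 <= w.x0 and _overlaps(w.y0, w.y1, o.y0, o.y1)
    ]
    return max(cands, key=lambda j: words[j].x1, default=None)


def immediate_top(i, words):
    """Index of the closest word lying fully above with horizontal overlap."""
    w = words[i]
    cands = [
        j for j, o in enumerate(words)
        if j != i and o.y1 <= w.y0 and _overlaps(w.x0, w.x1, o.x0, o.x1)
    ]
    return max(cands, key=lambda j: words[j].y1, default=None)


def candidate_pairs(words):
    """Collect left/top neighbor pairs, keeping each unordered pair once."""
    pairs = set()
    for i in range(len(words)):
        for j in (immediate_left(i, words), immediate_top(i, words)):
            if j is not None:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


# A 2x2 grid of words: A B on the first row, C D on the second.
grid = [
    Word("A", 0, 0, 10, 10),
    Word("B", 20, 0, 30, 10),
    Word("C", 0, 20, 10, 30),
    Word("D", 20, 20, 30, 30),
]
pairs = candidate_pairs(grid)
```

For the four-word grid this yields four pairs rather than the six produced by pairing every word with every other, and the gap widens as tables grow, since each word contributes at most two pairs.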

    INTELLIGENT VISUAL REASONING OVER GRAPHICAL ILLUSTRATIONS USING A MAC UNIT

    Publication No.: US20220222956A1

    Publication Date: 2022-07-14

    Application No.: US17594578

    Filing Date: 2020-05-28

    Abstract: This disclosure relates generally to intelligent visual reasoning over graphical illustrations using a MAC unit. Prior art uses visual attention to map particular words in a question to specific areas in an image and memorize the corresponding answers, which limits it to answering questions of a specific type. The present disclosure incorporates the MAC unit to enable reasoning capabilities and accordingly attend to an area in the image to find the answer. The present disclosure therefore generalizes over a possible set of questions of varying complexity, so that an unseen question can also be answered correctly based on the reasoning methods the system has learned. The system and method of the present disclosure can be used to understand visual information when processing documents such as business reports, research papers, and consensus reports that contain charts, and can reduce the time spent on manual analysis.
