METHOD AND SYSTEM FOR RELEVANT DATA EXTRACTION FROM A DOCUMENT

Invention Publication

US20240355136A1 METHOD AND SYSTEM FOR RELEVANT DATA EXTRACTION FROM A DOCUMENT (Status: Pending, Published)

Patent title: METHOD AND SYSTEM FOR RELEVANT DATA EXTRACTION FROM A DOCUMENT
Application number: US18239778

Filing date: 2023-08-30
Publication number: US20240355136A1

Publication date: 2024-10-24
Inventors: NIRMAL RAMESH RAYULU VANAPALLI VENKATA, MADHUSUDAN SINGH, TAMILARASAN ELLAPPAN
Applicant: L&T TECHNOLOGY SERVICES LIMITED
Applicant address: US TN Chennai
Patentee: L&T TECHNOLOGY SERVICES LIMITED
Current patentee: L&T TECHNOLOGY SERVICES LIMITED
Current patentee address: US TN Chennai
Priority: IN 2341028817 2023.04.20
Main classification: G06V30/414
IPC classifications: G06V30/414; G06F40/169; G06F40/186; G06V10/94; G06V20/62; G06V30/19

Abstract:

A method and system for relevant data extraction from a document are disclosed. The method includes determining, based on a deep learning model, first positional information corresponding to a key from a plurality of predefined keys in a document image. Second positional information corresponding to the same key is determined based on optical character recognition (OCR) of the document image and a natural language processing (NLP) model. When the difference between the first positional information and the second positional information is minimal, final positional information is determined from both. Relevant data for the key is then extracted from the OCR document image based on the final positional information.
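
The reconciliation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the difference between the two positions is measured, how the final position is computed, or where the value sits relative to the key, so everything below is an assumption made for illustration: positions are axis-aligned bounding boxes, a "minimal difference" is approximated by an intersection-over-union (IoU) threshold, the final position is the coordinate-wise mean, and the value is taken to be the OCR words to the right of the key on the same text line. The names `Box`, `iou`, `reconcile`, `extract_value`, and `min_iou` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    iw = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    ih = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    inter = iw * ih
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def reconcile(model_box: Box, ocr_box: Box, min_iou: float = 0.5) -> Optional[Box]:
    """Return a final box when the two estimates agree, otherwise None.

    Assumption: "minimal difference" is modelled as IoU >= min_iou, and the
    final position is the coordinate-wise mean of the two boxes.
    """
    if iou(model_box, ocr_box) < min_iou:
        return None
    return Box(
        (model_box.x0 + ocr_box.x0) / 2,
        (model_box.y0 + ocr_box.y0) / 2,
        (model_box.x1 + ocr_box.x1) / 2,
        (model_box.y1 + ocr_box.y1) / 2,
    )


def extract_value(final_box: Box, words: List[Tuple[str, Box]]) -> str:
    """Join OCR words to the right of the key that overlap it vertically.

    Assumption: the value follows its key on the same text line.
    """
    picked = [
        (box.x0, text)
        for text, box in words
        if box.x0 >= final_box.x1
        and box.y0 < final_box.y1
        and box.y1 > final_box.y0
    ]
    # Sort by left edge so the words come out in reading order.
    return " ".join(text for _, text in sorted(picked))
```

In this sketch, a key whose two estimates disagree yields `None` from `reconcile`, which leaves the caller free to decide how to handle it; the abstract does not describe that case.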

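IPC classification:

G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V30/00	Character recognition; recognising digital ink; document-oriented image-based pattern recognition (scanning, transmission or reproduction of documents or the like H04N1/00)
G06V30/40	.Document-oriented image-based pattern recognition
G06V30/41	..Analysis of document content (recognition of printed characters based on code marks G06V30/224)
G06V30/414	...Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text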