-
Publication No.: US20230196813A1
Publication Date: 2023-06-22
Application No.: US17552542
Filing Date: 2021-12-16
Inventors: Loganathan Muthu, Rahul Kotnala, Srinivasan Krishnan Rajagopalan, Peter Ashly Gopalan, Manikandan Chandran, Anand Yesuraj Prakash, Simantini Deb, Vijay Dhandapani, Harbhajan Singh, RBSanthosh Kumar, Lokesh Venkatappa, Ramakrishnan Raman
IPC Classification: G06V30/414, G06V30/19, G06V30/413, G06V30/416
CPC Classification: G06V30/414, G06V30/19107, G06V30/413, G06V30/19147, G06V30/416
Abstract: A system and method for automating and improving data extraction from a variety of document types, including unstructured, structured, and nested content, are disclosed. The system and method incorporate an intelligent machine learning model that is designed to identify chunks of text, map the fields in the document, and extract multi-record values. The system is designed to operate with little to no human intervention, while offering significant gains in accuracy, data visualization, and efficiency. The architecture applies customized techniques including density-based adaptive text clustering, tabular data extraction based on hierarchical intelligent keyword searches, and natural language processing-based field value selection.
-
Publication No.: US11972627B2
Publication Date: 2024-04-30
Application No.: US17552542
Filing Date: 2021-12-16
Inventors: Loganathan Muthu, Rahul Kotnala, Srinivasan Krishnan Rajagopalan, Peter Ashly Gopalan, Manikandan Chandran, Anand Yesuraj Prakash, Simantini Deb, Vijay Dhandapani, Harbhajan Singh, RBSanthosh Kumar, Lokesh Venkatappa, Ramakrishnan Raman
IPC Classification: G06V30/414, G06V30/19, G06V30/412, G06V30/413, G06V30/416
CPC Classification: G06V30/414, G06V30/19107, G06V30/19147, G06V30/412, G06V30/413, G06V30/416
Abstract: A system and method for automating and improving data extraction from a variety of document types, including unstructured, structured, and nested content, are disclosed. The system and method incorporate an intelligent machine learning model that is designed to identify chunks of text, map the fields in the document, and extract multi-record values. The system is designed to operate with little to no human intervention, while offering significant gains in accuracy, data visualization, and efficiency. The architecture applies customized techniques including density-based adaptive text clustering, tabular data extraction based on hierarchical intelligent keyword searches, and natural language processing-based field value selection.
-
Publication No.: US12001951B2
Publication Date: 2024-06-04
Application No.: US17210153
Filing Date: 2021-03-23
Inventors: Kavita V V Ganeshan, Swati Tata, Soujanya Soni, Madhur Bhasini Chaini, Anjani Kumari, Omar Razi, Thyagarajan Delli, Ullas Balan Nambiar, Guanglei Xiong, Sivasubramanian Arumugam Jalajam, Srinivasan Krishnan Rajagopalan, Venkatesan Kamalakannan, Harbhajan Singh
IPC Classification: G06N3/08, G06F16/35, G06F18/22, G06F40/247, G06N3/04, G06V30/18, G06V30/262, G06V30/40
CPC Classification: G06N3/08, G06F16/353, G06F18/22, G06F40/247, G06N3/04, G06V30/18057, G06V30/262, G06V30/40
Abstract: A system for providing automated and domain-specific contextual processing for context-based verification may classify a plurality of extracted parameters from a set of digitized training documents to assign a document similarity score with respect to a set of reference documents. The system may automatically detect a domain for the set of digitized training documents based on the document similarity score. The system may load a domain-based neural model for the detected domain to generate a plurality of pre-defined contextual parameters. The system may receive a set of input documents and perform contextual processing of the received set of documents based on the pre-defined contextual parameters to obtain an output in the form of a plurality of filtered snippets, each bearing a corresponding rank. The context-based verification may be performed based on the plurality of filtered snippets and the corresponding rank.
-