- Patent title: Extracting data from documents using multiple deep learning models
- Application number: US17074954
- Filing date: 2020-10-20
- Publication number: US11630956B2
- Publication date: 2023-04-18
- Inventors: Karan Yaramada, Akhil Sahai, Adesh Patel
- Applicant: Jade Global, Inc.
- Applicant address: San Jose, CA, US
- Patentee: Jade Global, Inc.
- Current patentee: Jade Global, Inc.
- Current patentee address: San Jose, CA, US
- Agent: Fountainhead Law Group PC
- Main classification: G06F40/295
- IPC classifications: G06F40/295; G06F40/205; G06N20/00

Abstract:
Techniques for automatically extracting data from documents using multiple deep learning models are provided. According to one set of embodiments, a computer system can receive a document in an electronic format and can segment, using an image segmentation deep learning model, the document into a plurality of segments, where each segment corresponds to a visually discrete portion of the document and is classified as being one of a plurality of types. The computer system can then, for each segment in the plurality of segments, retrieve text in the segment using optical character recognition (OCR) and extract data in the segment from the retrieved text using a named entity recognition (NER) deep learning model, where the retrieving and the extracting are performed in a manner that takes into account the segment's type.
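The flow described in the abstract (segment the document, then run type-aware OCR and type-aware NER on each segment) can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the `Segment` class, the `extract_document` function, the segment type names `"header"` and `"body"`, and all the `fake_*` and `*_ner` stand-ins are hypothetical, and simple string handling replaces the actual deep learning models and OCR engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Segment:
    kind: str     # segment type label (hypothetical: "header" or "body")
    content: str  # stand-in for the cropped image region of the segment


def extract_document(
    document: str,
    segmenter: Callable[[str], List[Segment]],
    ocr_by_type: Dict[str, Callable[[Segment], str]],
    ner_by_type: Dict[str, Callable[[str], Dict[str, str]]],
) -> List[dict]:
    """Segment a document, then run OCR and NER chosen by segment type."""
    results = []
    for seg in segmenter(document):
        text = ocr_by_type[seg.kind](seg)       # type-aware text retrieval
        entities = ner_by_type[seg.kind](text)  # type-aware data extraction
        results.append({"type": seg.kind, "text": text, "entities": entities})
    return results


# --- Toy stand-ins for the models (assumptions, for illustration only) ---

def fake_segmenter(document: str) -> List[Segment]:
    # Treat blank-line-separated blocks as segments; the first is the header.
    blocks = [b for b in document.split("\n\n") if b.strip()]
    return [Segment("header" if i == 0 else "body", b) for i, b in enumerate(blocks)]


def fake_ocr(segment: Segment) -> str:
    # A real system would run OCR on image pixels; here the text is already present.
    return segment.content.strip()


def header_ner(text: str) -> Dict[str, str]:
    # Assume the header ends with a document number, e.g. "Invoice 1234".
    parts = text.split()
    return {"invoice_number": parts[-1]} if parts else {}


def body_ner(text: str) -> Dict[str, str]:
    # Assume body lines look like "Key: value".
    entities = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            entities[key.strip().lower()] = value.strip()
    return entities


if __name__ == "__main__":
    doc = "Invoice 1234\n\nTotal: 99.50\nVendor: Acme"
    for row in extract_document(
        doc,
        fake_segmenter,
        {"header": fake_ocr, "body": fake_ocr},
        {"header": header_ner, "body": body_ner},
    ):
        print(row["type"], row["entities"])
```

Keying the OCR and NER callables by segment type is one simple way to express "in a manner that takes into account the segment's type"; a real system could equally pass the type as an input feature to a single model.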