Document classification of files on the client side before upload

Granted invention patent

US11948383B2 Document classification of files on the client side before upload (legal status: in force)

Patent title: Document classification of files on the client side before upload
Application number: US17223922
Filing date: 2021-04-06
Publication number: US11948383B2
Publication date: 2024-04-02
Inventors: William J. Farmer, II; Sreenidhi Narayanamangalathu Kesavan; Dimitri Bilenkin; William Clayton Jackson; Karthikeyan Palanivelu; Siddharth Mangalik
Applicant: Capital One Services, LLC
Applicant address: McLean, VA, US
Patentee: Capital One Services, LLC
Current patentee: Capital One Services, LLC
Current patentee address: McLean, VA, US
Attorney/agent: Sterne, Kessler, Goldstein & Fox P.L.L.C.
Main classification: G06V30/413
IPC classifications: G06V30/413; G06N20/00

Abstract:

A method for classifying a document in real-time is disclosed. The method includes identifying one or more sections of the document likely to contain text based on a contrast between dark space and light space in an image of the document. Optical character recognition is performed within the identified sections of the document to identify a set of words within each identified section of the document. The sets of words are extracted from the identified sections of the document, and a subset of the sets of words is selected for classifying the document based on a preconfigured option. The document is then classified by inputting the selected subset of words into one or more machine learning models. The method includes transmitting the document and the determined classification of the document to an external server.
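The pipeline in the abstract can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the patented implementation. The page is assumed to be a grayscale image stored as a list of pixel rows. Text-bearing sections are found with a horizontal projection profile, which is one simple way to exploit the contrast between dark and light space; the abstract does not say which method is used. OCR is injected as a function, because a real OCR engine is out of scope here. The option names for the "preconfigured option" (`"all"`, `"first_section"`, `"first_n_words"`) are made up, and a tiny Naive Bayes classifier stands in for the unspecified machine learning models. The returned dict stands in for the upload payload, and its field names are likewise hypothetical.

```python
import math
from collections import Counter


def find_text_bands(image, dark_threshold=128, min_dark_fraction=0.05):
    """Return (top, bottom) row ranges likely to contain text.

    Horizontal projection profile: a row whose share of dark pixels
    (value < dark_threshold) against the light background reaches
    min_dark_fraction is treated as text; adjacent such rows form a band.
    """
    bands, start = [], None
    for y, row in enumerate(image):
        dark = sum(1 for px in row if px < dark_threshold) / max(len(row), 1)
        if dark >= min_dark_fraction and start is None:
            start = y  # a text band begins
        elif dark < min_dark_fraction and start is not None:
            bands.append((start, y))  # band ends before this light row
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band runs to the bottom edge
    return bands


def select_words(word_sets, option="all", n=50):
    """Choose which OCR'd words feed the model (option names are hypothetical)."""
    if option == "first_section":
        return list(word_sets[0]) if word_sets else []
    flat = [w for words in word_sets for w in words]
    if option == "first_n_words":
        return flat[:n]
    return flat  # "all": every extracted word


class NaiveBayes:
    """Tiny multinomial Naive Bayes with Laplace smoothing (stand-in model)."""

    def __init__(self, examples):
        self.label_counts = Counter()
        self.word_counts = {}
        self.vocab = set()
        for words, label in examples:
            tokens = [w.lower() for w in words]
            self.label_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)

    def predict(self, words):
        total_docs = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                score += math.log((counts[w.lower()] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def classify_before_upload(image, ocr_fn, model, option="all"):
    """Run the client-side pipeline and build a hypothetical upload payload."""
    bands = find_text_bands(image)
    word_sets = [ocr_fn(image, band) for band in bands]  # OCR only inside bands
    words = select_words(word_sets, option)
    label = model.predict(words)
    # Actual transmission to a server is out of scope; field names are made up.
    return {"document": image, "classification": label, "words_used": words}


# Usage: a synthetic 10x20 white page with dark "text" pixels on rows 2, 3 and 6.
page = [[255] * 20 for _ in range(10)]
for y in (2, 3, 6):
    for x in range(2, 18, 2):
        page[y][x] = 0

FAKE_TEXT = {(2, 4): ["invoice", "total", "due"], (6, 7): ["payment", "amount"]}


def fake_ocr(image, band):
    return FAKE_TEXT[band]  # stands in for a real OCR engine


model = NaiveBayes([
    (["invoice", "total", "amount", "due"], "invoice"),
    (["driver", "license", "birth", "date"], "id_card"),
])
result = classify_before_upload(page, fake_ocr, model)
print(find_text_bands(page), result["classification"])
# prints: [(2, 4), (6, 7)] invoice
```

Restricting OCR to the detected bands is what keeps the work small enough to run on the client before upload. The subset option trades accuracy for speed: `"first_section"` feeds the model only the words from the first band, while `"all"` uses every extracted word.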

Published/granted documents:

US20220318547A1 DOCUMENT CLASSIFICATION OF FILES ON THE CLIENT SIDE BEFORE UPLOAD (publication/grant date: 2022-10-06)

IPC classification:

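G	Physics
G06	Computing; calculating or counting
G06V	Image or video recognition or understanding
G06V30/00	Character recognition; recognising digital ink; document-oriented image-based pattern recognition (scanning, transmission or reproduction of documents or the like H04N1/00)
G06V30/40	. Document-oriented image-based pattern recognition
G06V30/41	.. Analysis of document content (recognition of printed characters having code marks G06V30/224)
G06V30/413	... Classification of content, e.g. text, photographs or tables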