Document classification of files on the client side before upload
Abstract:
A method for classifying a document in real time is disclosed. The method includes identifying one or more sections of the document likely to contain text based on a contrast between dark space and light space in an image of the document. Optical character recognition is performed within the identified sections of the document to identify a set of words within each identified section of the document. The sets of words are extracted from the identified sections of the document, and a subset of the sets of words is selected for classifying the document based on a preconfigured option. The document is then classified by inputting the selected subset of words into one or more machine learning models. The method includes transmitting the document and the determined classification of the document to an external server.
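The steps in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: every name below (`find_text_bands`, `select_words`, `classify`, `classify_document`), the thresholds, and the option values are hypothetical. Text regions are approximated as horizontal bands of rows containing enough dark pixels, the OCR engine is passed in as a caller-supplied function, a simple keyword-overlap score stands in for the machine learning models, and the final transmission to an external server is left out.

```python
from typing import Callable, Dict, List, Set, Tuple

Image = List[List[int]]   # grayscale rows; 0 = black, 255 = white (assumed format)
Band = Tuple[int, int]    # (first row, one past the last row) of a text region


def find_text_bands(image: Image, dark_below: int = 128,
                    min_dark_fraction: float = 0.05) -> List[Band]:
    """Group consecutive rows whose share of dark pixels suggests text."""
    bands: List[Band] = []
    start = None
    for y, row in enumerate(image):
        dark = sum(1 for p in row if p < dark_below) / max(len(row), 1)
        if dark >= min_dark_fraction:
            if start is None:
                start = y              # a new band begins on this row
        elif start is not None:
            bands.append((start, y))   # a light row closes the open band
            start = None
    if start is not None:
        bands.append((start, len(image)))
    return bands


def select_words(word_sets: List[List[str]], option: str, n: int = 50) -> List[str]:
    """Pick the subset of extracted words according to a preconfigured option."""
    words = [w.lower() for ws in word_sets for w in ws]
    if option == "all":
        return words
    if option == "first_n":
        return words[:n]
    if option == "first_section":
        return [w.lower() for w in word_sets[0]] if word_sets else []
    raise ValueError(f"unknown selection option: {option}")


def classify(words: List[str], keyword_model: Dict[str, Set[str]]) -> str:
    """Stand-in for an ML model: the label with the most keyword hits wins."""
    scores = {label: sum(w in kws for w in words)
              for label, kws in keyword_model.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else "unknown"


def classify_document(image: Image,
                      ocr: Callable[[Image, Band], List[str]],
                      option: str,
                      keyword_model: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run the client-side pipeline and return what would accompany the upload."""
    bands = find_text_bands(image)
    word_sets = [ocr(image, band) for band in bands]   # OCR only inside text bands
    words = select_words(word_sets, option)
    return {"bands": bands, "words": words,
            "classification": classify(words, keyword_model)}
```

In use, `ocr` would wrap whatever OCR engine runs on the client, and the returned dictionary would be sent to the server together with the document. Restricting OCR to the detected bands is what keeps the work small enough to run before upload.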