INFORMATION PROCESSING APPARATUS AND IMAGE READING APPARATUS JUDGING TITLE OF READ DOCUMENT

    Publication No.: US20250029416A1

    Publication date: 2025-01-23

    Application No.: US18770714

    Filing date: 2024-07-12

    Abstract: In an image reading apparatus, a character area extractor extracts character areas from a document image row by row. A title reliability calculator calculates a title reliability level for each character area using a feature quantity data set and a machine learning model. A character recognizer converts the character areas whose title reliability levels exceed a threshold into text data. A title judger collates the text data with a title candidate. When exactly one piece of text data coinciding with a title candidate is detected, the title judger sets that text data as the title of the document image. When a plurality of coinciding pieces of text data are detected, the title judger sets the piece with the highest title reliability level among them as the title of the document image.
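    The selection rule in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `judge_title`, the `(text, reliability)` pair representation, exact string matching against the candidates, and returning `None` when nothing matches are all assumptions made for illustration. The sketch also takes already-recognized text as input, whereas the abstract describes running character recognition only on areas that pass the threshold.

```python
from typing import Iterable, Optional, Sequence, Tuple


def judge_title(
    areas: Sequence[Tuple[str, float]],
    candidates: Iterable[str],
    threshold: float,
) -> Optional[str]:
    """Pick a title from (text, title_reliability) pairs.

    Hypothetical helper: names, types, and the None fallback are
    assumptions, not taken from the patent.
    """
    candidate_set = set(candidates)

    # Keep only areas whose reliability exceeds the threshold and whose
    # text coincides with a title candidate (exact match assumed here).
    matches = [
        (text, level)
        for text, level in areas
        if level > threshold and text in candidate_set
    ]

    if not matches:
        # The abstract does not say what happens here; None is an assumption.
        return None

    # One match: that text is the title. Several matches: the one with
    # the highest reliability level wins. max() covers both cases.
    return max(matches, key=lambda match: match[1])[0]


if __name__ == "__main__":
    areas = [
        ("Invoice", 0.70),
        ("Page 1", 0.90),
        ("Purchase Order", 0.85),
        ("notes", 0.10),
    ]
    print(judge_title(areas, {"Invoice", "Purchase Order"}, threshold=0.5))
```

    With the sample data, both "Invoice" and "Purchase Order" coincide with a candidate, so the one with the higher reliability level is chosen.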

    IMAGE READING APPARATUS FOR DIVIDING READ DOCUMENT IMAGES INTO DOCUMENTS

    Publication No.: US20250029414A1

    Publication date: 2025-01-23

    Application No.: US18776386

    Filing date: 2024-07-18

    Abstract: An image reading apparatus reads a document bundle to acquire document images and determines the first page of each document according to a division method selected by a user. A page number recognizer extracts a page number from a document image by executing page number recognition processing, and determines a document image whose page number indicates a first page to be the first page of a document. A layout recognizer detects a marginal area or background color from a document image by executing layout recognition processing, and determines the first page of the document. A title recognizer extracts a title by executing title recognition processing and determines the first page of the document. A divider divides the document images into documents on the basis of the determined first pages, converts the divided document images into files, and stores the files in a storage device.
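    The dividing step can be sketched as follows, assuming a recognizer has already produced one "is first page" flag per scanned image. The function name `divide_by_first_pages` and the list-based page representation are hypothetical, and starting a document at the very first image even when it is not flagged is an assumption. File conversion and storage are left out.

```python
from typing import List, Sequence, TypeVar

Page = TypeVar("Page")


def divide_by_first_pages(
    pages: Sequence[Page],
    is_first: Sequence[bool],
) -> List[List[Page]]:
    """Group scanned pages into documents, starting a new document at
    every page flagged as a first page.

    Hypothetical helper: the flags would come from whichever recognizer
    (page number, layout, or title) the user selected.
    """
    if len(pages) != len(is_first):
        raise ValueError("pages and is_first must have the same length")

    documents: List[List[Page]] = []
    for page, first in zip(pages, is_first):
        if first or not documents:
            # Assumption: leading pages with no flag still open a document.
            documents.append([page])
        else:
            documents[-1].append(page)
    return documents


if __name__ == "__main__":
    # Page-number method, assumed here to flag pages whose number is 1.
    page_numbers = [1, 2, 1, 2, 3]
    flags = [number == 1 for number in page_numbers]
    scans = ["scan_a", "scan_b", "scan_c", "scan_d", "scan_e"]
    print(divide_by_first_pages(scans, flags))
```

    Keeping the first-page decision separate from the grouping lets the three recognition methods in the abstract feed the same divider.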
