-
Publication No.: US20220198185A1
Publication Date: 2022-06-23
Application No.: US17127174
Filing Date: 2020-12-18
Inventor: Tim Prebble
Abstract: An image processing method includes: generating, from combined connected components (CCs) of a document image, candidate text CCs, candidate background CCs, and candidate natural image CCs where the candidate background CCs are excluded from the combined CCs to generate the candidate natural image CCs with a predetermined criterion dependent on the candidate text CCs; generating a final natural image bounding box by expanding a candidate natural image bounding box of the candidate natural image CCs and including in the expanded candidate natural image bounding box at least one combined CC that intersects the expanded candidate natural image bounding box; and modifying, based on the final natural image bounding box, the document image and displaying the modified document image to a user.
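Illustrative note: the bounding-box step in this abstract (expand the candidate box, then absorb any combined CC that intersects the expanded box) can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the `(x0, y0, x1, y1)` box representation, the fixed `margin`, and all function names are hypothetical, and the abstract does not say how far the box is expanded or whether the merge is repeated.

```python
from typing import Iterable, Tuple

# Hypothetical representation: axis-aligned box (x0, y0, x1, y1), x1/y1 exclusive.
Box = Tuple[int, int, int, int]


def expand(box: Box, margin: int) -> Box:
    """Grow a box by `margin` pixels on every side (margin is an assumed parameter)."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def intersects(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def final_natural_image_box(candidate: Box,
                            combined_ccs: Iterable[Box],
                            margin: int = 5) -> Box:
    """Expand the candidate box, then include every combined CC that
    intersects the expanded box (single pass, an assumption)."""
    expanded = expand(candidate, margin)
    result = expanded
    for cc in combined_ccs:
        if intersects(expanded, cc):
            result = union(result, cc)
    return result


if __name__ == "__main__":
    ccs = [(24, 12, 30, 18), (40, 40, 50, 50)]
    print(final_natural_image_box((10, 10, 20, 20), ccs))
```

A real implementation would likely also clamp the result to the page bounds; that is omitted here because the abstract does not mention it.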
-
Publication No.: US20230094651A1
Publication Date: 2023-03-30
Application No.: US17490770
Filing Date: 2021-09-30
Inventor: Tim Prebble
Abstract: A method for extracting text from an input image and generating a document includes: generating an edges mask from the input image; generating an edges image that is derived from the edges mask; identifying, within the edges mask, one or more probable text areas; extracting a first set of text characters by performing a first optical character recognition (OCR) operation on each of one or more probable text portions, of the derived edges image, corresponding to each of the probable text areas; generating a modified image by erasing, from the input image, image characters corresponding to the first set of text characters extracted by the first OCR operation; and generating a document by overlaying the extracted first set of text characters on the modified image.
-
Publication No.: US11100355B1
Publication Date: 2021-08-24
Application No.: US16830401
Filing Date: 2020-03-26
Inventor: Tim Prebble
Abstract: A method, non-transitory computer readable medium, and system to reduce visual background noise in an image, especially in images of document pages, without destroying and/or deteriorating the content of that image. In particular, natural images, filled and stroked vector graphics, and text are protected from being destroyed and/or deteriorated by the noise removal process.
-
Publication No.: US12293143B2
Publication Date: 2025-05-06
Application No.: US17957267
Filing Date: 2022-09-30
Inventor: Tim Prebble
IPC: G06V30/414 , G06F40/103 , G06F40/106 , G06F40/117
Abstract: Systems, methods, apparatuses, and computer program products for detecting and tagging of paragraphs that span columns, pages, or other reading units are provided. For example, a method can include obtaining a set of candidate paragraphs for a document. The method can also include identifying a pair of candidate paragraphs spanning columns from among the set of candidate paragraphs. The method can further include outputting a tagged paragraph corresponding to the pair of candidate paragraphs spanning columns.
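Illustrative note: the three steps named in this abstract (obtain candidate paragraphs, identify a pair that spans columns, output one tagged paragraph) might look like the sketch below. Everything specific here is an assumption made for illustration: the `Paragraph` type, the punctuation and lowercase heuristic used to decide that a paragraph continues, and the `<P>` output format are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Paragraph:
    text: str
    column: int  # hypothetical reading-order index of the column


def find_spanning_pairs(paragraphs: List[Paragraph]) -> List[Tuple[Paragraph, Paragraph]]:
    """Pair the last paragraph of a column with the first paragraph of the
    next column when the text appears to continue (assumed heuristic)."""
    by_column: Dict[int, List[Paragraph]] = {}
    for p in paragraphs:  # input is assumed to be in reading order
        by_column.setdefault(p.column, []).append(p)

    pairs = []
    for col in sorted(by_column):
        following = by_column.get(col + 1)
        if not following:
            continue
        tail, head = by_column[col][-1], following[0]
        ends_open = not tail.text.rstrip().endswith((".", "!", "?", ":"))
        starts_lower = head.text.lstrip()[:1].islower()
        if ends_open and starts_lower:
            pairs.append((tail, head))
    return pairs


def tag_paragraph(pair: Tuple[Paragraph, Paragraph]) -> str:
    """Emit the pair as a single tagged paragraph (hypothetical tag format)."""
    tail, head = pair
    return "<P>" + tail.text.rstrip() + " " + head.text.lstrip() + "</P>"


if __name__ == "__main__":
    doc = [
        Paragraph("Intro.", 0),
        Paragraph("The method obtains a set of", 0),
        Paragraph("candidate paragraphs for a document.", 1),
        Paragraph("Next.", 1),
    ]
    for pair in find_spanning_pairs(doc):
        print(tag_paragraph(pair))
```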
-
Publication No.: US12062246B2
Publication Date: 2024-08-13
Application No.: US17490770
Filing Date: 2021-09-30
Inventor: Tim Prebble
IPC: G06V30/18 , G06F18/2431 , G06T7/11 , G06T7/13 , G06T11/00
CPC classification number: G06V30/18 , G06F18/2431 , G06T7/11 , G06T7/13 , G06T11/00
Abstract: A method for extracting text from an input image and generating a document includes: generating an edges mask from the input image; generating an edges image that is derived from the edges mask; identifying, within the edges mask, one or more probable text areas; extracting a first set of text characters by performing a first optical character recognition (OCR) operation on each of one or more probable text portions, of the derived edges image, corresponding to each of the probable text areas; generating a modified image by erasing, from the input image, image characters corresponding to the first set of text characters extracted by the first OCR operation; and generating a document by overlaying the extracted first set of text characters on the modified image.
-
Publication No.: US11721119B2
Publication Date: 2023-08-08
Application No.: US17127174
Filing Date: 2020-12-18
Inventor: Tim Prebble
IPC: G06V30/413 , G06T11/20
CPC classification number: G06V30/413 , G06T11/20 , G06T2210/12
Abstract: An image processing method includes: generating, from combined connected components (CCs) of a document image, candidate text CCs, candidate background CCs, and candidate natural image CCs where the candidate background CCs are excluded from the combined CCs to generate the candidate natural image CCs with a predetermined criterion dependent on the candidate text CCs; generating a final natural image bounding box by expanding a candidate natural image bounding box of the candidate natural image CCs and including in the expanded candidate natural image bounding box at least one combined CC that intersects the expanded candidate natural image bounding box; and modifying, based on the final natural image bounding box, the document image and displaying the modified document image to a user.
-
Publication No.: US11069043B1
Publication Date: 2021-07-20
Application No.: US16818089
Filing Date: 2020-03-13
Inventor: Tim Prebble
Abstract: A method to reduce background noise in a document image. The method includes extracting, from the document image, a connected component corresponding to a background of the document image, generating a histogram of pixel values of the connected component, generating, using a non-linear mapping function based on the histogram, a non-linear probability distribution of the pixel values in the connected component, generating, based at least on a comparison between the non-linear probability distribution and a predetermined threshold, a replacement range of the pixel values, selecting, from the connected component, a pixel having a pixel value within the replacement range, and converting the pixel value of the pixel to a uniform background color.
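Illustrative note: the histogram-and-threshold flow in this abstract can be sketched as below. This is a sketch under stated assumptions only. The abstract does not disclose the non-linear mapping function, so a square root of the peak-normalised count stands in for it here; the `threshold` value, the choice of the most frequent value as the uniform background color, and the function names are likewise hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List, Tuple


def replacement_range(pixels: List[int], threshold: float = 0.2) -> Tuple[int, int]:
    """Return (low, high) pixel values whose mapped score meets the threshold."""
    hist = Counter(pixels)          # histogram of the background component
    peak = max(hist.values())
    # Assumed stand-in for the non-linear mapping: sqrt of count / peak count.
    score = {value: sqrt(count / peak) for value, count in hist.items()}
    kept = [value for value, s in score.items() if s >= threshold]
    return min(kept), max(kept)


def flatten_background(pixels: List[int], threshold: float = 0.2) -> List[int]:
    """Replace every pixel inside the replacement range with one uniform color."""
    low, high = replacement_range(pixels, threshold)
    # Assumption: the most frequent value serves as the uniform background color.
    background = Counter(pixels).most_common(1)[0][0]
    return [background if low <= p <= high else p for p in pixels]


if __name__ == "__main__":
    component = [250] * 100 + [248] * 30 + [252] * 25 + [200] + [10] * 2
    print(replacement_range(component))
    print(Counter(flatten_background(component)))
```

In this toy input the near-white values 248 to 252 are merged into 250, while the rare values 200 and 10 fall outside the range and are left alone.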
-
Publication No.: US11330149B1
Publication Date: 2022-05-10
Application No.: US17150070
Filing Date: 2021-01-15
Inventor: Tim Prebble
Abstract: A method to reduce background noise in a document image includes: extracting, from the document image, a connected component corresponding to a background of the document image; generating a histogram of pixel values of the connected component; generating a replacement range using a range pruning algorithm that narrows a range of the histogram by iteratively discarding at least one pixel value and corresponding pixel count of the histogram from at least one side of the histogram; selecting, from the connected component, at least one pixel having a corresponding pixel value within the replacement range; converting the corresponding pixel value of the at least one pixel to a uniform background color; and outputting, subsequent to the converting, the document image.
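Illustrative note: one way to read the range pruning step is as a loop that trims the histogram from its ends until both ends are well populated. The sketch below follows that reading under stated assumptions: the stopping rule (both end bins hold at least `min_fraction` of the peak count), the rule for choosing which side to discard, and all names are hypothetical, since the abstract only says that values are discarded iteratively from at least one side.

```python
from collections import Counter
from typing import List, Tuple


def prune_range(pixels: List[int], min_fraction: float = 0.1) -> Tuple[int, int]:
    """Narrow the histogram range by discarding sparse end bins one at a time."""
    hist = Counter(pixels)
    values = sorted(hist)
    peak = max(hist.values())
    lo, hi = 0, len(values) - 1
    while lo < hi:
        left, right = hist[values[lo]], hist[values[hi]]
        # Assumed stopping rule: both end bins are populated well enough.
        if min(left, right) >= min_fraction * peak:
            break
        # Discard the sparser end bin (value and its pixel count).
        if left <= right:
            lo += 1
        else:
            hi -= 1
    return values[lo], values[hi]


def apply_uniform_background(pixels: List[int], background: int,
                             min_fraction: float = 0.1) -> List[int]:
    """Convert pixels inside the pruned range to the given background color."""
    low, high = prune_range(pixels, min_fraction)
    return [background if low <= p <= high else p for p in pixels]


if __name__ == "__main__":
    component = [250] * 100 + [248] * 30 + [252] * 25 + [200] + [10] * 2
    print(prune_range(component))
    print(Counter(apply_uniform_background(component, background=255)))
```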