Abstract:
A system that selects image frames from a video feed for recognition of objects (such as physical objects, text characters, or the like) within the image frames. Individual frames are selected using robust historical metrics that compare metrics of a particular image frame (such as focus, motion, intensity, etc.) to the same metrics of previous image frames in the video feed. The system selects an image frame for object recognition if the frame is of relatively high quality, that is, if the frame is suitable for later object recognition processing.
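As a rough illustration of this selection strategy, the Python sketch below compares a frame's focus, motion, and intensity metrics against rolling statistics of recent frames. The metric definitions, history length, and z-score thresholds are illustrative assumptions; the abstract does not specify exact formulas.

```python
import numpy as np
from collections import deque

def focus_metric(gray):
    """Variance of a Laplacian response (higher = sharper)."""
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

class FrameSelector:
    """Keeps a rolling history of per-frame metrics and accepts a frame
    only if its metrics compare well against that history."""

    def __init__(self, history=30, z_thresh=1.0):
        self.hist = {"focus": deque(maxlen=history),
                     "motion": deque(maxlen=history),
                     "intensity": deque(maxlen=history)}
        self.prev = None
        self.z_thresh = z_thresh

    def consider(self, frame):
        gray = np.asarray(frame, dtype=np.float32)
        motion = (np.abs(gray - self.prev).mean()
                  if self.prev is not None else 0.0)
        self.prev = gray
        metrics = {"focus": focus_metric(gray),
                   "motion": motion,
                   "intensity": float(gray.mean())}

        ok = True
        for name, value in metrics.items():
            hist = self.hist[name]
            if len(hist) >= 5:  # need some history before judging
                z = (value - np.mean(hist)) / (np.std(hist) + 1e-6)
                if name == "focus" and z < -self.z_thresh:
                    ok = False  # blurrier than recent frames
                elif name == "motion" and z > self.z_thresh:
                    ok = False  # more inter-frame motion than usual
                elif name == "intensity" and abs(z) > self.z_thresh:
                    ok = False  # unusually dark or bright
            hist.append(value)
        return ok
```

A caller would feed successive grayscale frames to `FrameSelector.consider()` and pass accepted frames on to the recognition stage.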
Abstract:
A system to select video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A device captures a video frame including text characters. An edge detection filter is applied to the frame to determine gradient features in perpendicular directions. An “edge map” is created from the gradient features, and points along edges in the edge map are identified. An edge transition width is determined at each edge point based on the local intensity minimum and maximum on opposite sides of the respective edge point in the frame. Sharper edges have smaller edge transition widths than edges in blurry images. Statistics are determined from the edge transition widths, and the statistics are processed by a trained classifier to determine whether the frame is sufficiently sharp for text processing.
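The pipeline can be sketched concretely. The Python code below is an assumption-laden illustration: it uses a Sobel filter as the edge detection filter, thresholds gradient magnitude to form the edge map, and measures transition widths along horizontal scanlines only (a full system would walk along the gradient direction). The thresholds and statistics are illustrative choices, not the patent's exact formulation.

```python
import numpy as np
from scipy.signal import convolve2d

def edge_transition_widths(gray, edge_thresh=50.0, max_walk=15):
    """Measure an edge transition width at each detected edge point:
    the distance between the local intensity minimum and maximum on
    opposite sides of the edge (horizontal scan, for simplicity)."""
    gray = np.asarray(gray, dtype=np.float32)
    # Gradients in two perpendicular directions (Sobel kernels).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    gx = convolve2d(gray, kx, mode="same", boundary="symm")
    gy = convolve2d(gray, kx.T, mode="same", boundary="symm")
    mag = np.hypot(gx, gy)                  # "edge map"
    ys, xs = np.nonzero(mag > edge_thresh)  # points along edges

    widths = []
    w = gray.shape[1]
    for y, x in zip(ys, xs):
        row, s = gray[y], np.sign(gx[y, x])
        # Walk outward while intensity keeps falling toward the dark
        # side and rising toward the bright side of the edge.
        left = x
        while left > max(0, x - max_walk) and (row[left - 1] - row[left]) * s < 0:
            left -= 1
        right = x
        while right < min(w - 1, x + max_walk) and (row[right + 1] - row[right]) * s > 0:
            right += 1
        widths.append(right - left)
    return np.asarray(widths, dtype=np.float32)

def sharpness_features(widths):
    """Statistics over the widths; these would be fed to the trained
    classifier (the feature choices here are assumptions)."""
    if widths.size == 0:
        return np.zeros(4, dtype=np.float32)
    return np.array([widths.mean(), widths.std(),
                     np.percentile(widths, 90), (widths <= 2).mean()],
                    dtype=np.float32)
```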
Abstract:
A process for training and optimizing a system that selects video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A set of image frames is subjectively labelled based on a comparison of each frame before and after binarization to determine to what degree text is recognizable in the binary image. A plurality of different sharpness feature metrics is generated from each original frame. A classifier is then trained using the feature metrics and the subjective labels. The feature metrics are then tested for accuracy and/or correlation with the subjective labelling data. The set of feature metrics may be refined based on which metrics produce the best results.
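A minimal sketch of this training-and-refinement loop follows, assuming scikit-learn's LogisticRegression as the classifier and Pearson correlation as the metric-quality test; both are illustrative choices, as the abstract names neither a classifier nor a test. The per-frame feature rows could come from any sharpness metric extractor, such as the transition-width statistics sketched above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def train_and_refine(features, labels, feature_names, corr_thresh=0.3):
    """Train on all sharpness metrics, report per-metric correlation with
    the subjective labels, and keep only the better-performing metrics.
    `features` is an (n_frames, n_metrics) array; `labels` holds the
    subjective 0/1 labels from the before/after-binarization comparison."""
    X = np.asarray(features, dtype=np.float64)
    y = np.asarray(labels)

    clf = LogisticRegression(max_iter=1000)
    accuracy = cross_val_score(clf, X, y, cv=5).mean()

    # Test each metric's correlation with the subjective labelling data.
    correlations = {name: np.corrcoef(X[:, j], y)[0, 1]
                    for j, name in enumerate(feature_names)}

    # Refine the metric set: keep metrics that correlate well enough.
    keep = [n for n, r in correlations.items() if abs(r) >= corr_thresh]

    clf.fit(X, y)  # final classifier on the full labelled set
    return clf, accuracy, correlations, keep
```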
Abstract:
A multi-orientation text detection method and associated system are disclosed that utilize orientation-variant glyph features to determine a text line in an image regardless of the orientation of the text line. Glyph features are determined for each glyph in an image with respect to a neighboring glyph. The glyph features are provided to a learned classifier that outputs a glyph pair score for each pair of neighboring glyphs. Each glyph pair score indicates a likelihood that the corresponding pair of neighboring glyphs forms part of the same text line. The glyph pair scores are used to identify candidate text lines, which are then ranked to select a final set of text lines in the image.
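The pairwise scoring and candidate-line formation might look like the following Python sketch. The pair features, the greedy union-find chaining, and the `score_fn` callback (e.g. a trained classifier's predicted probability) are all illustrative assumptions; the abstract describes a final ranking step but leaves its details unspecified, so it is omitted here.

```python
import numpy as np
from itertools import combinations

def pair_features(a, b):
    """Features for a glyph pair, computed relative to the neighbor
    (bounding boxes as (x, y, w, h); feature choices are illustrative)."""
    (ax, ay, aw, ah), (bx, by, bw, bh) = a, b
    dx, dy = bx - ax, by - ay
    angle = np.arctan2(dy, dx)
    return np.array([np.hypot(dx, dy) / max(ah, 1),  # normalized distance
                     np.sin(angle), np.cos(angle),   # relative direction
                     bh / max(ah, 1)])               # size ratio

def text_lines(glyphs, score_fn, pair_thresh=0.5):
    """Score glyph pairs with a learned classifier (`score_fn` maps a
    feature vector to the probability the pair shares a text line),
    then chain high-scoring pairs into candidate lines via union-find."""
    parent = list(range(len(glyphs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    scored = [(score_fn(pair_features(glyphs[i], glyphs[j])), i, j)
              for i, j in combinations(range(len(glyphs)), 2)]

    # Merge best-scoring pairs first to form candidate lines.
    for s, i, j in sorted(scored, reverse=True):
        if s < pair_thresh:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    lines = {}
    for i in range(len(glyphs)):
        lines.setdefault(find(i), []).append(i)
    return [members for members in lines.values() if len(members) > 1]
```

A trained classifier could supply `score_fn`, for instance `lambda f: clf.predict_proba([f])[0, 1]`; ranking the resulting candidate lines to select the final set would follow.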