Abstract:
Various embodiments enable regions of text to be identified in an image captured by a camera of a computing device for preprocessing before being analyzed by a visual recognition engine. For example, each identified region can be analyzed or tested to determine whether it exhibits a quality associated with poor text recognition results, such as poor contrast, blur, or noise, as measured by one or more algorithms. Upon identifying a region with such a quality, an image quality enhancement can be automatically applied to that region without user instruction or intervention. Accordingly, once each region has been cleared of qualities associated with poor recognition, the regions of text can be processed with a visual recognition algorithm or engine.
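
By way of illustration, the following sketch shows one plausible way to score and enhance a region before OCR using OpenCV. The thresholds, function names, and choice of enhancements (denoising followed by CLAHE contrast equalization) are assumptions for illustration only, not the specific measures or enhancements claimed here.

    # Illustrative sketch only; thresholds and enhancements are assumptions.
    import cv2

    BLUR_THRESHOLD = 100.0      # variance-of-Laplacian below this suggests blur
    CONTRAST_THRESHOLD = 40.0   # grayscale std-dev below this suggests poor contrast

    def needs_enhancement(region_gray):
        """Flag a text region whose blur or contrast would hurt OCR."""
        blur_score = cv2.Laplacian(region_gray, cv2.CV_64F).var()
        contrast_score = float(region_gray.std())
        return blur_score < BLUR_THRESHOLD or contrast_score < CONTRAST_THRESHOLD

    def enhance(region_gray):
        """Apply simple automatic enhancements: denoise, then equalize contrast."""
        denoised = cv2.fastNlMeansDenoising(region_gray, h=10)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(denoised)

    def preprocess_regions(image_gray, regions):
        """Enhance only the regions that fail the quality checks."""
        for (x, y, w, h) in regions:
            roi = image_gray[y:y+h, x:x+w]
            if needs_enhancement(roi):
                image_gray[y:y+h, x:x+w] = enhance(roi)
        return image_gray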
Abstract:
A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse the text of incoming utterances. The models may identify complex utterance structures and may identify which command portions of an utterance correspond to which conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true, so that the system may cause the commands to be executed (and stopped) at the appropriate times.
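
The three utterance structures can be illustrated with a toy, rule-based parser; the actual system uses trained machine learning models rather than fixed patterns, so the regular expressions and output schema below are purely illustrative assumptions.

    # Toy rule-based illustration; the described system uses trained ML parsers.
    import re

    def parse_utterance(text):
        """Classify an utterance as conditional, repetitive, sequential, or simple."""
        text = text.strip().lower()
        m = re.match(r"if (?P<cond>.+?),? then (?P<cmd>.+)", text)
        if m:  # conditional: execute only once the condition is satisfied
            return {"type": "conditional", "condition": m.group("cond"),
                    "commands": [m.group("cmd")]}
        m = re.match(r"(?P<cmd>.+?) until (?P<cond>.+)", text)
        if m:  # repetitive: execute repeatedly until the condition is satisfied
            return {"type": "repetitive", "condition": m.group("cond"),
                    "commands": [m.group("cmd")]}
        parts = re.split(r"\band then\b|\bthen\b", text)
        if len(parts) > 1:  # sequential: multiple commands in a single utterance
            return {"type": "sequential", "condition": None,
                    "commands": [p.strip() for p in parts]}
        return {"type": "simple", "condition": None, "commands": [text]}

    print(parse_utterance("if it starts raining then close the garage"))
    print(parse_utterance("play music until 9 pm"))
    print(parse_utterance("dim the lights and then lock the door"))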
Abstract:
Disclosed are techniques for recognizing text from one or more frames of image data using contextual information. In some implementations, image data including a captured textual item is processed to identify an entity in the image data. A context can be selected using the entity, where the context corresponds to a dictionary. Text in the captured textual item can be identified using the dictionary. The identified text can be output to a display device.
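
A minimal sketch of the dictionary-selection idea follows; the entity labels, dictionaries, and the rescoring bonus are stand-ins invented for illustration, not the disclosed recognition method.

    # Illustrative sketch; entities, dictionaries, and scoring are assumptions.
    CONTEXT_DICTIONARIES = {
        "restaurant": {"menu", "entree", "appetizer", "dessert", "espresso"},
        "pharmacy":   {"tablet", "capsule", "dosage", "ibuprofen", "refill"},
    }

    def select_dictionary(entity):
        """Map a recognized entity (e.g., a storefront) to its dictionary."""
        return CONTEXT_DICTIONARIES.get(entity, set())

    def rescore_candidates(candidates, dictionary):
        """Prefer OCR candidates that appear in the context's dictionary."""
        # candidates: list of (word, ocr_confidence) pairs for one textual item
        def score(pair):
            word, conf = pair
            return conf + (0.5 if word.lower() in dictionary else 0.0)
        return max(candidates, key=score)[0]

    dictionary = select_dictionary("restaurant")   # entity found in the image
    print(rescore_candidates([("expresso", 0.61), ("espresso", 0.58)], dictionary))

Here the dictionary bonus overrides the raw OCR confidences, so the contextually plausible "espresso" wins over the higher-confidence misreading "expresso".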
Abstract:
Various embodiments crowd source images to cover various angles, zoom levels, and elevations of objects and/or points of interest (POIs) under various lighting conditions. The crowd-sourced images are tagged or associated with a particular POI or geographic location and stored in a database for use by an augmented reality (AR) application to recognize objects appearing in a live view of a scene captured by at least one camera of a computing device. The more comprehensive the database, the more accurately an object or POI in the scene will be recognized and/or tracked by the AR application. Accordingly, the more accurately an object is recognized and tracked, the more smoothly and continuously the content, and transitions in its movement, can be presented to users in the live view.
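
One plausible shape for such a database is sketched below with SQLite; the schema, column names, and sample values are assumptions chosen to show how capture conditions could be tagged and queried, not the disclosed storage design.

    # Illustrative sketch; schema and field names are assumptions.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE poi_images (
        poi TEXT, lat REAL, lon REAL,
        angle REAL, zoom REAL, elevation REAL, lighting TEXT,
        image_path TEXT)""")

    def tag_image(poi, lat, lon, angle, zoom, elevation, lighting, path):
        """Associate an uploaded image with a POI and its capture conditions."""
        db.execute("INSERT INTO poi_images VALUES (?,?,?,?,?,?,?,?)",
                   (poi, lat, lon, angle, zoom, elevation, lighting, path))

    def images_for_poi(poi, lighting):
        """Fetch reference images for a POI under similar lighting conditions."""
        rows = db.execute(
            "SELECT image_path, angle, zoom, elevation FROM poi_images "
            "WHERE poi = ? AND lighting = ?", (poi, lighting))
        return rows.fetchall()

    tag_image("Space Needle", 47.6205, -122.3493, 90.0, 1.0, 10.0,
              "dusk", "img_001.jpg")
    print(images_for_poi("Space Needle", "dusk"))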
Abstract:
Various embodiments enable a computing device to incorporate frame selection or preprocessing techniques into a text recognition pipeline in an attempt to improve text recognition accuracy in various environments and situations. For example, a mobile computing device can capture images of text using a first camera, such as a rear-facing camera, while capturing images of the environment or a user with a second camera, such as a front-facing camera. Based on the images captured of the environment or user, one or more image preprocessing parameters can be determined and applied to the captured images of the text to improve recognition accuracy.
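
As a concrete but purely illustrative example, one such preprocessing parameter could be a gamma-correction value derived from the ambient brightness seen by the front-facing camera; the thresholds and gamma values below are assumptions, not parameters from the disclosure.

    # Illustrative sketch; thresholds and gamma values are assumptions.
    import cv2
    import numpy as np

    def estimate_ambient_brightness(front_frame_gray):
        """Mean intensity of the front-camera frame as a lighting proxy."""
        return float(front_frame_gray.mean())  # 0 (dark) .. 255 (bright)

    def gamma_for_brightness(brightness):
        """Brighten rear-camera frames more aggressively in dim environments."""
        if brightness < 60:
            return 0.5   # strong brightening
        if brightness < 120:
            return 0.75  # mild brightening
        return 1.0       # leave well-lit scenes alone

    def apply_gamma(frame_gray, gamma):
        """Apply gamma correction via a lookup table before OCR."""
        table = ((np.arange(256) / 255.0) ** gamma * 255).astype("uint8")
        return cv2.LUT(frame_gray, table)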
Abstract:
The recognition of text in an acquired image is improved by using general and type-specific heuristics that can determine the likelihood that a portion of the text is truncated at an edge of an image, frame, or screen. Truncated text can be filtered out so that the user is not offered an option to perform an undesirable task, such as dialing an incorrect number or connecting to an incorrect Web address, based on recognizing an incomplete text string. The general and type-specific heuristics can be combined to improve confidence, and the image data can be preprocessed on the device before processing with an optical character recognition (OCR) engine. Multiple frames can be analyzed to attempt to recognize words or characters that might have been truncated in one or more of the frames.
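
A minimal sketch of combining a general edge heuristic with a type-specific one is shown below; the pixel margin and the phone-number rule are illustrative assumptions, not the exact criteria disclosed.

    # Illustrative sketch; margin and phone-number rule are assumptions.
    import re

    EDGE_MARGIN = 5  # pixels; text this close to an edge may be cut off

    def touches_edge(box, image_w, image_h, margin=EDGE_MARGIN):
        """General heuristic: the text's bounding box abuts the frame boundary."""
        x, y, w, h = box
        return (x <= margin or y <= margin or
                x + w >= image_w - margin or y + h >= image_h - margin)

    def looks_like_partial_phone_number(text):
        """Type-specific heuristic: digit run too short for a full number."""
        digits = re.sub(r"\D", "", text)
        return 0 < len(digits) < 10  # NANP numbers carry 10 digits

    def should_suppress_action(text, box, image_w, image_h):
        """Combine heuristics so incomplete strings never become actions."""
        return (touches_edge(box, image_w, image_h) and
                looks_like_partial_phone_number(text))

    # "206-555-01" cut off at the right edge of a 640x480 frame:
    print(should_suppress_action("206-555-01", (500, 200, 140, 30), 640, 480))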