Abstract:
Various embodiments enable regions of text to be identified in an image captured by a camera of a computing device for preprocessing before being analyzed by a visual recognition engine. For example, each identified region can be analyzed or tested to determine whether it exhibits a quality associated with poor text recognition results, such as poor contrast, blur, or noise, as measured by one or more algorithms. Upon identifying a region with such a quality, an image quality enhancement can be automatically applied to that region without user instruction or intervention. Accordingly, once each region has been cleared of qualities associated with poor recognition, the regions of text can be processed with a visual recognition algorithm or engine.
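
By way of illustration, the following sketch shows one plausible way to score and enhance a region before OCR using OpenCV. The thresholds, function names, and choice of enhancements (denoising followed by CLAHE contrast equalization) are assumptions for illustration only, not the specific measures or enhancements claimed here.

    # Illustrative sketch only; thresholds and enhancements are assumptions.
    import cv2

    BLUR_THRESHOLD = 100.0      # variance-of-Laplacian below this suggests blur
    CONTRAST_THRESHOLD = 40.0   # grayscale std-dev below this suggests poor contrast

    def needs_enhancement(region_gray):
        """Flag a text region whose blur or contrast would hurt OCR."""
        blur_score = cv2.Laplacian(region_gray, cv2.CV_64F).var()
        contrast_score = float(region_gray.std())
        return blur_score < BLUR_THRESHOLD or contrast_score < CONTRAST_THRESHOLD

    def enhance(region_gray):
        """Apply simple automatic enhancements: denoise, then equalize contrast."""
        denoised = cv2.fastNlMeansDenoising(region_gray, h=10)
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(denoised)

    def preprocess_regions(image_gray, regions):
        """Enhance only the regions that fail the quality checks."""
        for (x, y, w, h) in regions:
            roi = image_gray[y:y+h, x:x+w]
            if needs_enhancement(roi):
                image_gray[y:y+h, x:x+w] = enhance(roi)
        return image_gray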
Abstract:
A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse the text of incoming utterances. The models may identify complex utterance structures and may identify which command portions of an utterance correspond to which conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true, so that the system may cause the commands to be executed (and stopped) at the appropriate times.
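
The three utterance structures can be illustrated with a toy, rule-based parser; the actual system uses trained machine learning models rather than fixed patterns, so the regular expressions and output schema below are purely illustrative assumptions.

    # Toy rule-based illustration; the described system uses trained ML parsers.
    import re

    def parse_utterance(text):
        """Classify an utterance as conditional, repetitive, sequential, or simple."""
        text = text.strip().lower()
        m = re.match(r"if (?P<cond>.+?),? then (?P<cmd>.+)", text)
        if m:  # conditional: execute only once the condition is satisfied
            return {"type": "conditional", "condition": m.group("cond"),
                    "commands": [m.group("cmd")]}
        m = re.match(r"(?P<cmd>.+?) until (?P<cond>.+)", text)
        if m:  # repetitive: execute repeatedly until the condition is satisfied
            return {"type": "repetitive", "condition": m.group("cond"),
                    "commands": [m.group("cmd")]}
        parts = re.split(r"\band then\b|\bthen\b", text)
        if len(parts) > 1:  # sequential: multiple commands in a single utterance
            return {"type": "sequential", "condition": None,
                    "commands": [p.strip() for p in parts]}
        return {"type": "simple", "condition": None, "commands": [text]}

    print(parse_utterance("if it starts raining then close the garage"))
    print(parse_utterance("play music until 9 pm"))
    print(parse_utterance("dim the lights and then lock the door"))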
Abstract:
Disclosed are techniques for recognizing text from one or more frames of image data using contextual information. In some implementations, image data including a captured textual item is processed to identify an entity in the image data. A context can be selected using the entity, where the context corresponds to a dictionary. Text in the captured textual item can be identified using the dictionary. The identified text can be output to a display device.
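
A minimal sketch of the dictionary-selection idea follows; the entity labels, dictionaries, and the rescoring bonus are stand-ins invented for illustration, not the disclosed recognition method.

    # Illustrative sketch; entities, dictionaries, and scoring are assumptions.
    CONTEXT_DICTIONARIES = {
        "restaurant": {"menu", "entree", "appetizer", "dessert", "espresso"},
        "pharmacy":   {"tablet", "capsule", "dosage", "ibuprofen", "refill"},
    }

    def select_dictionary(entity):
        """Map a recognized entity (e.g., a storefront) to its dictionary."""
        return CONTEXT_DICTIONARIES.get(entity, set())

    def rescore_candidates(candidates, dictionary):
        """Prefer OCR candidates that appear in the context's dictionary."""
        # candidates: list of (word, ocr_confidence) pairs for one textual item
        def score(pair):
            word, conf = pair
            return conf + (0.5 if word.lower() in dictionary else 0.0)
        return max(candidates, key=score)[0]

    dictionary = select_dictionary("restaurant")   # entity found in the image
    print(rescore_candidates([("expresso", 0.61), ("espresso", 0.58)], dictionary))

Here the dictionary bonus overrides the raw OCR confidences, so the contextually plausible "espresso" wins over the higher-confidence misreading "expresso".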
Abstract:
Various embodiments crowd source images to cover various angles, zoom levels, and elevations of objects and/or points of interest (POIs) under various lighting conditions. The crowd-sourced images are tagged or associated with a particular POI or geographic location and stored in a database for use by an augmented reality (AR) application to recognize objects appearing in a live view of a scene captured by at least one camera of a computing device. The more comprehensive the database, the more accurately an object or POI in the scene will be recognized and/or tracked by the AR application. Accordingly, the more accurately an object is recognized and tracked, the more smoothly and continuously the content, and transitions in its movement, can be presented to users in the live view.
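
One plausible shape for such a database is sketched below with SQLite; the schema, column names, and sample values are assumptions chosen to show how capture conditions could be tagged and queried, not the disclosed storage design.

    # Illustrative sketch; schema and field names are assumptions.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE poi_images (
        poi TEXT, lat REAL, lon REAL,
        angle REAL, zoom REAL, elevation REAL, lighting TEXT,
        image_path TEXT)""")

    def tag_image(poi, lat, lon, angle, zoom, elevation, lighting, path):
        """Associate an uploaded image with a POI and its capture conditions."""
        db.execute("INSERT INTO poi_images VALUES (?,?,?,?,?,?,?,?)",
                   (poi, lat, lon, angle, zoom, elevation, lighting, path))

    def images_for_poi(poi, lighting):
        """Fetch reference images for a POI under similar lighting conditions."""
        rows = db.execute(
            "SELECT image_path, angle, zoom, elevation FROM poi_images "
            "WHERE poi = ? AND lighting = ?", (poi, lighting))
        return rows.fetchall()

    tag_image("Space Needle", 47.6205, -122.3493, 90.0, 1.0, 10.0,
              "dusk", "img_001.jpg")
    print(images_for_poi("Space Needle", "dusk"))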
Abstract:
Various embodiments enable a computing device to incorporate frame selection or preprocessing techniques into a text recognition pipeline in an attempt to improve text recognition accuracy in various environments and situations. For example, a mobile computing device can capture images of text using a first camera, such as a rear-facing camera, while capturing images of the environment or a user with a second camera, such as a front-facing camera. Based on the images captured of the environment or user, one or more image preprocessing parameters can be determined and applied to the captured images of the text to improve recognition accuracy.
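
As a concrete but purely illustrative example, one such preprocessing parameter could be a gamma-correction value derived from the ambient brightness seen by the front-facing camera; the thresholds and gamma values below are assumptions, not parameters from the disclosure.

    # Illustrative sketch; thresholds and gamma values are assumptions.
    import cv2
    import numpy as np

    def estimate_ambient_brightness(front_frame_gray):
        """Mean intensity of the front-camera frame as a lighting proxy."""
        return float(front_frame_gray.mean())  # 0 (dark) .. 255 (bright)

    def gamma_for_brightness(brightness):
        """Brighten rear-camera frames more aggressively in dim environments."""
        if brightness < 60:
            return 0.5   # strong brightening
        if brightness < 120:
            return 0.75  # mild brightening
        return 1.0       # leave well-lit scenes alone

    def apply_gamma(frame_gray, gamma):
        """Apply gamma correction via a lookup table before OCR."""
        table = ((np.arange(256) / 255.0) ** gamma * 255).astype("uint8")
        return cv2.LUT(frame_gray, table)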
Abstract:
The recognition of text in an acquired image is improved by using general and type-specific heuristics that can determine the likelihood that a portion of the text is truncated at an edge of an image, frame, or screen. Truncated text can be filtered out so that the user is not offered an option to perform an undesirable task, such as dialing an incorrect number or connecting to an incorrect Web address, based on recognizing an incomplete text string. The general and type-specific heuristics can be combined to improve confidence, and the image data can be preprocessed on the device before processing with an optical character recognition (OCR) engine. Multiple frames can be analyzed to attempt to recognize words or characters that might have been truncated in one or more of the frames.
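
A minimal sketch of combining a general edge heuristic with a type-specific one is shown below; the pixel margin and the phone-number rule are illustrative assumptions, not the exact criteria disclosed.

    # Illustrative sketch; margin and phone-number rule are assumptions.
    import re

    EDGE_MARGIN = 5  # pixels; text this close to an edge may be cut off

    def touches_edge(box, image_w, image_h, margin=EDGE_MARGIN):
        """General heuristic: the text's bounding box abuts the frame boundary."""
        x, y, w, h = box
        return (x <= margin or y <= margin or
                x + w >= image_w - margin or y + h >= image_h - margin)

    def looks_like_partial_phone_number(text):
        """Type-specific heuristic: digit run too short for a full number."""
        digits = re.sub(r"\D", "", text)
        return 0 < len(digits) < 10  # NANP numbers carry 10 digits

    def should_suppress_action(text, box, image_w, image_h):
        """Combine heuristics so incomplete strings never become actions."""
        return (touches_edge(box, image_w, image_h) and
                looks_like_partial_phone_number(text))

    # "206-555-01" cut off at the right edge of a 640x480 frame:
    print(should_suppress_action("206-555-01", (500, 200, 140, 30), 640, 480))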