-
Publication No.: US10043069B1
Publication date: 2018-08-07
Application No.: US14196669
Application date: 2014-03-04
Applicant: Amazon Technologies, Inc.
Inventor: Yue Liu, Utkarsh Prateek, Avnish Sikka, Matthew Daniel Hart, Emilie Noelle McConville, Sonjeev Jahagirdar
IPC: G06K9/00
Abstract: A system for recognizing objects and/or text in image data may use context data to perform object/text recognition. The system may also use context data when determining potential functions to execute in response to recognizing the object/text. Context data may be gathered based on device sensor data, user profile data such as the behavior of a user or the behavior of those in a user's social network, or other factors. Recognition processing and/or function selection may be configured to account for context data when operating to improve output results.
-
Publication No.: US20230032575A1
Publication date: 2023-02-02
Application No.: US17882874
Application date: 2022-08-08
Applicant: Amazon Technologies, Inc.
Inventor: Cengiz Erbas, Thomas Kollar, Avnish Sikka, Spyridon Matsoukas, Simon Peter Reavely
Abstract: A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify what command portions of an utterance go with what conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.
-
Publication No.: US09826156B1
Publication date: 2017-11-21
Application No.: US14741248
Application date: 2015-06-16
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Yue Liu, Avnish Sikka
CPC classification number: H04N5/23251, G01B11/14, G03B13/36, H04N5/2258, H04N5/23212, H04N5/23238
Abstract: A system and method of determining a tilt angle of a portable computing device using sensor data; identifying a tilt angle from a plurality of predetermined tilt angle ranges; determining focal length settings for image capture devices of the portable computing device using the tilt angle, adjustment increments, and autofocus scan range algorithms. A portable computing device including a processor; a first image capture device on a first side of the portable computing device, and a second image capture device on the second side of the portable computing device, the second side located opposite of the first side; and a memory device including instructions operable to be executed by the processor to perform a set of actions, enabling the portable computing device to perform the method.
-
Publication No.: US09418283B1
Publication date: 2016-08-16
Application No.: US14463961
Application date: 2014-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Pradeep Natarajan, Avnish Sikka, Rohit Prasad
CPC classification number: G06K9/00463, G06K9/3258, G06K9/42, G06K9/64, G06K9/6857, G06T3/40
Abstract: A system to recognize text, objects, or symbols in a captured image using machine learning models reduces computational overhead by generating a plurality of thumbnail versions of the image at different downscaled resolutions and aspect ratios, and then processing the downscaled images instead of the entire image, or sections of the entire image. The downscaled images are processed to produce a combined feature vector characterizing the overall image. The combined feature vector is processed using the machine learning model.
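The multi-resolution step in this abstract can be illustrated with a minimal sketch. It assumes a grayscale image stored as a 2-D list, nearest-neighbor downscaling, and raw normalized pixel values as stand-in features; the thumbnail sizes and the function names `downscale` and `combined_feature_vector` are hypothetical and are not taken from the patent.

```python
def downscale(image, out_w, out_h):
    """Nearest-neighbor resize of a 2-D list of grayscale values (0-255)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def combined_feature_vector(image, sizes=((4, 4), (8, 2), (2, 8))):
    """Concatenate features from thumbnails of different sizes and aspect ratios.

    Raw normalized pixel intensities are a placeholder for whatever
    features a real system would extract from each thumbnail.
    """
    features = []
    for out_w, out_h in sizes:
        thumb = downscale(image, out_w, out_h)
        features.extend(v / 255.0 for row in thumb for v in row)
    return features


# Example: a 16x16 synthetic gradient image.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
vector = combined_feature_vector(image)
```

The resulting vector has a fixed length regardless of the input resolution, which is what lets a single downstream model consume it.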
-
Publication No.: US09659224B1
Publication date: 2017-05-23
Application No.: US14230471
Application date: 2014-03-31
Applicant: Amazon Technologies, Inc.
Inventor: Matthew Joseph Cole, Sonjeev Jahagirdar, Matthew Daniel Hart, David Paul Ramos, Ankur Datta, Utkarsh Prateek, Emilie Noelle McConville, Prashant Hegde, Avnish Sikka
CPC classification number: G06K9/18, G06K9/00979, G06K9/6292, G06K9/72, G06K2209/01, G06K9/00449, G06K9/00463, G06K9/00442
Abstract: Disclosed are techniques for merging optical character recognized (OCR'd) text from frames of image data. In some implementations, a device sends frames of image data to a server, where each frame includes at least a portion of a captured textual item. The server performs optical character recognition (OCR) on the image data of each frame. When OCR'd text from respective frames is returned to the device from the server, the device can perform matching operations on the text, for instance, using bounding boxes and/or edit distance processing. The device can merge any identified matches of OCR'd text from different frames. The device can then display the merged text with any corrections.
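The matching step named in this abstract (bounding boxes and/or edit distance) might look like the following sketch. The per-result confidence score, the `max_distance` threshold, and the rule of keeping the higher-confidence string are assumptions made for illustration; the abstract does not specify how matched text is reconciled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def boxes_overlap(a, b) -> bool:
    """Boxes are (left, top, right, bottom) in pixel coordinates."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_frames(frames, max_distance=2):
    """Merge OCR results across frames.

    frames: list of frames, each a list of (text, box, confidence) tuples.
    Two results match when their boxes overlap and their texts are within
    max_distance edits; the higher-confidence text is kept (an assumption).
    """
    merged = []
    for frame in frames:
        for text, box, conf in frame:
            for k, (m_text, m_box, m_conf) in enumerate(merged):
                if boxes_overlap(box, m_box) and edit_distance(text, m_text) <= max_distance:
                    if conf > m_conf:
                        merged[k] = (text, box, conf)
                    break
            else:
                # No match found in earlier frames: treat as new text.
                merged.append((text, box, conf))
    return merged
```

Requiring both box overlap and a small edit distance keeps unrelated words that happen to be similar from being merged.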
-
Publication No.: US09224061B1
Publication date: 2015-12-29
Application No.: US14464365
Application date: 2014-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Pradeep Natarajan, Avnish Sikka, Rohit Prasad
CPC classification number: G06K9/3208, G06K9/3258, G06K2209/01
Abstract: A system estimates text orientation in images captured using a handheld camera prior to detecting text in the image. Text orientation is estimated based on edges detected within the image, and the image is rotated based on the estimated orientation. Text detection and processing is then performed on the rotated image. Non-text features along a periphery of the image may be sampled to assure that clutter will not undermine the estimation of orientation.
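One simple way to estimate a dominant edge orientation is a magnitude-weighted histogram of gradient directions. The sketch below uses that technique as a stand-in; the abstract does not say which edge-based estimator the system uses, and the function name, bin count, and central-difference gradients are all assumptions. Rotation and periphery sampling are not shown.

```python
import math


def estimate_edge_orientation(image, bins=36):
    """Return the dominant edge direction in degrees, in [0, 180).

    image is a 2-D list of grayscale values. Gradients come from central
    differences; each pixel votes for its edge direction (perpendicular
    to the gradient), weighted by gradient magnitude.
    """
    height, width = len(image), len(image[0])
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = (math.degrees(math.atan2(gy, gx)) + 90.0) % 180.0
            # Round to the nearest bin center so 0 and 180 share a bin.
            hist[int(round(angle / bin_width)) % bins] += magnitude
    peak = max(range(bins), key=hist.__getitem__)
    return peak * bin_width


# Horizontal stripes have horizontal edges; vertical stripes have vertical ones.
horizontal = [[255 if (y // 2) % 2 == 0 else 0 for _ in range(12)] for y in range(12)]
vertical = [[255 if (x // 2) % 2 == 0 else 0 for x in range(12)] for _ in range(12)]
```

Rotating the image by the negative of the returned angle would bring the dominant edges to horizontal before text detection runs.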
-
Publication No.: US09058644B2
Publication date: 2015-06-16
Application No.: US13800951
Application date: 2013-03-13
Applicant: Amazon Technologies, Inc.
Inventor: David Paul Ramos, Chang Yuan, Keith Harrison Goodman, Avnish Sikka
CPC classification number: G06T5/001, G06K9/00228, G06K9/03, G06K9/3258, G06K9/34, G06K9/40, G06K9/44, G06K2209/01, G06T7/73, G06T2207/10004, G06T2207/30168, G06T2207/30201
Abstract: Various embodiments enable regions of text to be identified in an image captured by a camera of a computing device for preprocessing before being analyzed by a visual recognition engine. For example, each of the identified regions can be analyzed or tested to determine whether a respective region contains a quality associated with poor text recognition results, such as poor contrast, blur, noise, and the like, which can be measured by one or more algorithms. Upon identifying a region with such a quality, an image quality enhancement can be automatically applied to the respective region without user instruction or intervention. Accordingly, once each region has been cleared of the quality associated with poor recognition, the regions of text can be processed with a visual recognition algorithm or engine.
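The test-then-enhance loop described in this abstract can be sketched for one quality, poor contrast. The contrast measure (intensity range), the `min_contrast` threshold, and linear contrast stretching as the enhancement are assumptions chosen for brevity; the abstract only says that such qualities "can be measured by one or more algorithms".

```python
def contrast(region):
    """Intensity range of a region, normalized to [0, 1]."""
    pixels = [v for row in region for v in row]
    return (max(pixels) - min(pixels)) / 255.0


def stretch_contrast(region):
    """Linearly rescale intensities so they span the full 0-255 range."""
    pixels = [v for row in region for v in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat region has nothing to stretch; return a copy unchanged.
        return [row[:] for row in region]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in region]


def preprocess_regions(regions, min_contrast=0.5):
    """Enhance only the regions that fail the contrast test."""
    return [
        stretch_contrast(r) if contrast(r) < min_contrast else r
        for r in regions
    ]


# Example: one low-contrast text region and one that already passes.
low = [[100, 110], [120, 130]]
good = [[0, 255], [255, 0]]
cleaned = preprocess_regions([low, good])
```

Testing each region separately means a sharp, well-lit region is passed to the recognition engine untouched while only the problem regions pay the enhancement cost.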
-
Publication No.: US09854155B1
Publication date: 2017-12-26
Application No.: US14741201
Application date: 2015-06-16
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Avnish Sikka, Yue Liu
CPC classification number: H04N5/23216, G02B7/08, G02B7/09, G03B13/20, G03B13/36, H04N5/2257, H04N5/23212, H04N5/23222, H04N5/23238
Abstract: A system and method of determining a tilt angle of a portable computing device using a sensor indicating gravitational pull on the device; determining the tilt angle of a camera of the device; identifying a tilt angle range from a plurality of predetermined tilt angle ranges; determining a first focal length setting using a first array that associates the tilt angle range with the first focal length setting; determining an adjustment increment using a second array that associates the adjustment increment with the tilt angle range; and determining a second focal length setting of the camera using the adjustment increment according to an autofocus scan range algorithm. A portable computing device including a processor; a camera; and a memory device including instructions operable to be executed by the processor to perform a set of actions, enabling the portable computing device to perform the method.
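The two-array lookup in this abstract can be sketched as follows. Every number here is hypothetical: the range boundaries, the focal settings, and the increments are placeholders, and treating the scan as "first setting plus multiples of the increment" is an assumption about what the autofocus scan range algorithm does.

```python
import bisect
import math

# Hypothetical tilt ranges in degrees: [0, 30), [30, 60), [60, 90].
RANGE_UPPER_BOUNDS = [30.0, 60.0, 90.0]
# First array: tilt range -> first focal length setting (arbitrary lens units).
FIRST_FOCAL_SETTING = [100, 250, 400]
# Second array: tilt range -> adjustment increment.
ADJUSTMENT_INCREMENT = [10, 20, 40]


def tilt_angle_degrees(gx, gy, gz):
    """Angle between the device z-axis and the measured gravity vector.

    A device lying flat reads roughly (0, 0, 9.81) and gives 0 degrees;
    a device held upright gives 90 degrees.
    """
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0:
        raise ValueError("gravity vector has zero length")
    return math.degrees(math.acos(min(1.0, abs(gz) / norm)))


def tilt_range_index(angle):
    """Index of the predetermined range containing the angle."""
    index = bisect.bisect_right(RANGE_UPPER_BOUNDS, angle)
    return min(index, len(RANGE_UPPER_BOUNDS) - 1)


def scan_settings(angle, steps=3):
    """Focal settings to try: the first setting, then increment steps."""
    index = tilt_range_index(angle)
    first = FIRST_FOCAL_SETTING[index]
    increment = ADJUSTMENT_INCREMENT[index]
    return [first + k * increment for k in range(steps)]
```

For example, `scan_settings(45.0)` falls in the middle range and yields `[250, 270, 290]` under these placeholder tables.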
-
Publication No.: US09536161B1
Publication date: 2017-01-03
Application No.: US14307090
Application date: 2014-06-17
Applicant: Amazon Technologies, Inc.
Inventor: Christopher John Lish, Oleg Rybakov, Sonjeev Jahagirdar, Junxiong Jia, Neil David Cooper, Avnish Sikka
CPC classification number: H04N5/23245, G01S3/00, G06K9/00664, G06K2009/3291, H04N5/232, H04N5/23219, H04N5/247
Abstract: Various embodiments describe systems and methods for utilizing a reduced amount of processing capacity for incoming data over time and, in response to detecting a scene-change-event, notifying one or more data processors that a scene-change-event has occurred and causing incoming data to be processed as new data. In some embodiments, an incoming frame can be compared with a reference frame to determine a difference between the reference frame and the incoming frame. The reference frame may be correlated to a latest scene-change-event. In response to a determination that the difference does not meet one or more difference criteria, a user interface or at least one processor of the computing device can be notified to reduce processing of incoming data over time. In response to a determination that the difference meets the one or more difference criteria, the user interface or the at least one processor can be notified that a scene-change-event has occurred. Incoming data to the computing device is then treated as new and processed as it would be under an active condition. The current incoming frame can be selected as a new reference frame for detecting the next scene-change-event.
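The reference-frame comparison in this abstract can be sketched with a single difference criterion. Mean absolute pixel difference, the threshold value, and the choice to treat the very first frame as a scene change are assumptions; the class name `SceneChangeDetector` is hypothetical.

```python
def mean_abs_difference(a, b):
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
    return total / (len(a) * len(a[0]))


class SceneChangeDetector:
    def __init__(self, threshold=20.0):
        self.threshold = threshold
        self.reference = None  # frame from the latest scene-change-event

    def process(self, frame):
        """Return True when the frame counts as a scene-change-event.

        On a scene change the frame becomes the new reference; otherwise
        the caller is free to reduce processing of incoming data.
        """
        changed = (
            self.reference is None
            or mean_abs_difference(frame, self.reference) >= self.threshold
        )
        if changed:
            self.reference = frame
        return changed


detector = SceneChangeDetector()
dark = [[10, 10], [10, 10]]
nearly_dark = [[12, 11], [10, 9]]
bright = [[200, 200], [200, 200]]
events = [detector.process(f) for f in (dark, nearly_dark, bright)]
```

Comparing against the frame from the latest scene change, rather than the immediately preceding frame, means slow drift still accumulates into a detectable difference.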
-
Publication No.: US09262689B1
Publication date: 2016-02-16
Application No.: US14133347
Application date: 2013-12-18
Applicant: Amazon Technologies, Inc.
Inventor: Avnish Sikka, David Paul Ramos, Matthew Daniel Hart, Yue Liu, Emilie Noelle McConville
CPC classification number: G06K9/34, G06K9/325, G06K2209/01
Abstract: Embodiments of the subject technology provide for determining a region of a first acquired image based at least on a viewing mode and a set of respective positions of graphical elements to decrease the pre-processing time and perceived latency for the first image. One or more regions of text in the first image are detected, and a set of regions of text that overlap with the region of the image is determined and pre-processed. The subject technology may then pre-process an entirety of a subsequent image (e.g., to pick up missing text from the region of the first image). Thus, additional OCR results may be provided to the user by using the subsequent image(s) and merging subsequent results with previous results from the first image.