Abstract:
In a speech-based system, a wake word or other trigger expression is used to preface user speech that is intended as a command. The system receives multiple directional audio signals, each of which emphasizes sound from a different direction. The signals are monitored and analyzed to detect the directions of interfering audio sources such as televisions or other electronic audio players. The directional signal having the strongest presence of speech is selected to be monitored for the trigger expression. If that signal corresponds to the direction of an interfering audio source, a stricter standard is used to detect the trigger expression. In addition, the directional audio signal having the second strongest presence of speech may also be monitored for the trigger expression.
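A minimal sketch of this selection logic, assuming each beam already carries a speech-presence score from a voice-activity detector; the class, thresholds, and angular-tolerance test are illustrative assumptions rather than the patented implementation, and angle wraparound is ignored for brevity:

```python
from dataclasses import dataclass

@dataclass
class Beam:
    direction_deg: float   # look direction of this beamformed signal
    speech_score: float    # speech presence, e.g. from a voice-activity detector

BASE_THRESHOLD = 0.50      # normal wake-word confidence threshold
STRICT_THRESHOLD = 0.80    # stricter threshold for beams aimed at an interferer
INTERFERER_DIRS = [90.0]   # detected directions of TVs or other audio players

def trigger_threshold(beam: Beam, tolerance_deg: float = 15.0) -> float:
    """Apply a stricter wake-word standard if the beam points at an interferer."""
    near_interferer = any(
        abs(beam.direction_deg - d) <= tolerance_deg for d in INTERFERER_DIRS
    )
    return STRICT_THRESHOLD if near_interferer else BASE_THRESHOLD

def beams_to_monitor(beams: list[Beam]) -> list[Beam]:
    """Monitor the beams with the strongest and second-strongest speech presence."""
    return sorted(beams, key=lambda b: b.speech_score, reverse=True)[:2]

beams = [Beam(0.0, 0.4), Beam(90.0, 0.9), Beam(180.0, 0.7)]
for beam in beams_to_monitor(beams):
    print(beam.direction_deg, trigger_threshold(beam))
```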
Abstract:
A system to select video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A device captures a video frame including text characters. An edge detection filter is applied to the frame to determine gradient features in perpendicular directions. An “edge map” is created from the gradient features, and points along edges in the edge map are identified. An edge transition width is determined at each edge point based on the local intensity minimum and maximum on opposite sides of that point in the frame. Sharper edges have smaller edge transition widths than edges in blurry images. Statistics are determined from the edge transition widths, and the statistics are processed by a trained classifier to determine whether the frame is sufficiently sharp for text processing.
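As a rough sketch of the edge-transition-width computation, assuming a grayscale frame as a 2-D NumPy array; the gradient threshold is arbitrary, and the search for intensity extrema runs only along image rows for simplicity:

```python
import numpy as np

def edge_transition_widths(gray: np.ndarray, grad_thresh: float = 30.0) -> np.ndarray:
    """Edge transition width at each edge point, scanned along image rows."""
    img = gray.astype(float)
    gy, gx = np.gradient(img)                  # gradients in perpendicular directions
    edge_map = np.hypot(gx, gy) > grad_thresh  # simple edge map from gradient magnitude
    widths = []
    for r, c in zip(*np.nonzero(edge_map)):
        row = img[r]
        sign = 1.0 if gx[r, c] >= 0 else -1.0
        left = c
        while left > 0 and sign * (row[left] - row[left - 1]) > 0:
            left -= 1                          # walk to the local intensity minimum
        right = c
        while right < len(row) - 1 and sign * (row[right + 1] - row[right]) > 0:
            right += 1                         # walk to the local intensity maximum
        widths.append(right - left)
    return np.asarray(widths)

# Sharp step edge: transition widths stay small; blur would widen them.
frame = np.zeros((32, 32))
frame[:, 16:] = 255.0
w = edge_transition_widths(frame)
print(w.mean(), np.percentile(w, 90))          # statistics fed to the classifier
```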
Abstract:
A process for training and optimizing a system to select video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A set of image frames is subjectively labelled based on a comparison of each frame before and after binarization, to determine to what degree text is recognizable in the binary image. A plurality of different sharpness feature metrics are generated from the original frame. A classifier is then trained using the feature metrics and the subjective labels. The feature metrics are then tested for accuracy and/or correlation with the subjective labelling data. The set of feature metrics may be refined based on which metrics produce the best results.
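A hedged sketch of this training-and-refinement loop, with synthetic feature metrics and labels standing in for real data and a scikit-learn logistic regression standing in for whatever classifier the system actually uses:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_frames = 200
metric_names = ["mean_etw", "p90_etw", "edge_density"]   # hypothetical metrics
X = rng.normal(size=(n_frames, len(metric_names)))       # sharpness feature metrics
y = X[:, 0] + 0.5 * rng.normal(size=n_frames) > 0        # subjective labels (synthetic)

clf = LogisticRegression().fit(X, y)
print("cv accuracy:", cross_val_score(LogisticRegression(), X, y, cv=5).mean())

# Correlation of each metric with the subjective labels, used to prune
# weak metrics and refine the feature set.
for i, name in enumerate(metric_names):
    print(name, np.corrcoef(X[:, i], y.astype(float))[0, 1])
```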
Abstract:
A multi-orientation text detection method and associated system are disclosed that utilize orientation-variant glyph features to determine a text line in an image regardless of the orientation of the text line. Glyph features are determined for each glyph in an image with respect to a neighboring glyph. The glyph features are provided to a learned classifier that outputs a glyph pair score for each neighboring glyph pair. Each glyph pair score indicates the likelihood that the corresponding pair of neighboring glyphs form part of the same text line. The glyph pair scores are used to identify candidate text lines, which are then ranked to select a final set of text lines in the image.
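The following sketch illustrates the pairwise scoring idea with hand-tuned weights in place of the learned classifier; the features and coefficients are illustrative assumptions:

```python
import math
from dataclasses import dataclass

@dataclass
class Glyph:
    x: float   # center x
    y: float   # center y
    h: float   # glyph height

def pair_features(a: Glyph, b: Glyph) -> tuple[float, float, float]:
    """Orientation-variant features for a pair of neighboring glyphs."""
    return (
        math.hypot(b.x - a.x, b.y - a.y) / a.h,  # distance, normalized by height
        abs(b.h - a.h) / a.h,                    # relative height difference
        math.atan2(b.y - a.y, b.x - a.x),        # orientation of the pair
    )

def pair_score(a: Glyph, b: Glyph) -> float:
    """Stand-in for the learned classifier's glyph-pair score."""
    dist, dh, _ = pair_features(a, b)
    return max(0.0, 1.0 - 0.2 * dist - 1.0 * dh)

glyphs = [Glyph(0, 0, 10), Glyph(12, 1, 10), Glyph(25, 2, 11), Glyph(5, 80, 10)]
line = [glyphs[0]]
for g in glyphs[1:]:
    if pair_score(line[-1], g) > 0.5:   # chain high-scoring pairs into a line
        line.append(g)
print(len(line), "glyphs in the candidate text line")
```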
Abstract:
Embodiments of the subject technology provide for determining a region of a first acquired image based at least on a viewing mode and a set of respective positions of graphical elements to decrease the pre-processing time and perceived latency for the first image. One or more regions of text in the first image are detected, and a set of regions of text that overlap with the region of the image is determined and pre-processed. The subject technology may then pre-process an entirety of a subsequent image (e.g., to pick up missing text from the region of the first image). Thus, additional OCR results may be provided to the user by using the subsequent image(s) and merging subsequent results with previous results from the first image.
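A minimal sketch of the region-overlap test, assuming axis-aligned bounding boxes; the ROI values, region contents, and set-based merge are hypothetical simplifications:

```python
from typing import NamedTuple

class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)

roi = Box(100, 100, 400, 300)   # region from viewing mode + UI element positions
text_regions = [Box(120, 110, 200, 140), Box(500, 50, 600, 90)]

first_pass = [r for r in text_regions if r.overlaps(roi)]
print("pre-process now:", first_pass)   # only overlapping regions, for low latency

def merge_results(previous: set[str], subsequent: set[str]) -> set[str]:
    """Merge OCR output from a full pass over a later frame with earlier results."""
    return previous | subsequent

print(merge_results({"HELLO"}, {"HELLO", "WORLD"}))
```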
Abstract:
A system that identifies and recognizes text while reducing the computational complexity of processing complex images. Widths of scan line segments within candidate text regions are determined, with the shortest segments selected as representative of stroke width. Statistical features of the stroke widths are used as part of the process to classify each region as containing or not containing a text character or glyph.
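The stroke-width idea might be sketched as follows for a binary candidate region (True = ink), using run lengths along horizontal scan lines; the specific statistics and acceptance test are assumptions:

```python
import numpy as np

def scan_segment_widths(region: np.ndarray) -> list[int]:
    """Lengths of contiguous ink runs along each row of a binary region."""
    widths = []
    for row in region:
        padded = np.concatenate(([0], row.astype(int), [0]))
        edges = np.flatnonzero(np.diff(padded))   # run starts and ends
        starts, ends = edges[::2], edges[1::2]
        widths.extend((ends - starts).tolist())
    return widths

def looks_like_text(region: np.ndarray) -> bool:
    widths = np.asarray(scan_segment_widths(region))
    if widths.size == 0:
        return False
    stroke = np.percentile(widths, 25)            # shortest runs ~ stroke width
    # text strokes tend to be thin relative to the region and consistent
    return stroke <= region.shape[1] * 0.2 and widths.std() <= stroke

glyph = np.zeros((8, 16), dtype=bool)
glyph[:, 7:9] = True                              # a vertical 2-px-wide stroke
print(looks_like_text(glyph))
```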
Abstract:
A system may use multiple speech interface devices to interact with a user by speech. All or a portion of the speech interface devices may detect a user utterance and may initiate speech processing to determine a meaning or intent of the utterance. Within the speech processing, arbitration is employed to select one of the multiple speech interface devices to respond to the user utterance. Arbitration may be based in part on metadata that directly or indirectly indicates the proximity of the user to the devices, and the device that is deemed to be nearest the user may be selected to respond to the user utterance.
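A hedged sketch of the arbitration step, assuming each device reports proximity-related metadata such as wake-word SNR and input amplitude; the scoring weights are illustrative, not the actual arbitration policy:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    device_id: str
    snr_db: float      # signal-to-noise ratio of the detected utterance
    amplitude: float   # normalized input level at the device

def proximity_score(c: Candidate) -> float:
    """Combine metadata that indirectly indicates how close the user is."""
    return 0.7 * c.snr_db + 0.3 * (c.amplitude * 100.0)

def arbitrate(candidates: list[Candidate]) -> Candidate:
    """Select the device deemed nearest the user to respond."""
    return max(candidates, key=proximity_score)

winner = arbitrate([
    Candidate("kitchen", snr_db=18.0, amplitude=0.6),
    Candidate("living-room", snr_db=25.0, amplitude=0.8),
])
print(winner.device_id)   # -> living-room
```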
Abstract:
An audio device may be configured to work in conjunction with a handheld remote controller to receive voice commands from a user. The audio device may have multiple local microphones that are used for sound source localization, to determine the position of the user. A remote audio signal may be received from the remote controller and used in conjunction with local microphone signals generated by the local microphones to aid in determining the position of the user. The last known position of the user may be recorded whenever the user speaks into the remote controller. When the user is unable to find the remote controller, the audio device may direct the user toward that last known position.
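A minimal sketch of the last-known-position bookkeeping, with the sound-source-localization result reduced to a single bearing; the class and method names are hypothetical:

```python
from typing import Optional

class RemoteFinder:
    def __init__(self) -> None:
        self.last_position: Optional[float] = None   # bearing in degrees

    def on_remote_speech(self, localized_bearing_deg: float) -> None:
        """Called whenever the remote's audio signal carries user speech;
        the bearing comes from sound source localization on the local mics."""
        self.last_position = localized_bearing_deg

    def find_remote(self) -> str:
        """Direct the user toward the position recorded at the last use."""
        if self.last_position is None:
            return "No recorded position for the remote."
        return f"The remote was last used at about {self.last_position:.0f} degrees."

finder = RemoteFinder()
finder.on_remote_speech(135.0)
print(finder.find_remote())
```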
Abstract:
Disclosed are techniques for recognizing text from one or more frames of image data using contextual information. In some implementations, image data including a captured textual item is processed to identify an entity in the image data. A context can be selected using the entity, where the context corresponds to a dictionary. Text in the captured textual item can be identified using the dictionary. The identified text can be output to a display device.
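The context-selection idea could be sketched as follows, with a toy entity-to-dictionary map and difflib standing in for a real OCR correction model; the entity names and word lists are invented for illustration:

```python
import difflib

# Hypothetical contexts: a detected entity selects the matching dictionary.
DICTIONARIES = {
    "restaurant_menu": ["espresso", "latte", "croissant"],
    "street_sign": ["stop", "yield", "merge"],
}

def recognize_text(ocr_candidate: str, entity: str) -> str:
    """Snap a raw OCR string to the nearest word in the entity's dictionary."""
    words = DICTIONARIES.get(entity, [])
    match = difflib.get_close_matches(ocr_candidate.lower(), words, n=1)
    return match[0] if match else ocr_candidate

print(recognize_text("expresso", "restaurant_menu"))   # -> espresso
```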