-
11.
Publication Number: US09576196B1
Publication Date: 2017-02-21
Application Number: US14463746
Filing Date: 2014-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Pradeep Natarajan
CPC classification number: G06K9/3258 , G06K2209/01
Abstract: A system that recognizes text or symbols in a captured image using machine learning models leverages contextual information about the image to improve accuracy. Contextual information is determined for the entire image, or for spatial regions of the image, and is provided to a machine learning model when determining whether a region contains text or symbols. Associating features related to the larger context with features extracted from regions potentially containing text or symbolic content yields an incremental improvement in the results obtained using machine learning techniques.
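The context-plus-region idea can be sketched as follows; the function names and the toy linear scorer are illustrative assumptions, not the patent's actual model:

```python
def combine_features(region_features, context_features):
    """Concatenate features from a candidate region with features
    describing the surrounding image context."""
    return list(region_features) + list(context_features)

def classify_region(features, weights, bias=0.0):
    """Toy linear scorer standing in for the trained model: returns True
    if the (region + context) feature vector scores as text/symbols."""
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return score > 0.0
```

The key design point is that the classifier sees one joint feature vector, so region evidence is always weighed together with image-level context.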
-
12.
Publication Number: US09286541B1
Publication Date: 2016-03-15
Application Number: US14484764
Filing Date: 2014-09-12
Applicant: Amazon Technologies, Inc.
Inventor: Pradeep Natarajan
CPC classification number: G06K9/4642 , G06K9/00442 , G06K9/00463 , G06K9/3283 , G06K9/342 , G06K9/6878
Abstract: A system that removes underlines from text appearing in captured images in multiple stages. The system quickly rejects most text regions that do not require underline removal and performs detailed underline detection and removal on the small number of remaining regions.
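A minimal sketch of the two-stage idea, assuming binary text regions (lists of 0/1 rows) and a cheap bottom-row ink-density filter; the abstract does not specify the actual tests, so these are illustrative stand-ins:

```python
def bottom_row_density(region):
    """Fraction of ink pixels (1s) in the bottom row of a binary region;
    a dense bottom row is a cheap hint that an underline is present."""
    bottom = region[-1]
    return sum(bottom) / len(bottom)

def needs_underline_check(region, threshold=0.6):
    """Stage 1: quickly reject regions whose bottom row is mostly blank."""
    return bottom_row_density(region) >= threshold

def remove_underline(region):
    """Stage 2 (toy): blank the bottom row taken to be the underline."""
    return region[:-1] + [[0] * len(region[-1])]
```

Because stage 1 is a single row scan, the expensive stage-2 work runs only on the few regions that plausibly contain an underline.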
-
13.
Publication Number: US09224061B1
Publication Date: 2015-12-29
Application Number: US14464365
Filing Date: 2014-08-20
Applicant: Amazon Technologies, Inc.
Inventor: Pradeep Natarajan , Avnish Sikka , Rohit Prasad
CPC classification number: G06K9/3208 , G06K9/3258 , G06K2209/01
Abstract: A system that estimates text orientation in images captured with a handheld camera prior to detecting text in the image. Text orientation is estimated from edges detected within the image, and the image is rotated based on the estimated orientation. Text detection and processing are then performed on the rotated image. Non-text features along the periphery of the image may be sampled to ensure that clutter does not undermine the orientation estimate.
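One way to estimate an edge-based orientation is to accumulate gradient angles into a magnitude-weighted histogram and take the dominant bin; this is a generic sketch, not necessarily the method claimed:

```python
import math

def dominant_edge_angle(gray, bins=36):
    """Estimate the dominant edge orientation in degrees (0-180) from
    finite-difference gradients of a grayscale image (list of rows)."""
    hist = [0.0] * bins
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x]   # horizontal gradient
            gy = gray[y + 1][x] - gray[y][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            if mag > 0:
                ang = math.degrees(math.atan2(gy, gx)) % 180.0
                hist[int(ang / 180.0 * bins) % bins] += mag
    best = max(range(bins), key=lambda i: hist[i])
    return (best + 0.5) * 180.0 / bins  # bin center
```

The image could then be rotated by the difference between this angle and the expected text baseline before running text detection.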
-
14.
Publication Number: US20220405528A1
Publication Date: 2022-12-22
Application Number: US17740533
Filing Date: 2022-05-10
Applicant: Amazon Technologies, Inc.
Inventor: Vivek Yadav , Aayush Gupta , Yue Wu , Pradeep Natarajan , Ayush Jaiswal
Abstract: Techniques for masking images based on a particular task are described. A system masks the portions of an image that are not relevant to a particular task, thus reducing the amount of data used by applications for image processing tasks. For example, images to be processed by a hair color classification model are masked so that only the portions showing the person's hair are available for the model to analyze. The system configures different masker components to mask images for different tasks. A masker component can be implemented on a user device to mask images before they are sent to an application/task-specific model.
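The masking step itself can be sketched as a per-pixel gate, assuming a binary mask produced by some task-specific masker (e.g., a hypothetical hair segmenter); the masker itself is not shown:

```python
def mask_for_task(image, mask):
    """Zero out pixels where the task-specific mask is 0, so downstream
    models receive only the task-relevant portion of the image."""
    return [[px if keep else 0 for px, keep in zip(row, mrow)]
            for row, mrow in zip(image, mask)]
```

Applying the gate on-device means only task-relevant pixels ever leave the device, which is the data-reduction point the abstract makes.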
-
15.
Publication Number: US11417061B1
Publication Date: 2022-08-16
Application Number: US17159853
Filing Date: 2021-01-27
Applicant: Amazon Technologies, Inc.
Inventor: Jianwei Feng , Vivek Yadav , Pradeep Natarajan
IPC: G06T17/20
Abstract: Devices and techniques are generally described for three-dimensional (3D) mesh generation. In various examples, first two-dimensional (2D) image data representing a human may be received. In various further examples, bounding box data identifying a location of the human in the first 2D image data and joint data identifying locations of the human's joints may be received. Second 2D image data representing a cropped portion of the human may be generated using the bounding box data and the joint data. A 3D mesh prediction model may be used to determine a pose, a shape, and a projection matrix for the human, and to determine a transformed projection matrix for the portion of the human represented in the second 2D image data.
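The cropping step might combine the bounding box with the joint locations roughly as below; this is an illustrative guess, since the abstract does not give the exact rule:

```python
def crop_box_from_joints(bbox, joints):
    """Expand an (x0, y0, x1, y1) bounding box so that every (x, y) joint
    falls inside it, yielding the crop fed to the mesh prediction model."""
    xs = [x for x, _ in joints] + [bbox[0], bbox[2]]
    ys = [y for _, y in joints] + [bbox[1], bbox[3]]
    return (min(xs), min(ys), max(xs), max(ys))
```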
-
16.
Publication Number: US11334773B2
Publication Date: 2022-05-17
Application Number: US16913837
Filing Date: 2020-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Vivek Yadav , Aayush Gupta , Yue Wu , Pradeep Natarajan , Ayush Jaiswal
Abstract: Techniques for masking images based on a particular task are described. A system masks the portions of an image that are not relevant to a particular task, thus reducing the amount of data used by applications for image processing tasks. For example, images to be processed by a hair color classification model are masked so that only the portions showing the person's hair are available for the model to analyze. The system configures different masker components to mask images for different tasks. A masker component can be implemented on a user device to mask images before they are sent to an application/task-specific model.
-
17.
Publication Number: US09576210B1
Publication Date: 2017-02-21
Application Number: US14500005
Filing Date: 2014-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Yue Liu , Qingfeng Yu , Xing Liu , Pradeep Natarajan
CPC classification number: G06T5/10 , G06K9/22 , G06K9/4604 , G06K2209/01 , G06T5/003 , G06T7/0002 , G06T11/60 , G06T2207/10016 , G06T2207/20192 , G06T2207/30168
Abstract: A system that selects video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A device captures a video frame including text characters. An edge detection filter is applied to the frame to determine gradient features in perpendicular directions. An "edge map" is created from the gradient features, and points along edges in the edge map are identified. An edge transition width is determined at each edge point based on the local intensity minimum and maximum on opposite sides of the respective edge point in the frame. Sharp edges have smaller edge transition widths than blurred edges. Statistics are determined from the edge transition widths, and the statistics are processed by a trained classifier to determine whether the frame is sufficiently sharp for text processing.
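The edge-transition-width measurement can be sketched for a single rising edge along one scanline; a full implementation would handle both edge polarities and work along the gradient direction in 2D:

```python
def edge_transition_width(row, edge_x):
    """Distance between the local intensity minimum and maximum flanking
    a rising edge point on one scanline; sharp edges give small widths."""
    left = edge_x
    while left > 0 and row[left - 1] < row[left]:
        left -= 1          # walk down to the local minimum
    right = edge_x
    while right < len(row) - 1 and row[right + 1] > row[right]:
        right += 1         # walk up to the local maximum
    return right - left
```

A sharp step yields a width of a couple of pixels, while a blurred version of the same edge spreads over many pixels, which is exactly the statistic the classifier consumes.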
-
18.
Publication Number: US09418316B1
Publication Date: 2016-08-16
Application Number: US14500208
Filing Date: 2014-09-29
Applicant: Amazon Technologies, Inc.
Inventor: Yue Liu , Qingfeng Yu , Xing Liu , Pradeep Natarajan
CPC classification number: G06K9/3258 , G06K9/6231 , G06K2209/01
Abstract: A process for training and optimizing a system that selects video frames for optical character recognition (OCR) based on feature metrics associated with blur and sharpness. A set of image frames is subjectively labeled based on a comparison of each frame before and after binarization, to determine to what degree text is recognizable in the binary image. A plurality of different sharpness feature metrics are generated from the original frame. A classifier is then trained using the feature metrics and the subjective labels. The feature metrics are then tested for accuracy and/or correlation with the subjective labeling data, and the set of feature metrics may be refined based on which metrics produce the best results.
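A minimal stand-in for the training step, assuming a single sharpness metric and a threshold rule (the actual system uses multiple metrics and a trained classifier):

```python
def train_threshold(metrics, labels):
    """Choose the metric threshold that best separates frames labeled
    sharp (True) from blurry (False); returns (threshold, accuracy)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(metrics)):
        acc = sum((m >= t) == lab
                  for m, lab in zip(metrics, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

The accuracy returned for each candidate metric is one way to rank metrics against the subjective labels, mirroring the refinement step in the abstract.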
-