-
Publication Number: US11172122B2
Publication Date: 2021-11-09
Application Number: US16241438
Application Date: 2019-01-07
Applicant: Amazon Technologies, Inc.
Inventor: William Evan Welbourne , Ross David Roessler , Cheng-Hao Kuo , Jim Oommen Thomas , Paul Aksenti Savastinuk , Yinfei Yang
Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
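A minimal Python sketch of the cross-model assistance idea described in this abstract: face-recognition scores for a video frame act as a prior that re-weights speaker-recognition scores for the corresponding audio segment. The `Scores` container, the linear weighting, and all values are illustrative assumptions rather than the patented implementation.

```python
from dataclasses import dataclass

@dataclass
class Scores:
    """Per-user confidence scores produced by one recognition model (hypothetical values)."""
    by_user: dict  # user id -> confidence in [0, 1]

def fuse_speaker_with_face(face: Scores, speaker: Scores, face_weight: float = 0.5) -> dict:
    """Re-weight speaker-recognition scores using face-recognition results for the
    same time window, then renormalize. A user seen by only one model keeps a
    down-weighted score rather than being discarded."""
    users = set(face.by_user) | set(speaker.by_user)
    fused = {}
    for user in users:
        f = face.by_user.get(user, 0.0)
        s = speaker.by_user.get(user, 0.0)
        fused[user] = face_weight * f + (1.0 - face_weight) * s
    total = sum(fused.values()) or 1.0
    return {user: score / total for user, score in fused.items()}

if __name__ == "__main__":
    face = Scores({"alice": 0.8, "bob": 0.1})
    speaker = Scores({"bob": 0.6, "carol": 0.3})   # carol is unknown to the face model
    print(fuse_speaker_with_face(face, speaker))
```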
-
Publication Number: US09424461B1
Publication Date: 2016-08-23
Application Number: US13929672
Application Date: 2013-06-27
Applicant: Amazon Technologies, Inc.
Inventor: Chang Yuan , Geoffrey Scott Heller , Oleg Rybakov , Sharadh Ramaswamy , Jim Oommen Thomas
IPC: G06K9/00
CPC classification number: G06K9/00201 , G06K9/00208 , G06K9/00214
Abstract: Various embodiments utilize two-dimensional (“2D”) and three-dimensional (“3D”) object features for purposes such as object recognition and/or image matching. For example, a user can capture an image (e.g., a still image or video) of an object and can receive information about items that are determined to match the object. The image can be analyzed to detect visual features (e.g., corners, edges, etc.) of the object, and the detected visual features can be combined to generate a combined visual feature vector which can be used for object recognition, image matching, or other such purposes. Other approaches utilize the image to generate a 3D model of the object represented in the image, which can be used to determine at least one object, or type of object, that matches the object represented in the image.
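A rough Python sketch of the combined-feature idea: 2D descriptors and 3D shape descriptors are normalized and concatenated into a single vector, then matched against a catalog by nearest neighbor. The function names, vector sizes, and the random catalog are hypothetical; the patent does not prescribe this exact formulation.

```python
import numpy as np

def combined_feature_vector(features_2d: np.ndarray, features_3d: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized 2D descriptors (e.g., from corners/edges)
    and 3D shape descriptors into a single vector for matching."""
    def l2norm(v: np.ndarray) -> np.ndarray:
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2norm(features_2d), l2norm(features_3d)])

def best_match(query: np.ndarray, catalog: dict) -> str:
    """Nearest-neighbor match of the query vector against catalog item vectors."""
    return min(catalog, key=lambda item: float(np.linalg.norm(catalog[item] - query)))

if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    catalog = {name: combined_feature_vector(rng.random(64), rng.random(32))
               for name in ("mug", "lamp", "chair")}
    query = catalog["lamp"] + 0.01 * rng.random(96)   # slightly perturbed copy
    print(best_match(query, catalog))                 # expected: "lamp"
```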
-
Publication Number: US10084959B1
Publication Date: 2018-09-25
Application Number: US14750975
Application Date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Tsz Ho Yu , Jim Oommen Thomas , Cheng-Hao Kuo , Yinfei Yang , Ross David Roessler , Paul Aksenti Savastinuk , William Evan Welbourne
CPC classification number: H04N5/23238 , H04N5/2258 , H04N5/265 , H04N9/09 , H04N9/643 , H04N9/76
Abstract: A video capture device may include multiple cameras that simultaneously capture video data. The video capture device and/or one or more remote computing resources may stitch the video data captured by the multiple cameras to generate stitched video data that corresponds to 360° video. The remote computing resources may apply one or more algorithms to the stitched video data to adjust the color characteristics of the stitched video data, such as lighting, exposure, white balance, contrast, and saturation. The remote computing resources may further smooth the transition between the video data captured by the multiple cameras to reduce artifacts, such as abrupt changes in color that result from the individual cameras of the video capture device having different video capture settings. The video capture device and/or the remote computing resources may generate a panoramic video that may include up to a 360° field of view.
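A simplified Python sketch of two of the steps described above, exposure matching across cameras and smoothing the seam between adjacent views. The global-mean gain and the linear cross-fade are assumptions chosen for brevity, not the claimed algorithms.

```python
import numpy as np

def match_exposure(frames: list) -> list:
    """Scale each camera's frame so its mean luminance matches the global mean,
    reducing abrupt brightness jumps at stitch seams."""
    means = [float(f.mean()) for f in frames]
    target = float(np.mean(means))
    return [np.clip(f * (target / m if m > 0 else 1.0), 0, 255) for f, m in zip(frames, means)]

def stitch_with_linear_blend(left: np.ndarray, right: np.ndarray, overlap: int) -> np.ndarray:
    """Horizontally stitch two frames, linearly cross-fading over the overlapping
    columns so the transition between cameras is smooth."""
    alpha = np.linspace(0.0, 1.0, overlap)[None, :, None]
    blended = (1 - alpha) * left[:, -overlap:] + alpha * right[:, :overlap]
    return np.concatenate([left[:, :-overlap], blended, right[:, overlap:]], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cam_a = rng.integers(80, 120, (4, 8, 3)).astype(float)    # darker camera
    cam_b = rng.integers(140, 200, (4, 8, 3)).astype(float)   # brighter camera
    a, b = match_exposure([cam_a, cam_b])
    pano = stitch_with_linear_blend(a, b, overlap=3)
    print(pano.shape)   # (4, 13, 3)
```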
-
Publication Number: US09652031B1
Publication Date: 2017-05-16
Application Number: US14306937
Application Date: 2014-06-17
Applicant: Amazon Technologies, Inc.
Inventor: Paul Aksenti Savastinuk , Jim Oommen Thomas , Geoffrey Scott Heller , Michael Lee Sandige , Kah Kuen Fu
IPC: G06F3/01
Abstract: A device configured with a user interface (UI) that changes based on a position of a user determines the position of the user through multiple data sources, including camera-based head tracking and output from motion sensors such as a gyroscope. Each data source may output its own estimated head position. The device may apply a reliability weight to the head position determined by each data source. A composite head position is then determined from the weighted positions. The composite position is then used to render the UI.
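A small Python sketch of the composite-position step: each data source (camera head tracking, motion sensors) contributes an estimate and a reliability weight, and the device renders the UI from the weighted average. Field names and weight values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class HeadEstimate:
    """One data source's estimate of head position (x, y, z) plus a reliability weight.
    Fields and weights are illustrative, not taken from the patent."""
    position: tuple
    weight: float   # e.g., lower for camera tracking in poor lighting

def composite_head_position(estimates: list) -> tuple:
    """Reliability-weighted average of per-source head position estimates;
    the result would drive UI rendering."""
    total = sum(e.weight for e in estimates)
    if total == 0:
        raise ValueError("no reliable estimates")
    return tuple(
        sum(e.weight * e.position[axis] for e in estimates) / total
        for axis in range(3)
    )

if __name__ == "__main__":
    camera = HeadEstimate(position=(0.10, 0.02, 0.45), weight=0.7)   # head tracking
    gyro   = HeadEstimate(position=(0.12, 0.00, 0.50), weight=0.3)   # motion sensors
    print(composite_head_position([camera, gyro]))
```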
-
Publication Number: US20160381306A1
Publication Date: 2016-12-29
Application Number: US14753826
Application Date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Yinfei Yang , William Evan Welbourne , Ross David Roessler , Paul Aksenti Savastinuk , Cheng-Hao Kuo , Jim Oommen Thomas , Tsz Ho Yu
IPC: H04N5/262 , G11B27/06 , H04N5/232 , G11B27/031
CPC classification number: H04N5/2628 , G06K9/00711 , G06K9/00751 , G06K9/3233 , G06T3/40 , G11B27/031 , G11B27/06 , H04N5/23238
Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Content may be detected and analyzed in the video data using computer vision or machine learning algorithms, or may be specified through a user interface. Panning and zooming controls may be associated with the content, panning or zooming based on a location and size of the content within the video data. The device may determine a number of pixels associated with the content and may frame the content to be a certain percentage of the edited video data, such as a close-up shot where a subject is displayed as 50% of the viewing frame. The device may identify an event of interest, may determine multiple frames associated with the event of interest, and may pan and zoom between the multiple frames based on a size/location of the content within the multiple frames.
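An illustrative Python sketch of content-based framing as described above: a crop window is sized so the detected content fills a target fraction of the frame (for example, 50% for a close-up), and pan/zoom is interpolated between two keyframe crops. The geometry and helper names are assumptions, not the patented method.

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned region in pixels: (x, y) top-left corner, plus width and height."""
    x: float
    y: float
    w: float
    h: float

def framing_crop(content: Box, frame_w: int, frame_h: int, target_fraction: float = 0.5) -> Box:
    """Compute a crop window, centered on the content, sized so the content covers
    roughly `target_fraction` of the crop area (e.g., a 50% close-up). The crop
    keeps the output aspect ratio and is clamped to the source frame."""
    aspect = frame_w / frame_h
    crop_area = (content.w * content.h) / target_fraction
    crop_h = min(frame_h, (crop_area / aspect) ** 0.5)
    crop_w = min(frame_w, crop_h * aspect)
    cx, cy = content.x + content.w / 2, content.y + content.h / 2
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return Box(x, y, crop_w, crop_h)

def interpolate(a: Box, b: Box, t: float) -> Box:
    """Linear pan/zoom between two keyframe crops (t in [0, 1])."""
    lerp = lambda p, q: p + t * (q - p)
    return Box(lerp(a.x, b.x), lerp(a.y, b.y), lerp(a.w, b.w), lerp(a.h, b.h))

if __name__ == "__main__":
    subject = Box(x=900, y=300, w=200, h=400)                 # detected content
    start = framing_crop(subject, frame_w=1920, frame_h=1080)
    end = framing_crop(Box(1400, 200, 200, 400), 1920, 1080)  # subject moved
    print(interpolate(start, end, 0.5))
```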
-
Publication Number: US11412133B1
Publication Date: 2022-08-09
Application Number: US16913498
Application Date: 2020-06-26
Applicant: Amazon Technologies, Inc.
Inventor: Tarun Yohann Morton , Cheng-Hao Kuo , Jim Oommen Thomas , Ning Zhou
Abstract: A device capable of autonomous motion may process image data determined by one or more cameras to determine one or more properties of objects represented in the image data. The device may determine that two or more computer vision components correspond to a particular property. A first computer vision component may process the image data to determine first output data, and a second computer vision component may process the first output data to determine second output data corresponding to the property.
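A minimal Python sketch of chaining computer vision components so a second component consumes the first component's output to produce the property of interest. The stand-in components (region detection feeding a distance estimate) are hypothetical placeholders, not the components claimed in the patent.

```python
from typing import Any, Callable

def chain(components: list) -> Callable:
    """Compose computer vision components so each one consumes the previous
    component's output; the final result corresponds to the target property."""
    def run(image_data: Any) -> Any:
        data = image_data
        for component in components:
            data = component(data)
        return data
    return run

# Stand-in components (hypothetical): a detector that produces regions from an
# "image", and a second stage that derives a property from those regions.
def detect_regions(image: list) -> list:
    return [{"box": (0, 0, 4, 4), "pixels": image}]

def estimate_distance(regions: list) -> list:
    return [{**r, "distance_m": 1.5} for r in regions]   # fixed value for illustration

if __name__ == "__main__":
    pipeline = chain([detect_regions, estimate_distance])
    print(pipeline([[0] * 4] * 4)[0]["distance_m"])
```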
-
Publication Number: US20190313014A1
Publication Date: 2019-10-10
Application Number: US16241438
Application Date: 2019-01-07
Applicant: Amazon Technologies, Inc.
Inventor: William Evan Welbourne , Ross David Roessler , Cheng-Hao Kuo , Jim Oommen Thomas , Paul Aksenti Savastinuk , Yinfei Yang
Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
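This published application shares its abstract with US11172122B2 above; the sketch below illustrates a different facet, updating a speaker-recognition model with labels carried over from high-confidence face identifications during run-time or offline post-processing. The data structures, confidence threshold, and field names are assumptions.

```python
from collections import defaultdict

def update_speaker_model(model: dict, audio_segments: list, face_labels: dict,
                         threshold: float = 0.9) -> dict:
    """Model-update sketch: audio segments whose time window carries a
    high-confidence face identification are added as labeled examples to the
    speaker model, including users the speaker model has never seen."""
    updated = defaultdict(list, {user: list(examples) for user, examples in model.items()})
    for segment in audio_segments:
        label = face_labels.get(segment["window"])
        if label and label["confidence"] >= threshold:
            updated[label["user"]].append(segment["embedding"])
    return dict(updated)

if __name__ == "__main__":
    speaker_model = {"alice": [[0.1, 0.2]]}
    segments = [{"window": 3, "embedding": [0.3, 0.1]},
                {"window": 4, "embedding": [0.5, 0.9]}]
    faces = {3: {"user": "bob", "confidence": 0.95},    # bob was not in the speaker model
             4: {"user": "alice", "confidence": 0.40}}  # too uncertain to use
    print(update_speaker_model(speaker_model, segments, faces))
```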
-
Publication Number: US09729865B1
Publication Date: 2017-08-08
Application Number: US14307492
Application Date: 2014-06-18
Applicant: Amazon Technologies, Inc.
Inventor: Cheng-Hao Kuo , Jim Oommen Thomas , Tianyang Ma , Stephen Vincent Mangiat , Sisil Sanjeev Mehta , Ambrish Tyagi , Amit Kumar Agrawal , Kah Kuen Fu , Sharadh Ramaswamy
CPC classification number: H04N13/366 , G06F1/1686 , G06F1/3206 , G06F1/3231 , G06K9/00201 , G06K9/00248 , G06K9/00268 , G06K9/00295 , G06K9/00771 , G06T2207/10021 , H04N5/76 , H04N13/204 , H04N13/239 , H04N13/271 , H04N2013/0081 , Y02D10/173
Abstract: Various embodiments enable a primary user to be identified and tracked using stereo association and multiple tracking algorithms. For example, a face detection algorithm can be run independently on each image captured by a respective camera. Stereo association can be performed to match faces between cameras. If the faces are matched and a primary user is determined, a face pair is created and used as the first data point in memory for initializing object tracking. Further, features of a user's face can be extracted, and the change in position of these features between images can determine what tracking method will be used for that particular frame.
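A compact Python sketch of the stereo association step: face detections from the two cameras are paired by a simple geometric cost, and each matched pair can seed object tracking. The cost function and threshold are illustrative assumptions, not the claimed procedure.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Face:
    """Face detection from one camera: box center (cx, cy) and size, in pixels."""
    cx: float
    cy: float
    size: float

def stereo_associate(left: list, right: list, max_cost: float = 50.0) -> list:
    """Greedily pair face detections across the two cameras. For a rectified stereo
    pair, matching faces lie on roughly the same image row and have similar size,
    so the pairing cost here is vertical offset plus size difference."""
    candidates = sorted(
        (abs(l.cy - r.cy) + abs(l.size - r.size), i, j)
        for (i, l), (j, r) in product(enumerate(left), enumerate(right))
    )
    pairs, used_left, used_right = [], set(), set()
    for cost, i, j in candidates:
        if cost <= max_cost and i not in used_left and j not in used_right:
            pairs.append((i, j))          # this face pair can seed object tracking
            used_left.add(i)
            used_right.add(j)
    return pairs

if __name__ == "__main__":
    left_faces = [Face(320, 200, 80), Face(600, 420, 40)]
    right_faces = [Face(590, 425, 42), Face(300, 205, 78)]
    print(stereo_associate(left_faces, right_faces))   # [(0, 1), (1, 0)]
```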
-
Publication Number: US09298974B1
Publication Date: 2016-03-29
Application Number: US14307493
Application Date: 2014-06-18
Applicant: Amazon Technologies, Inc.
Inventor: Cheng-Hao Kuo , Jim Oommen Thomas , Tianyang Ma , Stephen Vincent Mangiat , Sisil Sanjeev Mehta , Ambrish Tyagi , Amit Kumar Agrawal , Kah Kuen Fu , Sharadh Ramaswamy
CPC classification number: H04N13/239 , G06K9/00261 , G06K9/00288 , G06K9/03 , H04N13/271 , H04N2013/0081 , H04N2013/0092
Abstract: Various embodiments enable a primary user to be identified and tracked using stereo association and multiple tracking algorithms. For example, a face detection algorithm can be run independently on each image captured by a respective camera. Stereo association can be performed to match faces between cameras. If the faces are matched and a primary user is determined, a face pair is created and used as the first data point in memory for initializing object tracking. Further, features of a user's face can be extracted, and the change in position of these features between images can determine what tracking method will be used for that particular frame.
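This patent shares its abstract with US09729865B1 above; the sketch below illustrates the other element mentioned, selecting a per-frame tracking method from how far the facial feature points moved between consecutive images. The method names and the motion threshold are assumptions.

```python
import math

def pick_tracker(prev_features: list, curr_features: list, slow_threshold: float = 5.0) -> str:
    """Select a tracking method for this frame from the displacement of facial
    feature points between consecutive images: small motion can use a cheap
    local tracker, large motion falls back to re-detection."""
    displacements = [math.dist(p, c) for p, c in zip(prev_features, curr_features)]
    mean_motion = sum(displacements) / len(displacements)
    return "optical_flow" if mean_motion < slow_threshold else "full_face_redetection"

if __name__ == "__main__":
    prev = [(100.0, 120.0), (140.0, 118.0), (120.0, 160.0)]   # eye, eye, mouth corners
    slow = [(102.0, 121.0), (141.0, 119.0), (121.0, 161.0)]
    fast = [(160.0, 150.0), (200.0, 151.0), (180.0, 195.0)]
    print(pick_tracker(prev, slow))   # optical_flow
    print(pick_tracker(prev, fast))   # full_face_redetection
```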
-
Publication Number: US11422568B1
Publication Date: 2022-08-23
Application Number: US16680227
Application Date: 2019-11-11
Applicant: Amazon Technologies, Inc.
Inventor: Jim Oommen Thomas , Jingjing Zheng , Rakesh Ramesh , Yuyin Sun , Lu Xia , Peng Lei , Jiajia Luo
Abstract: An autonomous mobile device (AMD) may perform various tasks during operation. Some tasks, such as delivering a message to a particular user, may involve the AMD identifying the particular user. The AMD includes a camera to acquire an image, and image-based authentication techniques are used to determine a user's identity. A user may move within a physical space, and the space may contain various obstructions which may occlude images. The AMD may move within the space to obtain a vantage point from which an image of the user's face that is suitable for image-based authentication can be obtained. In some situations, the AMD may present an attention signal, such as playing a sound from a speaker or flashing a light, to encourage the user to look at the AMD, providing an image for use in image-based authentication.
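A small Python sketch of the decision logic implied by this abstract: the AMD moves to a new vantage point when the face is occluded, plays an attention signal when the user is not facing the camera, and otherwise captures an image for authentication. The observation fields, threshold, and action names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FaceObservation:
    """What the AMD's camera currently sees of the user (illustrative fields)."""
    face_visible: bool
    face_pixels: int        # apparent face size in the image
    occluded: bool          # blocked by furniture or another obstruction

def next_action(obs: FaceObservation, min_face_pixels: int = 4000) -> str:
    """Decide how the device should obtain an image suitable for image-based
    authentication; thresholds and action names are assumptions."""
    if obs.occluded:
        return "move_to_new_vantage_point"
    if not obs.face_visible:
        return "play_attention_signal"    # e.g., a sound or light to draw the user's gaze
    if obs.face_pixels < min_face_pixels:
        return "move_closer"
    return "capture_and_authenticate"

if __name__ == "__main__":
    print(next_action(FaceObservation(face_visible=False, face_pixels=0, occluded=True)))
    print(next_action(FaceObservation(face_visible=True, face_pixels=6000, occluded=False)))
```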