-
Publication number: US11172122B2
Publication date: 2021-11-09
Application number: US16241438
Filing date: 2019-01-07
Applicant: Amazon Technologies, Inc.
Inventor: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
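The abstract does not say how one model's results are combined with the other's. As a minimal sketch of the general idea, the Python below re-ranks speaker recognition candidates with a weighted sum of audio and face confidences. The function name `assist_speaker_recognition`, the label-to-confidence dictionaries, and the `face_weight` parameter are all hypothetical and are not taken from the patent.

```python
def assist_speaker_recognition(speaker_scores, face_scores, face_weight=0.3):
    """Re-rank speaker candidates using face recognition confidences.

    Both inputs map a user label to a confidence in [0, 1]
    (a hypothetical format chosen for this illustration).
    """
    combined = {}
    # Consider every user seen by either model, so that a user missing
    # from one model can still be proposed by the other.
    for user in sorted(set(speaker_scores) | set(face_scores)):
        audio = speaker_scores.get(user, 0.0)
        visual = face_scores.get(user, 0.0)
        combined[user] = (1 - face_weight) * audio + face_weight * visual
    best = max(combined, key=combined.get)
    return best, combined


# Audio alone slightly favors "bob"; a confident face match tips it to "alice".
best, scores = assist_speaker_recognition(
    {"alice": 0.55, "bob": 0.60}, {"alice": 0.90}
)
print(best)
```

A weighted sum is only one possible fusion rule; it is used here because it is easy to inspect.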
-
Publication number: US20190313014A1
Publication date: 2019-10-10
Application number: US16241438
Filing date: 2019-01-07
Applicant: Amazon Technologies, Inc.
Inventor: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
-
Publication number: US10084959B1
Publication date: 2018-09-25
Application number: US14750975
Filing date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Tsz Ho Yu, Jim Oommen Thomas, Cheng-Hao Kuo, Yinfei Yang, Ross David Roessler, Paul Aksenti Savastinuk, William Evan Welbourne
CPC classification number: H04N5/23238, H04N5/2258, H04N5/265, H04N9/09, H04N9/643, H04N9/76
Abstract: A video capture device may include multiple cameras that simultaneously capture video data. The video capture device and/or one or more remote computing resources may stitch the video data captured by the multiple cameras to generate stitched video data that corresponds to 360° video. The remote computing resources may apply one or more algorithms to the stitched video data to adjust the color characteristics of the stitched video data, such as lighting, exposure, white balance, contrast, and saturation. The remote computing resources may further smooth the transition between the video data captured by the multiple cameras to reduce artifacts such as abrupt changes in color as a result of the individual cameras of the video capture device having different video capture settings. The video capture device and/or the remote computing resources may generate a panoramic video that may include up to a 360° field of view.
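One common way to soften a seam between two adjacent camera images is a linear cross-fade (feathering) over the overlapping pixels. The abstract does not name the algorithm used, so the Python sketch below only illustrates that swapped-in technique. The function `feather_blend`, the row-of-intensities input format, and the `overlap` parameter are hypothetical.

```python
def feather_blend(left_row, right_row, overlap):
    """Join two pixel rows that share `overlap` pixels using a linear cross-fade.

    left_row ends with the overlapping pixels; right_row starts with them.
    """
    if not 0 <= overlap <= min(len(left_row), len(right_row)):
        raise ValueError("overlap must fit inside both rows")
    split = len(left_row) - overlap
    blended = []
    for i in range(overlap):
        # The weight shifts from the left camera to the right camera
        # as we move across the seam.
        t = (i + 1) / (overlap + 1)
        blended.append((1 - t) * left_row[split + i] + t * right_row[i])
    return list(left_row[:split]) + blended + list(right_row[overlap:])


# Two cameras with different exposure see the same region as 100 and 200.
print(feather_blend([100] * 4, [200] * 4, overlap=3))
```

In this example the abrupt step from 100 to 200 becomes a gradual ramp across the three shared pixels.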
-
Publication number: US20160381306A1
Publication date: 2016-12-29
Application number: US14753826
Filing date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Yinfei Yang, William Evan Welbourne, Ross David Roessler, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Jim Oommen Thomas, Tsz Ho Yu
IPC: H04N5/262, G11B27/06, H04N5/232, G11B27/031
CPC classification number: H04N5/2628, G06K9/00711, G06K9/00751, G06K9/3233, G06T3/40, G11B27/031, G11B27/06, H04N5/23238
Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Contents may be detected and analyzed in the video data using computer vision, machine learning algorithms or specified through a user interface. Panning and zooming controls may be associated with the contents, panning or zooming based on a location and size of content within the video data. The device may determine a number of pixels associated with content and may frame the content to be a certain percentage of the edited video data, such as a close-up shot where a subject is displayed as 50% of the viewing frame. The device may identify an event of interest, may determine multiple frames associated with the event of interest and may pan and zoom between the multiple frames based on a size/location of the content within the multiple frames.
-
Publication number: US10104286B1
Publication date: 2018-10-16
Application number: US14837793
Filing date: 2015-08-27
Applicant: Amazon Technologies, Inc.
Inventor: Tsz Ho Yu, Paul Aksenti Savastinuk, Yinfei Yang, Cheng-Hao Kuo, Ross David Roessler, William Evan Welbourne
Abstract: Systems and methods may be directed to de-blurring panoramic images and/or video. An image processor may receive a frame, where the frame comprises a plurality of pixel values arranged in a grid. The image processor may divide the frame into a first section and a second section. The image processor may determine a first motion kernel for the first section and apply the first motion kernel to the first section. The image processor may also determine a second motion kernel for the second section and apply the second motion kernel to the second section.
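The structure described here (divide the frame, then handle each section with its own kernel) can be sketched in Python as follows. This is an illustration only. The abstract does not say how the motion kernels are determined or applied, so the sketch uses fixed placeholder kernels and a plain horizontal convolution, and it does not perform kernel estimation or deconvolution. The left/right split and the names `apply_kernel_rows` and `process_frame` are assumptions.

```python
def apply_kernel_rows(section, kernel):
    """Convolve each row of a section with a 1-D horizontal kernel.

    Pixels past the edge of the section are replaced by the nearest edge pixel.
    """
    half = len(kernel) // 2
    out = []
    for row in section:
        new_row = []
        for i in range(len(row)):
            acc = 0.0
            for k, weight in enumerate(kernel):
                # Index flipped relative to k, as in a true convolution.
                j = min(max(i + half - k, 0), len(row) - 1)
                acc += weight * row[j]
            new_row.append(acc)
        out.append(new_row)
    return out


def process_frame(frame, first_kernel, second_kernel):
    """Split a frame (grid of pixel values) into left and right sections
    and filter each section with its own kernel."""
    mid = len(frame[0]) // 2
    first = apply_kernel_rows([row[:mid] for row in frame], first_kernel)
    second = apply_kernel_rows([row[mid:] for row in frame], second_kernel)
    # Stitch the two processed sections back into full rows.
    return [a + b for a, b in zip(first, second)]


frame = [[1, 2, 3, 4], [5, 6, 7, 8]]
# Placeholder kernels: identity on the left, mild sharpening on the right.
print(process_frame(frame, [0.0, 1.0, 0.0], [-0.5, 2.0, -0.5]))
```

Because each section is filtered independently, the sketch replicates pixels at the internal boundary as well as at the frame edges. A real implementation would likely need to handle that boundary more carefully.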
-
Publication number: US09973711B2
Publication date: 2018-05-15
Application number: US14753826
Filing date: 2015-06-29
Applicant: Amazon Technologies, Inc.
Inventor: Yinfei Yang, William Evan Welbourne, Ross David Roessler, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Jim Oommen Thomas, Tsz Ho Yu
CPC classification number: H04N5/2628, G06K9/00711, G06K9/00751, G06K9/3233, G06T3/40, G11B27/031, G11B27/06, H04N5/23238
Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Contents may be detected and analyzed in the video data using computer vision, machine learning algorithms or specified through a user interface. Panning and zooming controls may be associated with the contents, panning or zooming based on a location and size of content within the video data. The device may determine a number of pixels associated with content and may frame the content to be a certain percentage of the edited video data, such as a close-up shot where a subject is displayed as 50% of the viewing frame. The device may identify an event of interest, may determine multiple frames associated with the event of interest and may pan and zoom between the multiple frames based on a size/location of the content within the multiple frames.
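As a rough illustration of the framing step in this abstract, the Python sketch below computes a crop window in which a detected subject occupies a target fraction of the output frame, such as the 50% close-up mentioned above. The function name `frame_subject`, the bounding-box format, the fixed output aspect ratio, and the choice to measure the fraction by pixel area are assumptions for illustration, not details from the patent.

```python
import math


def frame_subject(box, frame_size, fraction=0.5, aspect=16 / 9):
    """Return a crop (left, top, width, height) in which the subject's
    bounding box covers roughly `fraction` of the crop area.

    box: (x, y, w, h) of the detected subject in pixels (hypothetical format).
    frame_size: (frame_w, frame_h) of the source video frame.
    """
    x, y, w, h = box
    frame_w, frame_h = frame_size
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Crop area needed so that subject area / crop area == fraction.
    crop_area = (w * h) / fraction
    crop_h = math.sqrt(crop_area / aspect)
    crop_w = crop_h * aspect

    # Grow the crop if needed so the whole subject fits inside it.
    grow = max(1.0, w / crop_w, h / crop_h)
    crop_w, crop_h = crop_w * grow, crop_h * grow

    # Shrink the crop if it is larger than the source frame.
    shrink = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w, crop_h = crop_w * shrink, crop_h * shrink

    # Center the crop on the subject, then clamp it to the frame.
    left = min(max(x + w / 2 - crop_w / 2, 0.0), frame_w - crop_w)
    top = min(max(y + h / 2 - crop_h / 2, 0.0), frame_h - crop_h)
    return left, top, crop_w, crop_h


left, top, crop_w, crop_h = frame_subject((800, 400, 320, 180), (1920, 1080))
print(round(crop_w), round(crop_h))
```

Panning and zooming between two frames could then be approximated by interpolating between the crop windows computed for each frame, although the abstract does not specify how that transition is produced.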
-
Publication number: US10582125B1
Publication date: 2020-03-03
Application number: US14727782
Filing date: 2015-06-01
Applicant: Amazon Technologies, Inc.
Inventor: Ross David Roessler, Matthew Alan Townsend, Yinfei Yang, Jim Oommen Thomas, Deon Poncini, William Evan Welbourne, Geoff Hunter Donaldson, Paul Aksenti Savastinuk, Cheng-Hao Kuo
Abstract: A video capture device may include multiple cameras that simultaneously capture video data. The video capture device and/or one or more remote computing resources may stitch the video data captured by the multiple cameras to generate stitched video data that corresponds to 360° video. The remote computing resources may apply one or more algorithms to the stitched video data to identify one or more frames that depict content that is likely to be of interest to a user. The video capture device and/or the remote computing resources may generate one or more images from the one or more frames, and may send the one or more images to the user.
-
Publication number: US10277813B1
Publication date: 2019-04-30
Application number: US14751024
Filing date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: Jim Oommen Thomas, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Tsz Ho Yu, Ross David Roessler, William Evan Welbourne, Yinfei Yang
Abstract: A viewing device, such as a virtual reality headset, allows a user to view a panoramic scene captured by one or more video capture devices that may include multiple cameras that simultaneously capture 360° video data. The viewing device may display the panoramic scene in real time and change the display in response to moving the viewing device and/or changing perspectives by switching to video data being captured by a different video capture device within the environment. Moreover, multiple video capture devices located within an environment can be used to create a three-dimensional representation of the environment that allows a user to explore the three-dimensional space while viewing the environment in real time.
-
Publication number: US10178301B1
Publication date: 2019-01-08
Application number: US14750895
Filing date: 2015-06-25
Applicant: Amazon Technologies, Inc.
Inventor: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.