User identification based on voice and face

    Publication number: US11172122B2

    Publication date: 2021-11-09

    Application number: US16241438

    Filing date: 2019-01-07

    Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
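The cross-modal idea in the abstract (one model's output assisting the other) might be sketched as a simple score fusion. The function, its weighting, and the score dictionaries below are illustrative assumptions, not the patent's actual method:

```python
def fuse_identities(face_scores, speaker_scores, face_weight=0.6):
    """Combine per-user confidences from a facial-recognition model and a
    speaker-recognition model. A user enrolled in only one model keeps that
    model's score, so users missing from one model can still be identified.
    (Hypothetical sketch; the weighting scheme is an assumption.)"""
    users = set(face_scores) | set(speaker_scores)
    fused = {}
    for user in users:
        f = face_scores.get(user)
        s = speaker_scores.get(user)
        if f is not None and s is not None:
            # both models saw this user: weighted average of the two scores
            fused[user] = face_weight * f + (1.0 - face_weight) * s
        else:
            # only one model covers this user: fall back to its score alone
            fused[user] = f if f is not None else s
    best = max(fused, key=fused.get)
    return best, fused
```

A run-time system could also feed the fused identities back as labeled examples to update either model, matching the abstract's mention of run-time and offline updates.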

    CONTENT-BASED ZOOMING AND PANNING FOR VIDEO CURATION
    Invention application
    Status: In force

    Publication number: US20160381306A1

    Publication date: 2016-12-29

    Application number: US14753826

    Filing date: 2015-06-29

    Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Content may be detected and analyzed in the video data using computer vision or machine-learning algorithms, or specified through a user interface. Panning and zooming controls may be associated with the content, panning or zooming based on the location and size of the content within the video data. The device may determine a number of pixels associated with the content and may frame the content to occupy a certain percentage of the edited video data, such as a close-up shot in which a subject fills 50% of the viewing frame. The device may identify an event of interest, determine multiple frames associated with that event, and pan and zoom between those frames based on the size and location of the content within them.
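The framing step (making a subject fill a target fraction of the frame, e.g. 50% for a close-up) can be sketched as a zoom-factor computation. The function name and parameters are hypothetical, assuming area coverage is the target metric:

```python
def zoom_for_coverage(bbox_w, bbox_h, frame_w, frame_h, target=0.5):
    """Return the zoom factor that makes a subject's bounding box cover
    `target` fraction of the output frame's area, plus the crop window
    (centred on the subject) that realises it.
    (Illustrative sketch; the patent does not specify this formula.)"""
    subject_area = bbox_w * bbox_h
    frame_area = frame_w * frame_h
    # Covered area grows with the square of the linear zoom factor,
    # so solve target * frame_area = subject_area * zoom**2 for zoom.
    zoom = (target * frame_area / subject_area) ** 0.5
    crop_w = frame_w / zoom
    crop_h = frame_h / zoom
    return zoom, crop_w, crop_h
```

Interpolating this zoom (and the crop centre) between the multiple frames of an event of interest would produce the content-based pan-and-zoom effect the abstract describes.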

    USER IDENTIFICATION BASED ON VOICE AND FACE
    Invention application

    Publication number: US20190313014A1

    Publication date: 2019-10-10

    Application number: US16241438

    Filing date: 2019-01-07

    Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.

    Object identification through stereo association
    Invention grant
    Status: In force

    Publication number: US09298974B1

    Publication date: 2016-03-29

    Application number: US14307493

    Filing date: 2014-06-18

    Abstract: Various embodiments enable a primary user to be identified and tracked using stereo association and multiple tracking algorithms. For example, a face detection algorithm can be run on each image captured by a respective camera independently. Stereo association can be performed to match faces between cameras. If the faces are matched and a primary user is determined, a face pair is created and used as the first data point in memory for initializing object tracking. Further, features of a user's face can be extracted and the change in position of these features between images can determine what tracking method will be used for that particular frame.
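The stereo-association step (matching independently detected faces between the two cameras) might be sketched as a greedy pairing that exploits rectified-stereo geometry: a true match lies on nearly the same image row in both views, shifted horizontally by the disparity. The cost function and thresholds below are assumptions for illustration:

```python
def associate_faces(left_faces, right_faces, max_row_diff=20):
    """Greedily pair face boxes (x, y, w, h) detected independently in a
    rectified stereo pair. Candidates with a large row difference or a
    negative disparity are rejected as geometrically impossible.
    (Hypothetical sketch; the patent's matching criteria may differ.)"""
    pairs = []
    used = set()
    for lf in left_faces:
        best, best_cost = None, None
        for i, rf in enumerate(right_faces):
            if i in used:
                continue
            row_diff = abs(lf[1] - rf[1])
            disparity = lf[0] - rf[0]  # left-image x minus right-image x
            if row_diff > max_row_diff or disparity < 0:
                continue  # violates rectified-stereo geometry
            # prefer candidates on the same row with a similar face height
            cost = row_diff + abs(lf[3] - rf[3])
            if best_cost is None or cost < best_cost:
                best, best_cost = i, cost
        if best is not None:
            used.add(best)
            pairs.append((lf, right_faces[best]))
    return pairs
```

Each returned face pair could then seed object tracking, as the abstract describes, with the pair serving as the first data point in memory.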

    System to determine user engagement with autonomous mobile device

    Publication number: US11367306B1

    Publication date: 2022-06-21

    Application number: US16909074

    Filing date: 2020-06-23

    Abstract: An autonomous mobile device (AMD) or other device may perform various tasks during operation. The AMD includes a camera to acquire an image. Some tasks, such as presenting information on a display screen or a video call, may involve the AMD determining whether a user is engaged with the AMD. The AMD may move a component, such as the camera or the display screen, to provide a best experience for an engaged user. Images from the camera are processed to determine attributes of the user, such as yaw of the face of the user, pitch of the face of the user, distance from the camera, and so forth. Based on the values of these attributes, a user engagement score is determined. The score may be used to select a particular user from many users in the image, or to otherwise facilitate operation of the AMD.
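The engagement score described above (built from face yaw, face pitch, and distance from the camera) might be sketched as a product of per-attribute terms. The functional form and thresholds are illustrative assumptions, not the AMD's actual scoring:

```python
def engagement_score(yaw_deg, pitch_deg, distance_m,
                     max_angle=45.0, max_distance=3.0):
    """Score in [0, 1]: highest when the face points straight at the
    camera (yaw and pitch near 0) from close range.
    (Hypothetical sketch; thresholds are assumptions.)"""
    yaw_term = max(0.0, 1.0 - abs(yaw_deg) / max_angle)
    pitch_term = max(0.0, 1.0 - abs(pitch_deg) / max_angle)
    dist_term = max(0.0, 1.0 - distance_m / max_distance)
    # multiplicative: any single disqualifying attribute zeroes the score
    return yaw_term * pitch_term * dist_term

def most_engaged(users):
    """Pick the most engaged user from (name, yaw, pitch, distance) tuples,
    e.g. to choose whom the AMD's camera or screen should face."""
    return max(users, key=lambda u: engagement_score(*u[1:]))[0]
```

A multiplicative combination means a user looking far away from the camera scores near zero regardless of proximity; a weighted sum would be a gentler alternative design choice.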
