Patent search: ap:("Amazon Technologies, Inc.") AND inv:"Tyler Jerel Etchart"

1.

Patent application
ACTIVE SPEAKER DETECTION USING IMAGE DATA (Status: Active)

Publication No.: US20230068798A1

Publication date: 2023-03-02

Application No.: US17465143

Filing date: 2021-09-02

Applicant: Amazon Technologies, Inc.

Inventors: Tyler Jerel Etchart, Vivek Yadav, Pradeep Natarajan

IPC: G10L15/25, G06K9/00, G06T7/70, G10L15/22, G10L25/78

Abstract: A system can operate a speech-controlled device to perform active speaker detection to detect an utterance using image data showing a user speaking the utterance. This enables the device to perform utterance detection using the image data and/or determine which user is speaking the utterance. To perform active speaker detection, the device processes the image data to determine expression parameters associated with the user's face and generates facial measurements based on the expression parameters. For example, the device can use the expression parameters to generate a 3D model including an agnostic facial representation and determine a mouth aspect ratio by measuring a mouth height and a mouth width of the agnostic facial representation. As the mouth aspect ratio changes when the user is speaking, the device can determine that the user is speaking and/or detect an utterance based on an amount of variation of the mouth aspect ratio.
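The abstract's final step, which detects speech from how much the mouth aspect ratio varies, can be illustrated with a minimal sketch. This is not the patented method. It assumes that mouth height and width have already been measured on the agnostic facial model for each frame, and that "amount of variation" means the population standard deviation of the ratio over a short sliding window. The function names, the 5-frame window, and the 0.05 threshold are all hypothetical choices made for illustration.

```python
from statistics import pstdev


def mouth_aspect_ratio(height: float, width: float) -> float:
    """Mouth height divided by mouth width (assumed definition of the ratio)."""
    if width <= 0:
        raise ValueError("mouth width must be positive")
    return height / width


def is_speaking(ratios: list[float], threshold: float = 0.05) -> bool:
    """Hypothetical rule: speech if the ratio's spread exceeds a threshold."""
    if len(ratios) < 2:
        return False  # not enough frames to measure variation
    return pstdev(ratios) > threshold


def detect_speech(measurements: list[tuple[float, float]],
                  window: int = 5,
                  threshold: float = 0.05) -> list[bool]:
    """Per-frame speech flags from (height, width) pairs, using a sliding window.

    The window size and threshold are illustrative assumptions, not values
    taken from the patent.
    """
    ratios = [mouth_aspect_ratio(h, w) for h, w in measurements]
    flags = []
    for i in range(len(ratios)):
        # Look back over at most `window` frames ending at frame i.
        recent = ratios[max(0, i - window + 1): i + 1]
        flags.append(is_speaking(recent, threshold))
    return flags


if __name__ == "__main__":
    still = [(1.0, 10.0)] * 6                   # mouth held nearly closed
    talking = [(1.0, 10.0), (5.0, 10.0)] * 3    # mouth opening and closing
    print(detect_speech(still))
    print(detect_speech(talking))
```

A steady mouth shape gives zero spread and is never flagged, while a mouth that repeatedly opens and closes is flagged once at least two frames are in the window. A real system would also need to handle head pose, yawning, and chewing. The abstract does not describe those details.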
