Publication No.: US20230068798A1
Publication Date: 2023-03-02
Application No.: US17465143
Filing Date: 2021-09-02
Applicant: Amazon Technologies, Inc.
Inventors: Tyler Jerel Etchart, Vivek Yadav, Pradeep Natarajan
Abstract: A system can operate a speech-controlled device to perform active speaker detection to detect an utterance using image data showing a user speaking the utterance. This enables the device to perform utterance detection using the image data and/or determine which user is speaking the utterance. To perform active speaker detection, the device processes the image data to determine expression parameters associated with the user's face and generates facial measurements based on the expression parameters. For example, the device can use the expression parameters to generate a 3D model including an agnostic facial representation and determine a mouth aspect ratio by measuring a mouth height and a mouth width of the agnostic facial representation. As the mouth aspect ratio changes when the user is speaking, the device can determine that the user is speaking and/or detect an utterance based on an amount of variation of the mouth aspect ratio.
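The mouth-aspect-ratio idea in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It assumes four hypothetical 3D landmark points (top, bottom, left and right of the mouth) taken from the agnostic facial representation, uses the population standard deviation as one possible measure of the "amount of variation", and picks an arbitrary threshold of 0.05. The function names `mouth_aspect_ratio` and `is_speaking` are illustrative only.

```python
import math
from statistics import pstdev
from typing import Sequence, Tuple

# Hypothetical landmark type: an (x, y, z) point on the 3D face model.
Point = Tuple[float, float, float]


def mouth_aspect_ratio(top: Point, bottom: Point, left: Point, right: Point) -> float:
    """Return mouth height divided by mouth width for one frame."""
    height = math.dist(top, bottom)   # vertical mouth opening
    width = math.dist(left, right)    # distance between mouth corners
    if width == 0:
        raise ValueError("mouth width is zero; landmarks are degenerate")
    return height / width


def is_speaking(ratios: Sequence[float], threshold: float = 0.05) -> bool:
    """Guess whether a user is speaking from per-frame mouth aspect ratios.

    Assumption: variation is measured as the population standard deviation
    over the window, and the threshold is an arbitrary illustrative value.
    """
    if len(ratios) < 2:
        return False  # not enough frames to measure variation
    return pstdev(ratios) > threshold


# Usage: one ratio per video frame, then a decision over the window.
closed = mouth_aspect_ratio((0, 0.1, 0), (0, -0.1, 0), (-2, 0, 0), (2, 0, 0))
opened = mouth_aspect_ratio((0, 1.0, 0), (0, -1.0, 0), (-2, 0, 0), (2, 0, 0))
print(is_speaking([closed, opened, closed, opened]))
print(is_speaking([closed, closed, closed, closed]))
```

A still mouth yields nearly constant ratios and therefore low variation, while speech makes the ratio oscillate from frame to frame. In practice the window length and threshold would need tuning against real data.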