-
Publication number: US20230419579A1
Publication date: 2023-12-28
Application number: US18462310
Application date: 2023-09-06
Applicant: Meta Platforms Technologies, LLC
Inventor: Alexander Richard , Michael Zollhoefer , Fernando De la Torre , Yaser Sheikh
CPC classification number: G06T13/205 , G06T13/40 , G06T17/20 , G06T19/006 , G10L21/14 , G10L2021/105
Abstract: A method for training a three-dimensional model face animation model from speech, is provided. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face, based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject. A non-transitory, computer-readable medium storing instructions to cause a system to perform the above method, and the system, are also provided.
-
Publication number: US11756250B2
Publication date: 2023-09-12
Application number: US17669270
Application date: 2022-02-10
Applicant: Meta Platforms Technologies, LLC
Inventor: Alexander Richard , Michael Zollhoefer , Fernando De la Torre , Yaser Sheikh
CPC classification number: G06T13/205 , G06T13/40 , G06T17/20 , G06T19/006 , G10L21/14 , G10L2021/105
Abstract: A method for training a three-dimensional model face animation model from speech, is provided. The method includes determining a first correlation value for a facial feature based on an audio waveform from a first subject, generating a first mesh for a lower portion of a human face, based on the facial feature and the first correlation value, updating the first correlation value when a difference between the first mesh and a ground truth image of the first subject is greater than a pre-selected threshold, and providing a three-dimensional model of the human face animated by speech to an immersive reality application accessed by a client device based on the difference between the first mesh and the ground truth image of the first subject. A non-transitory, computer-readable medium storing instructions to cause a system to perform the above method, and the system, are also provided.
-
Publication number: US11683448B2
Publication date: 2023-06-20
Application number: US17543519
Application date: 2021-12-06
Applicant: Duelight LLC
Inventor: William Guie Rivard , Brian J. Kindle , Adam Barry Feder
IPC: H04N7/15 , H04L51/046 , H04L65/80 , G06T13/40 , G06T19/00 , H04L65/403 , H04L51/04 , G10L21/10 , H04L12/18 , H04N7/14 , G10L25/63 , H04L51/10
CPC classification number: H04N7/157 , G06T13/40 , G06T19/00 , G10L21/10 , H04L12/1827 , H04L51/04 , H04L51/046 , H04L65/403 , H04L65/80 , H04N7/147 , G06T2219/024 , G10L25/63 , G10L2021/105 , H04L51/10
Abstract: A system, method, and computer program are provided for receiving face models based on face nodal points. In use, a real-time face model is received, wherein the real-time face model includes one or more face nodal points. Real-time face nodal points are received, including additional one or more face nodal points. The real-time face model is manipulated based on the real-time face nodal points.
-
Publication number: US20230178095A1
Publication date: 2023-06-08
Application number: US17764324
Application date: 2021-06-03
Applicant: DEEPBRAIN AI INC.
Inventor: Guem Buel HWANG , Gyeong Su CHAE
CPC classification number: G10L21/10 , G06T13/80 , G06T13/40 , G10L2021/105
Abstract: An apparatus for generating a lip sync image according to a disclosed embodiment has one or more processors and a memory which stores one or more programs executed by the one or more processors. The apparatus includes a first artificial neural network model configured to generate an utterance synthesis image by using a person background image and an utterance audio signal corresponding to the person background image as an input, and generate a silence synthesis image by using only the person background image as an input, and a second artificial neural network model configured to output, from a preset utterance maintenance image and the first artificial neural network model, classification values for the preset utterance maintenance image and the silence synthesis image by using the silence synthesis image as an input.
-
Publication number: US10062394B2
Publication date: 2018-08-28
Application number: US14674493
Application date: 2015-03-31
Applicant: BOSE CORPORATION
Inventor: Lee Zamir
CPC classification number: G10L21/10 , G10L25/78 , G10L2021/105
Abstract: A system encourages experimentation with audio frequency and speaker technologies while causing an inanimate object to appear to lip-sync. The system applies a bandpass filter to an incoming audio stream to determine a magnitude of audio content in a frequency band of interest. For example, the system may filter results directed at the voice band, associated with speech. A controller controls a strobe light to flash at a particular point of travel of a platform reciprocating at a known frequency. An illusion is created that a sculpture, such as a piece of paper formed into a ring, is lip-synching to music.
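Illustrative note: the band-magnitude step described in this abstract can be sketched as below. This is a minimal sketch, not the patented implementation. The function names `bandpass_rms` and `tone`, the 300 to 3400 Hz band edges, and the choice of a second-order biquad filter are all assumptions made for illustration; the abstract only states that a bandpass filter is applied to find the magnitude of content in a band of interest such as the voice band.

```python
import math

def bandpass_rms(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """RMS level of `samples` after a second-order band-pass filter.

    The band edges are assumed values for a telephone-style voice band.
    """
    center = math.sqrt(low_hz * high_hz)   # geometric center of the band
    q = center / (high_hz - low_hz)        # quality factor from the bandwidth
    w0 = 2.0 * math.pi * center / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    # Biquad band-pass coefficients with unity gain at the center frequency.
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    x1 = x2 = y1 = y2 = 0.0
    total = 0.0
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        total += y * y
    return math.sqrt(total / len(samples)) if samples else 0.0

def tone(freq_hz, sample_rate, seconds=0.5):
    """Generate a unit-amplitude sine wave as a list of floats (test helper)."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

if __name__ == "__main__":
    rate = 16000
    print("1 kHz tone level:", bandpass_rms(tone(1000.0, rate), rate))
    print("60 Hz hum level:", bandpass_rms(tone(60.0, rate), rate))
```

A controller could compare the returned level against a threshold to decide when to trigger the strobe; that coupling is not shown here.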
-
Publication number: US09956407B2
Publication date: 2018-05-01
Application number: US14816323
Application date: 2015-08-03
Applicant: Cochlear Limited
Inventor: Florent Maxime Hubert-Brierre
CPC classification number: A61N1/36036 , A61N1/0541 , A61N1/37247 , G10L21/10 , G10L2021/105 , H04R25/554 , H04R2225/55 , H04R2225/67
Abstract: Embodiments presented herein are generally directed to techniques for compensating for tonal deafness experienced by a recipient of an auditory prosthesis. More specifically, an auditory prosthesis system includes an external device configured to generate a graphical representation that enables the recipient to compensate for reduced tonal perception associated with delivery of the stimulation signals representative of speech signals. The external device is configured to analyze received speech signals to determine vocal articulator movement of the speaker of the speech signals and/or emotion of the speaker. The external device is further configured to display one or more animated visual cues representative of the detected vocal articulator movement and/or emotion.
-
Publication number: US09911218B2
Publication date: 2018-03-06
Application number: US14956119
Application date: 2015-12-01
Applicant: Disney Enterprises, Inc.
Inventor: Barry-John Theobald , Margaret Meyerhofer , Iain Matthews , Sarah Taylor
IPC: G06T13/40 , G06T13/20 , G10L15/187 , G06T13/80 , G10L15/02
CPC classification number: G06T13/205 , G06T13/40 , G06T13/80 , G10L15/187 , G10L21/10 , G10L2015/022 , G10L2021/105
Abstract: Speech animation may be performed using visemes with phonetic boundary context. A viseme unit may comprise an animation that simulates lip movement of an animated entity. Individual ones of the viseme units may correspond to one or more complete phonemes and phoneme context of the one or more complete phonemes. Phoneme context may include a phoneme that is adjacent to the one or more complete phonemes that correspond to a given viseme unit. Potential sets of viseme units that correspond with individual phoneme string portions may be determined. One of the potential sets of viseme units may be selected for individual ones of the phoneme string portions based on a fit metric that conveys a match between individual ones of the potential sets and the corresponding phoneme string portion.
-
Publication number: US20170345437A1
Publication date: 2017-11-30
Application number: US15607419
Application date: 2017-05-26
Inventor: YU ZHANG
IPC: G10L21/0216 , G10L21/028 , G10L21/0224 , G10L21/10
CPC classification number: G10L21/0216 , G06K9/0057 , G10L21/0224 , G10L21/028 , G10L21/10 , G10L2021/02165 , G10L2021/02166 , G10L2021/105
Abstract: A voice receiving device configured for accurate listening includes a microphone array, a camera, a capturing module, a determining module, a time module, a calculating module, and a de-noising module. The microphone array captures a first voice signal and a second voice signal and the camera captures mouth pictures of a user. The determining module determines whether the first voice signal is synchronized with the mouth pictures, and if so compares the first voice signal to a model preset voice signal of a user to determine a target voice signal. The time module obtains time delay difference between one voice reaching different microphones. The calculating module calculates a position of sound source of the target voice signal. According to the position of the sound source, the de-noising module de-noises by reference to the second voice signal. The disclosure further provides a voice receiving method.
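Illustrative note: the "time delay difference between one voice reaching different microphones" mentioned in this abstract is commonly estimated by cross-correlating two microphone channels. The sketch below shows that general technique only. The abstract does not say how the time module computes the delay, so the brute-force cross-correlation, the function names `estimate_delay` and `path_difference_m`, and the 343 m/s speed of sound are assumptions for illustration.

```python
def estimate_delay(ref, other, max_lag):
    """Return the lag in samples by which `other` trails `ref`.

    Tries every lag in [-max_lag, max_lag] and keeps the one with the
    largest cross-correlation score.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def path_difference_m(lag_samples, sample_rate, speed_of_sound=343.0):
    """Convert a sample lag into a difference in acoustic path length (metres)."""
    return lag_samples / sample_rate * speed_of_sound

if __name__ == "__main__":
    mic_a = [0, 0, 1, 3, -2, 4, 1, 0, 0, 0, 0, 0]
    mic_b = [0, 0, 0] + mic_a[:-3]          # same signal, three samples later
    lag = estimate_delay(mic_a, mic_b, max_lag=5)
    print("lag in samples:", lag)
    print("path difference (m):", path_difference_m(lag, 16000))
```

With delays from several microphone pairs and known microphone positions, a source position can be triangulated; that step is omitted here.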
-
Publication number: US20170308905A1
Publication date: 2017-10-26
Application number: US15642365
Application date: 2017-07-06
Applicant: Ratnakumar Navaratnam
Inventor: Ratnakumar Navaratnam
IPC: G06Q30/00 , G06F17/30 , G10L15/26 , G10L21/10 , G06F3/01 , G10L13/04 , H04N7/15 , G10L15/02 , H04N7/14 , H04N5/232 , G06Q30/02 , G10L13/00 , G10L15/00 , G10L15/22
CPC classification number: G06Q30/016 , B25J11/001 , B25J11/0015 , B25J13/00 , G06F3/017 , G06F17/30259 , G06F17/30654 , G06F17/30766 , G06Q30/02 , G06Q30/0241 , G10L13/00 , G10L13/043 , G10L15/00 , G10L15/02 , G10L15/22 , G10L15/26 , G10L15/265 , G10L21/10 , G10L2015/025 , G10L2021/105 , H04M3/5125 , H04M2203/1025 , H04N5/23203 , H04N5/23219 , H04N7/147 , H04N7/157
Abstract: A system for remote servicing of customers includes an interactive display unit at the customer location providing two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides for remote customer service using physical mannequins with interactive capability having two-way audio visual communication ability with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers using the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website. A smart phone solution provides the remote service to customers via an App. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or other broadcasting applications.
-
Publication number: US09727874B2
Publication date: 2017-08-08
Application number: US15274150
Application date: 2016-09-23
Applicant: Ratnakumar Navaratnam
Inventor: Ratnakumar Navaratnam
IPC: H04N7/14 , G06Q30/00 , G10L15/22 , G10L13/04 , G10L15/26 , G10L15/02 , G10L21/10 , H04N7/15 , G06F3/01 , H04N5/232 , G06F17/30 , G06Q30/02
CPC classification number: G06Q30/016 , B25J11/001 , B25J11/0015 , B25J13/00 , G06F3/017 , G06F17/30259 , G06F17/30654 , G06F17/30766 , G06Q30/02 , G06Q30/0241 , G10L13/00 , G10L13/043 , G10L15/00 , G10L15/02 , G10L15/22 , G10L15/26 , G10L15/265 , G10L21/10 , G10L2015/025 , G10L2021/105 , H04M3/5125 , H04M2203/1025 , H04N5/23203 , H04N5/23219 , H04N7/147 , H04N7/157
Abstract: A system for remote servicing of customers includes an interactive display unit at the customer location providing two-way audio/visual communication with a remote service/sales agent, wherein communication inputted by the agent is delivered to customers via a virtual Digital Actor on the display. The system also provides for remote customer service using physical mannequins with interactive capability having two-way audio visual communication ability with the remote agent, wherein communication inputted by the remote service or sales agent is delivered to customers using the physical mannequin. A web solution integrates the virtual Digital Actor system into a business website. A smart phone solution provides the remote service to customers via an App. In another embodiment, the Digital Actor is instead displayed as a 3D hologram. The Digital Actor is also used in an e-learning solution, in a movie studio suite, and as a presenter on TV, online, or other broadcasting applications.