-
Publication No.: US12236946B2
Publication Date: 2025-02-25
Application No.: US17821431
Application Date: 2022-08-22
Applicant: Snap Inc.
Inventor: Jacob Assa, Alan Bekker, Zach Moshe
IPC: G10L15/197
Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods access a language model (LM) that includes a plurality of n-grams, each comprising a respective sequence of words and a corresponding LM score, and receive a list of words associated with a group classification, each word in the list being associated with a respective weight. The systems and methods compute, based on the LM scores of the plurality of n-grams, a probability that a given word in the list appears in an n-gram in the LM comprising an individual sequence of words, and add to the LM one or more new n-grams, each comprising one or more words in the list in combination with the individual sequence of words and associated with a particular LM score based on the computed probability.
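Illustration (not part of the patent record): the abstract does not say how the probability is computed or how the new LM score is derived from it, so the following is only a minimal sketch under stated assumptions. It assumes the LM is a plain dictionary from word tuples to log10 probabilities, estimates the group probability for a context as the summed probability of the listed words already seen after that context, and splits that mass across the listed words in proportion to their weights. The function name `expand_lm_with_group` and the data layout are hypothetical.

```python
import math


def expand_lm_with_group(lm, context, weighted_words):
    """Return a copy of `lm` with n-grams `context + (word,)` added for listed words.

    lm: dict mapping word tuples to log10 probabilities (assumed layout).
    context: tuple of words, standing in for the "individual sequence of words".
    weighted_words: dict mapping each word in the group list to a positive weight.
    """
    # Probability the LM already assigns to any listed word after this context.
    group_prob = sum(
        10 ** lm[context + (word,)]
        for word in weighted_words
        if context + (word,) in lm
    )
    expanded = dict(lm)
    if group_prob == 0.0:
        return expanded  # no listed word follows this context; nothing to infer

    total_weight = sum(weighted_words.values())
    for word, weight in weighted_words.items():
        ngram = context + (word,)
        if ngram not in expanded:
            # Assumed scoring rule: weight-proportional share of the group mass.
            expanded[ngram] = math.log10(group_prob * weight / total_weight)
    return expanded


lm = {("call", "alice"): math.log10(0.02), ("call", "bob"): math.log10(0.01)}
names = {"alice": 1.0, "bob": 1.0, "carol": 2.0}
expanded = expand_lm_with_group(lm, ("call",), names)
```

Existing n-grams keep their scores and only missing ones are added. This sketch does not renormalize the distribution or adjust backoff weights, which a real LM would need.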
-
Publication No.: US20250029595A1
Publication Date: 2025-01-23
Application No.: US18906853
Application Date: 2024-10-04
Applicant: Snap Inc.
Inventor: Liron Harazi, Jacob Assa, Alan Bekker
IPC: G10L13/08, G06F3/0482, G10L13/033, G10L13/047, G10L25/18
Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.
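Illustration (not part of the patent record): the abstract states that a third embedding is generated from the first and second embeddings but does not say how they are combined. As a minimal sketch, assuming both embeddings are equal-length float vectors and that a simple linear blend is acceptable, the step could look like the following. The function name `blend_embeddings` and the mixing parameter `alpha` are hypothetical, and the vocoder stage is omitted.

```python
def blend_embeddings(first, second, alpha=0.5):
    """Linearly interpolate two equal-length embeddings.

    first: embedding of the selected speaker (carrying the first emotion).
    second: embedding of another speaker (carrying the second emotion).
    alpha: assumed mixing factor in [0, 1]; 0 returns `first`, 1 returns `second`.
    """
    if len(first) != len(second):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(first, second)]


# Toy 2-dimensional vectors; real embeddings would have many more dimensions.
third = blend_embeddings([1.0, 0.0], [0.0, 1.0], alpha=0.25)
```

A plain blend mixes speaker identity along with emotion, so it is only a stand-in for whatever combination the claimed method actually uses.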
-
Publication No.: US12142257B2
Publication Date: 2024-11-12
Application No.: US17667128
Application Date: 2022-02-08
Applicant: Snap Inc.
Inventor: Liron Harazi, Jacob Assa, Alan Bekker
IPC: G10L13/08, G06F3/0482, G10L13/033, G10L13/047, G10L25/18
Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.
-