-
Publication number: US11984114B2
Publication date: 2024-05-14
Application number: US17495402
Application date: 2021-10-06
Applicant: Snap Inc.
Inventor: Alan Bekker, Itamar Schen, Jackie Assa, Einav Itamar, Nave Algarici
CPC classification number: G10L15/16, G06N3/045, G06N3/084, G10L15/063
Abstract: Systems and methods are provided for performing speech to intent classification. The systems and methods perform operations comprising: receiving an audio file comprising speech input; processing, by a speech recognition engine, the audio file comprising the speech input to generate an initial character-based representation of the speech input; processing, by an intent classifier, the initial character-based representation of the speech input to generate an estimated intent of the speech input; and generating, by the speech recognition engine, a textual representation of the speech input based on the estimated intent of the speech input.
-
Publication number: US20230252972A1
Publication date: 2023-08-10
Application number: US17667128
Application date: 2022-02-08
Applicant: Snap Inc.
Inventor: Liron Harazi, Jackie Assa, Alan Bekker
IPC: G10L13/08, G10L25/18, G10L13/047, G06F3/0482, G10L13/033
CPC classification number: G10L13/08, G10L25/18, G10L13/047, G06F3/0482, G10L13/033
Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.
-
Publication number: US20240021195A1
Publication date: 2024-01-18
Application number: US17864937
Application date: 2022-07-14
Applicant: Snap Inc.
Inventor: Jackie Assa, Alan Bekker, Zach Moshe
IPC: G10L15/197, G10L15/22, G10L15/187, G10L15/10
CPC classification number: G10L15/197, G10L15/22, G10L15/187, G10L15/10
Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods perform operations comprising: accessing a language model that includes a plurality of n-grams, each of the plurality of n-grams comprising a respective sequence of words and corresponding LM score; selecting a target word to boost in the language model; receiving a boosting factor for the target word; identifying a target n-gram in the language model that includes the target word; identifying a subset of n-grams of the plurality of n-grams that include words in a portion of the target n-gram; and adjusting the LM score of the target n-gram based on the LM scores of the subset of n-grams and the boosting factor.
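The boosting step in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes LM scores are log10 probabilities (as in ARPA-style models), that the target n-gram ends with the target word, and that the "subset" is the set of same-order n-grams sharing the target's history. The function name `boost_ngram` and the capping rule are hypothetical.

```python
import math


def boost_ngram(lm, target_word, boost):
    """Return a copy of `lm` with n-grams ending in `target_word` boosted.

    `lm` maps word tuples to log10 scores (an assumption of this sketch).
    """
    boosted = dict(lm)
    for ngram, score in lm.items():
        if ngram[-1] != target_word:
            continue
        history = ngram[:-1]
        # Assumed subset: same-order n-grams that share the target's history.
        sibling_scores = [
            s for g, s in lm.items()
            if len(g) == len(ngram) and g[:-1] == history
        ]
        mass = sum(10 ** s for s in sibling_scores)
        # Scale the probability by the boosting factor, but never let it
        # exceed the total probability mass of the subset.
        new_prob = min((10 ** score) * boost, mass)
        boosted[ngram] = math.log10(new_prob)
    return boosted
```

For example, with `("call", "mom")` at -1.0 and `("call", "dad")` at -2.0, boosting "dad" by a factor of 2 raises its probability from 0.01 to 0.02, while a very large factor is capped at the combined mass of the two n-grams. The sketch does not renormalize the sibling scores afterwards.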
-
Publication number: US20230326445A1
Publication date: 2023-10-12
Application number: US17658807
Application date: 2022-04-11
Applicant: Snap Inc.
Inventor: Guy Adam, Jackie Assa, Alan Bekker
CPC classification number: G10L13/08, G06N20/00, G10L15/187, G10L15/063, G06T13/205, G06T13/40
Abstract: Systems and methods are provided for providing animated speech refinement. The systems and methods perform operations comprising: receiving an audio stream comprising one or more spoken words; processing the audio stream by an automated speech recognition (ASR) engine to identify base timing of one or more phonemes corresponding to the one or more spoken words; and applying a machine learning model to the base timing of the one or more phonemes to estimate an adjustment to the base timing of the one or more phonemes.
-
Publication number: US20230197064A1
Publication date: 2023-06-22
Application number: US17644970
Application date: 2021-12-17
Applicant: Snap Inc.
Inventor: Alan Bekker, Jackie Assa, Itamar Schen, Einav Itamar
CPC classification number: G10L15/16, G10L15/063, G10L21/10, G06N3/08, G06N3/0454
Abstract: Systems and methods are provided for extracting entities from received speech. The systems and methods perform operations comprising receiving an audio file comprising speech input and processing, by a speech recognition engine, the audio file comprising the speech input to generate an initial character-based representation of the speech input. The operations further comprise processing, by an entity extractor, the initial character-based representation of the speech input to generate an estimated set of entities of the speech input. The operations further comprise generating, by the speech recognition engine, a textual representation of the speech input based on the estimated set of entities of the speech input.
-
Publication number: US11983462B2
Publication date: 2024-05-14
Application number: US17446513
Application date: 2021-08-31
Applicant: Snap Inc.
Inventor: Jackie Assa, Alan Bekker, Gilad Landau
CPC classification number: G06F3/167, G06F3/011, G06N3/02, G06Q30/0643, G10L15/16, G10L15/22, G10L2015/223, G10L2015/225
Abstract: Systems and methods are provided for providing an augmented reality experience. The systems and methods perform operations comprising: generating, for display by a messaging application, an image comprising one or more augmented reality elements, the one or more augmented reality elements being associated with a configurable entity; receiving, by the messaging application, speech input from a user; determining a schema associated with the one or more augmented reality elements; causing the speech input to be processed by a speech understanding model in accordance with the schema to determine one or more configurable state entity update values; updating the configurable entity associated with the one or more augmented reality elements based on the one or more configurable state entity update values; and modifying the one or more augmented reality elements in the image based on the updated configurable entity.
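The schema-driven update described in this abstract might look roughly like the following sketch. Everything here is illustrative: `ConfigurableEntity`, `understand`, and `apply_updates` are hypothetical names, the schema is assumed to be a mapping from slot name to allowed values, and a simple keyword match over an already-transcribed utterance stands in for the speech understanding model.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigurableEntity:
    """Hypothetical state holder for an AR element (e.g. a virtual product)."""
    state: dict = field(default_factory=dict)


def understand(transcript, schema):
    """Stand-in for a speech understanding model: keyword match per slot."""
    words = transcript.lower().split()
    return {
        slot: value
        for slot, allowed in schema.items()
        for value in allowed
        if value in words
    }


def apply_updates(entity, schema, updates):
    """Apply only the update values that the schema permits."""
    for slot, value in updates.items():
        if value in schema.get(slot, ()):
            entity.state[slot] = value
    return entity
```

With a schema such as `{"color": ["red", "blue"], "size": ["small", "large"]}`, the utterance "make it blue and large" yields updates for both slots, and values outside the schema are ignored. Re-rendering the AR elements from the updated entity is outside the scope of this sketch.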
-
Publication number: US20240062752A1
Publication date: 2024-02-22
Application number: US17821431
Application date: 2022-08-22
Applicant: Snap Inc.
Inventor: Jackie Assa, Alan Bekker, Zach Moshe
IPC: G10L15/197
CPC classification number: G10L15/197
Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods access a language model (LM) that includes a plurality of n-grams, each of the plurality of n-grams comprising a respective sequence of words and corresponding LM score, and receive a list of words associated with a group classification, each word in the list of words being associated with a respective weight. The systems and methods compute, based on the LM scores of the plurality of n-grams, a probability that a given word in the list of words associated with the group classification appears in an n-gram in the LM comprising an individual sequence of words, and add one or more new n-grams to the LM comprising one or more words in the list of words in combination with the individual sequence of words and associated with a particular LM score based on the computed probability.
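A minimal sketch of the group-based n-gram expansion follows. It rests on several assumptions not stated in the abstract: LM scores are log10 probabilities, the "individual sequence of words" is the history preceding a group word, and the group's probability mass is shared among missing words in proportion to their weights. The function name `add_class_ngrams` is hypothetical.

```python
import math


def add_class_ngrams(lm, class_words):
    """Return a copy of `lm` extended with n-grams for missing group words.

    `lm` maps word tuples to log10 scores; `class_words` maps each word in
    the group to its weight. Both shapes are assumptions of this sketch.
    """
    total_weight = sum(class_words.values())
    # Probability that some group word follows each observed history.
    mass = {}
    for ngram, score in lm.items():
        if len(ngram) > 1 and ngram[-1] in class_words:
            history = ngram[:-1]
            mass[history] = mass.get(history, 0.0) + 10 ** score
    extended = dict(lm)
    for history, prob in mass.items():
        for word, weight in class_words.items():
            candidate = history + (word,)
            if candidate not in extended:
                # New score: group mass scaled by the word's relative weight.
                extended[candidate] = math.log10(prob * weight / total_weight)
    return extended
```

For instance, if the LM contains only `("call", "alice")` at -1.0 and the group is `{"alice": 1.0, "bob": 1.0}`, the sketch adds `("call", "bob")` with probability 0.05. Existing n-grams keep their scores, and no renormalization is attempted.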
-
Publication number: US20230067305A1
Publication date: 2023-03-02
Application number: US17446513
Application date: 2021-08-31
Applicant: Snap Inc.
Inventor: Jackie Assa, Alan Bekker, Gilad Landau
Abstract: Systems and methods are provided for providing an augmented reality experience. The systems and methods perform operations comprising: generating, for display by a messaging application, an image comprising one or more augmented reality elements, the one or more augmented reality elements being associated with a configurable entity; receiving, by the messaging application, speech input from a user; determining a schema associated with the one or more augmented reality elements; causing the speech input to be processed by a speech understanding model in accordance with the schema to determine one or more configurable state entity update values; updating the configurable entity associated with the one or more augmented reality elements based on the one or more configurable state entity update values; and modifying the one or more augmented reality elements in the image based on the updated configurable entity.