SPEECH TO ENTITY
    1.
    Invention publication
    Status: under examination (published)

    Publication number: US20230197064A1

    Publication date: 2023-06-22

    Application number: US17644970

    Filing date: 2021-12-17

    Applicant: Snap Inc.

    CPC classification number: G10L15/16 G10L15/063 G10L21/10 G06N3/08 G06N3/0454

    Abstract: Systems and methods are provided for extracting entities from received speech. The systems and methods perform operations comprising receiving an audio file comprising speech input and processing, by a speech recognition engine, the audio file comprising the speech input to generate an initial character-based representation of the speech input. The operations further comprise processing, by an entity extractor, the initial character-based representation of the speech input to generate an estimated set of entities of the speech input. The operations further comprise generating, by the speech recognition engine, a textual representation of the speech input based on the estimated set of entities of the speech input.
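
    The two-pass flow in this abstract (initial character-based output, then entity extraction, then a final transcript informed by the entities) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not say how the entity extractor works, and the catalog `KNOWN_ENTITIES`, the fuzzy-matching approach, and the function names are hypothetical stand-ins rather than the patented method.

```python
import difflib

# Hypothetical entity catalog; a real system would use a trained extractor.
KNOWN_ENTITIES = ["adele", "beyonce", "coldplay"]


def extract_entities(initial_chars, known=KNOWN_ENTITIES, cutoff=0.8):
    """Map tokens of the initial character output to the closest known entity."""
    entities = {}
    for token in initial_chars.split():
        match = difflib.get_close_matches(token, known, n=1, cutoff=cutoff)
        if match:
            entities[token] = match[0]
    return entities


def final_text(initial_chars, entities):
    """Second pass: rewrite the transcript using the estimated entities."""
    return " ".join(entities.get(tok, tok) for tok in initial_chars.split())


initial = "play songs by adel"  # stand-in for first-pass recognizer output
entities = extract_entities(initial)
text = final_text(initial, entities)
```

    Here the misrecognized token is corrected only because it is close to a catalog entry; tokens without a close match pass through unchanged.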

    Speech to intent
    2.
    Granted invention patent

    Publication number: US11984114B2

    Publication date: 2024-05-14

    Application number: US17495402

    Filing date: 2021-10-06

    Applicant: Snap Inc.

    CPC classification number: G10L15/16 G06N3/045 G06N3/084 G10L15/063

    Abstract: Systems and methods are provided for performing speech to intent classification. The systems and methods perform operations comprising: receiving an audio file comprising speech input; processing, by a speech recognition engine, the audio file comprising the speech input to generate an initial character-based representation of the speech input; processing, by an intent classifier, the initial character-based representation of the speech input to generate an estimated intent of the speech input; and generating, by the speech recognition engine, a textual representation of the speech input based on the estimated intent of the speech input.

    Grouping similar words in a language model
    3.

    Publication number: US12236946B2

    Publication date: 2025-02-25

    Application number: US17821431

    Filing date: 2022-08-22

    Applicant: Snap Inc.

    Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods access a language model (LM) that includes a plurality of n-grams, each of the plurality of n-grams comprising a respective sequence of words and a corresponding LM score, and receive a list of words associated with a group classification, each word in the list of words being associated with a respective weight. The systems and methods compute, based on the LM scores of the plurality of n-grams, a probability that a given word in the list of words associated with the group classification appears in an n-gram in the LM comprising an individual sequence of words, and add to the LM one or more new n-grams comprising one or more words in the list of words in combination with the individual sequence of words and associated with a particular LM score based on the computed probability.
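
    One way to read the grouping step is sketched below. The abstract does not give the formula, so this is an assumption: LM scores are taken to be log10 probabilities, the probability that any group word follows a given word sequence is the summed probability of the group words already seen there, and that mass is redistributed across all group words in proportion to their weights. The function name `expand_group` and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def expand_group(lm, group_weights):
    """Return a copy of `lm` with group words added after each shared context.

    `lm` maps n-gram tuples to log10 probabilities (assumed score format).
    `group_weights` maps each word of the group to a positive weight.
    """
    # Probability mass the group already holds after each context.
    mass = defaultdict(float)
    for ngram, score in lm.items():
        if ngram[-1] in group_weights:
            mass[ngram[:-1]] += 10 ** score

    total_weight = sum(group_weights.values())
    expanded = dict(lm)
    for context, p in mass.items():
        for word, weight in group_weights.items():
            # New or updated n-gram: context followed by a group word.
            expanded[context + (word,)] = math.log10(p * weight / total_weight)
    return expanded


lm = {
    ("fly", "to", "paris"): math.log10(0.3),
    ("fly", "to", "the"): math.log10(0.7),
}
cities = {"paris": 2.0, "london": 1.0, "tokyo": 1.0}
expanded = expand_group(lm, cities)
```

    In this sketch the total probability assigned to the group after "fly to" stays at 0.3, so n-grams outside the group keep their original scores.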

    EMOTION-BASED TEXT TO SPEECH
    4.
    Invention publication

    Publication number: US20230252972A1

    Publication date: 2023-08-10

    Application number: US17667128

    Filing date: 2022-02-08

    Applicant: Snap Inc.

    CPC classification number: G10L13/08 G10L25/18 G10L13/047 G06F3/0482 G10L13/033

    Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.

    Conversation guided augmented reality experience
    5.

    Publication number: US11983462B2

    Publication date: 2024-05-14

    Application number: US17446513

    Filing date: 2021-08-31

    Applicant: Snap Inc.

    Abstract: Systems and methods are provided for providing an augmented reality experience. The systems and methods perform operations comprising: generating, for display by a messaging application, an image comprising one or more augmented reality elements, the one or more augmented reality elements being associated with a configurable entity; receiving, by the messaging application, speech input from a user; determining a schema associated with the one or more augmented reality elements; causing the speech input to be processed by a speech understanding model in accordance with the schema to determine one or more configurable state entity update values; updating the configurable entity associated with the one or more augmented reality elements based on the one or more configurable state entity update values; and modifying the one or more augmented reality elements in the image based on the updated configurable entity.

    GROUPING SIMILAR WORDS IN A LANGUAGE MODEL
    6.
    Invention publication

    Publication number: US20240062752A1

    Publication date: 2024-02-22

    Application number: US17821431

    Filing date: 2022-08-22

    Applicant: Snap Inc.

    CPC classification number: G10L15/197

    Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods access a language model (LM) that includes a plurality of n-grams, each of the plurality of n-grams comprising a respective sequence of words and a corresponding LM score, and receive a list of words associated with a group classification, each word in the list of words being associated with a respective weight. The systems and methods compute, based on the LM scores of the plurality of n-grams, a probability that a given word in the list of words associated with the group classification appears in an n-gram in the LM comprising an individual sequence of words, and add to the LM one or more new n-grams comprising one or more words in the list of words in combination with the individual sequence of words and associated with a particular LM score based on the computed probability.

    CONVERSATION GUIDED AUGMENTED REALITY EXPERIENCE
    7.

    Publication number: US20230067305A1

    Publication date: 2023-03-02

    Application number: US17446513

    Filing date: 2021-08-31

    Applicant: Snap Inc.

    Abstract: Systems and methods are provided for providing an augmented reality experience. The systems and methods perform operations comprising: generating, for display by a messaging application, an image comprising one or more augmented reality elements, the one or more augmented reality elements being associated with a configurable entity; receiving, by the messaging application, speech input from a user; determining a schema associated with the one or more augmented reality elements; causing the speech input to be processed by a speech understanding model in accordance with the schema to determine one or more configurable state entity update values; updating the configurable entity associated with the one or more augmented reality elements based on the one or more configurable state entity update values; and modifying the one or more augmented reality elements in the image based on the updated configurable entity.

    EMOTION-BASED TEXT TO SPEECH
    8.
    Invention application

    Publication number: US20250029595A1

    Publication date: 2025-01-23

    Application number: US18906853

    Filing date: 2024-10-04

    Applicant: Snap Inc.

    Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.

    Emotion-based text to speech
    9.
    Granted invention patent

    Publication number: US12142257B2

    Publication date: 2024-11-12

    Application number: US17667128

    Filing date: 2022-02-08

    Applicant: Snap Inc.

    Abstract: Systems and methods are provided for providing emotion-based text to speech. The systems and methods perform operations comprising accessing a text string; storing a plurality of embeddings associated with a plurality of speakers, a first embedding for a first speaker being associated with a first emotion and a second embedding for a second speaker of the plurality of speakers being associated with a second emotion; selecting the first speaker to speak one or more words of the text string; determining that the one or more words are associated with the second emotion; generating, based on the first embedding and the second embedding, a third embedding for the first speaker associated with the second emotion; and applying the third embedding and the text string to a vocoder to generate an audio stream comprising the one or more words being spoken by the first speaker with the second emotion.

    BOOSTING WORDS IN AUTOMATED SPEECH RECOGNITION
    10.

    Publication number: US20240021195A1

    Publication date: 2024-01-18

    Application number: US17864937

    Filing date: 2022-07-14

    Applicant: Snap Inc.

    CPC classification number: G10L15/197 G10L15/22 G10L15/187 G10L15/10

    Abstract: Systems and methods are provided for performing automated speech recognition. The systems and methods perform operations comprising: accessing a language model (LM) that includes a plurality of n-grams, each of the plurality of n-grams comprising a respective sequence of words and a corresponding LM score; selecting a target word to boost in the language model; receiving a boosting factor for the target word; identifying a target n-gram in the language model that includes the target word; identifying a subset of n-grams of the plurality of n-grams that include words in a portion of the target n-gram; and adjusting the LM score of the target n-gram based on the LM scores of the subset of n-grams and the boosting factor.
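
    The boosting operations listed in this abstract can be sketched as follows. The exact adjustment is not stated, so several points are assumptions: LM scores are treated as log10 probabilities, the "subset" is taken to be the n-grams that share the target's leading words, and the target's probability is multiplied by the boosting factor before the subset is renormalized to its original total. Note that this particular choice also lowers the scores of the other n-grams in the subset, which goes beyond what the abstract describes. The function name and sample data are hypothetical.

```python
import math


def boost_target_ngram(lm, target_ngram, boosting_factor):
    """Return a copy of `lm` with `target_ngram` boosted.

    `lm` maps n-gram tuples to log10 probabilities (assumed score format).
    `target_ngram` must be a key of `lm`; `boosting_factor` must be positive.
    """
    context = target_ngram[:-1]
    # Subset of n-grams that share the target's leading words (its context).
    siblings = [ng for ng in lm
                if len(ng) == len(target_ngram) and ng[:-1] == context]
    probs = {ng: 10 ** lm[ng] for ng in siblings}
    total = sum(probs.values())

    # Scale only the target, then renormalize so the context keeps its mass.
    probs[target_ngram] *= boosting_factor
    scale = total / sum(probs.values())

    boosted = dict(lm)
    for ng, p in probs.items():
        boosted[ng] = math.log10(p * scale)
    return boosted


lm = {
    ("play", "some", "jazz"): math.log10(0.2),
    ("play", "some", "music"): math.log10(0.6),
    ("play", "some", "rock"): math.log10(0.2),
}
boosted = boost_target_ngram(lm, ("play", "some", "jazz"), 3.0)
```

    Renormalizing within the shared context keeps the boosted scores usable as probabilities; a simpler variant would add log10 of the boosting factor to the target score alone and skip the renormalization.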
