Transducer-Based Streaming Deliberation for Cascaded Encoders

    Publication Number: US20240428786A1

    Publication Date: 2024-12-26

    Application Number: US18826655

    Filing Date: 2024-09-06

    Applicant: Google LLC

    Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
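    The following is a minimal NumPy sketch of the two-pass data flow described in this abstract: a first encoder and first-pass transducer decoder produce a streaming hypothesis, a text encoder encodes that hypothesis, a cascaded second encoder refines the acoustic encodings, and a second-pass transducer decoder consumes both. All function names, dimensions, random-projection stand-ins, and the additive fusion step are hypothetical placeholders, not the patent's implementation.

    ```python
    import numpy as np

    # Hypothetical sizes; the abstract does not specify model dimensions.
    FRAME_DIM, ENC_DIM, TEXT_DIM, VOCAB = 80, 256, 256, 32

    rng = np.random.default_rng(0)

    def first_encoder(frames):
        # Stand-in first (streaming) encoder: one higher-order feature vector per frame.
        return frames @ rng.standard_normal((FRAME_DIM, ENC_DIM))

    def first_pass_transducer_decoder(enc1):
        # Stand-in first-pass transducer decoder: emits a dummy token id per frame
        # as the first-pass speech recognition hypothesis.
        return enc1.argmax(axis=-1) % VOCAB

    def text_encoder(hypothesis):
        # Stand-in text encoder over the first-pass hypothesis tokens.
        embedding = rng.standard_normal((VOCAB, TEXT_DIM))
        return embedding[hypothesis]

    def second_encoder(enc1):
        # Stand-in cascaded second encoder over the first-pass encodings.
        return enc1 @ rng.standard_normal((ENC_DIM, ENC_DIM))

    def second_pass_transducer_decoder(enc2, text_enc):
        # Stand-in second-pass decoder using both the second-pass encodings and
        # the text encodings; simple addition is an illustrative fusion choice.
        fused = enc2 + text_enc
        return fused.argmax(axis=-1) % VOCAB

    # Toy sequence of acoustic frames (T x FRAME_DIM).
    frames = rng.standard_normal((50, FRAME_DIM))
    enc1 = first_encoder(frames)
    hyp1 = first_pass_transducer_decoder(enc1)
    text_enc = text_encoder(hyp1)
    enc2 = second_encoder(enc1)
    hyp2 = second_pass_transducer_decoder(enc2, text_enc)
    print(hyp2.shape)  # second-pass hypothesis, one token id per frame
    ```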

    Unsupervised Learning of Disentangled Speech Content and Style Representation

    Publication Number: US20240312449A1

    Publication Date: 2024-09-19

    Application Number: US18676743

    Filing Date: 2024-05-29

    Applicant: Google LLC

    CPC classification number: G10L13/027 G10L21/0308

    Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.
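    Below is a minimal NumPy sketch of the encoder/decoder split described in this abstract (and in the granted counterpart that follows): a content encoder yields a per-frame content representation, a style encoder yields an utterance-level style representation, and a decoder combines content from one utterance with style from the same or a different utterance. The function names, dimensions, and random linear projections are hypothetical stand-ins; the sketch does not show the disentanglement training itself.

    ```python
    import numpy as np

    # Hypothetical sizes; the abstract does not specify dimensions.
    SPEECH_DIM, CONTENT_DIM, STYLE_DIM = 80, 64, 16

    rng = np.random.default_rng(0)

    def content_encoder(speech):
        # Stand-in content encoder: a per-frame latent representation of
        # linguistic content (speaking style is meant to be disentangled away).
        return speech @ rng.standard_normal((SPEECH_DIM, CONTENT_DIM))

    def style_encoder(speech):
        # Stand-in style encoder: a single utterance-level latent representation
        # of speaking style (linguistic content is meant to be disentangled away).
        return speech.mean(axis=0) @ rng.standard_normal((SPEECH_DIM, STYLE_DIM))

    def decoder(content, style):
        # Stand-in decoder: generates output speech from content frames plus a
        # broadcast style vector, mirroring the content/style split.
        joined = np.concatenate(
            [content, np.broadcast_to(style, (content.shape[0], STYLE_DIM))], axis=-1
        )
        return joined @ rng.standard_normal((CONTENT_DIM + STYLE_DIM, SPEECH_DIM))

    # Content from one utterance, style from another, as the abstract allows
    # ("the same or different input speech").
    utterance_a = rng.standard_normal((120, SPEECH_DIM))
    utterance_b = rng.standard_normal((90, SPEECH_DIM))
    output = decoder(content_encoder(utterance_a), style_encoder(utterance_b))
    print(output.shape)  # (120, 80): output speech frames
    ```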

    Unsupervised learning of disentangled speech content and style representation

    Publication Number: US12027151B2

    Publication Date: 2024-07-02

    Application Number: US17455667

    Filing Date: 2021-11-18

    Applicant: Google LLC

    CPC classification number: G10L13/027 G10L21/0308

    Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.
