SYSTEMS AND METHODS FOR ANY TO ANY VOICE CONVERSION

    Publication No.: US20240339122A1

    Publication Date: 2024-10-10

    Application No.: US18608476

    Filing Date: 2024-03-18

    CPC classification number: G10L21/007 G10L15/063 G10L15/08 G10L2015/0635

    Abstract: Embodiments described herein provide systems and methods for any to any voice conversion. A system receives, via a data interface, a source utterance of a first style and a target utterance of a second style. The system generates, via a first encoder, a vector representation of the target utterance. The system generates, via a second encoder, a vector representation of the source utterance. The system generates, via a filter generator, a generated filter based on the vector representation of the target utterance. The system generates, via a decoder, a generated utterance based on the vector representation of the source utterance and the generated filter.
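The data flow in this abstract (two encoders, a filter generator, and a decoder) can be sketched as follows. This is a toy illustration only: the abstract does not disclose the internals of any component, so every function body below is a hypothetical stand-in, and the choice of a gain-and-offset "filter" is an assumption made purely to keep the example small.

```python
from typing import List

Vector = List[float]


def style_encoder(target_utterance: Vector) -> Vector:
    """First encoder (hypothetical stand-in): summarize the target as [mean, spread]."""
    mean = sum(target_utterance) / len(target_utterance)
    spread = max(target_utterance) - min(target_utterance)
    return [mean, spread]


def content_encoder(source_utterance: Vector) -> Vector:
    """Second encoder (hypothetical stand-in): remove the mean from the source."""
    mean = sum(source_utterance) / len(source_utterance)
    return [x - mean for x in source_utterance]


def filter_generator(style_vector: Vector) -> Vector:
    """Build a filter from the target representation: here [gain, offset] (assumed form)."""
    mean, spread = style_vector
    return [spread, mean]


def decoder(content_vector: Vector, generated_filter: Vector) -> Vector:
    """Combine the source representation with the generated filter."""
    gain, offset = generated_filter
    return [gain * x + offset for x in content_vector]


def convert(source_utterance: Vector, target_utterance: Vector) -> Vector:
    """Wire the four components together in the order the abstract lists them."""
    style_vector = style_encoder(target_utterance)
    content_vector = content_encoder(source_utterance)
    generated_filter = filter_generator(style_vector)
    return decoder(content_vector, generated_filter)
```

The point of the sketch is the wiring: the target utterance influences the output only through the generated filter, while the source utterance passes through its own encoder.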

    System and Method for Controlling an Entity
    Document type: Invention publication

    Publication No.: US20240069501A1

    Publication Date: 2024-02-29

    Application No.: US17823387

    Filing Date: 2022-08-30

    Abstract: A controller for controlling an entity is provided. The controller comprises a memory to store a hierarchical multimodal reinforcement learning (RL) neural network, and a processor. The hierarchical multimodal RL neural network includes a first level controller and two second level controllers. Each of the second level controllers comprises a first sub-level controller relating to a first modality and a second sub-level controller relating to a second modality. The first modality is different from the second modality. The processor is configured to select one of the two second level controllers to perform a first sub-task relating to a task, using the first level controller, based on input data and a state of the hierarchical multimodal RL neural network. The selected second level controller is configured to determine a set of control actions to perform the first sub-task, and control the entity based on the set of control actions.
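The two-level selection described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the claimed controller: the abstract names neither the modalities nor the policies, so the modality keys `"vision"` and `"audio"`, the threshold rule in `first_level_controller`, and all action names are hypothetical, and hand-written rules stand in for learned RL policies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]  # hypothetical: one scalar reading per modality
State = Dict[str, float]        # hypothetical stand-in for the network state


@dataclass
class SecondLevelController:
    name: str
    first_modality_policy: Callable[[float], str]   # first sub-level controller
    second_modality_policy: Callable[[float], str]  # second sub-level controller

    def control_actions(self, obs: Observation) -> List[str]:
        """Each sub-level controller contributes one action from its own modality."""
        return [
            self.first_modality_policy(obs["vision"]),
            self.second_modality_policy(obs["audio"]),
        ]


def first_level_controller(obs: Observation, state: State) -> int:
    """Pick a second level controller from input data and state (rule is assumed)."""
    return 0 if obs["audio"] > state["audio_threshold"] else 1


def step(obs: Observation, state: State,
         controllers: List[SecondLevelController]) -> Tuple[str, List[str]]:
    """Select one second level controller and return its control actions."""
    selected = controllers[first_level_controller(obs, state)]
    return selected.name, selected.control_actions(obs)


listen = SecondLevelController(
    "listen",
    lambda v: "hold_position",
    lambda a: "turn_to_sound" if a > 0.5 else "wait",
)
explore = SecondLevelController(
    "explore",
    lambda v: "move_forward" if v > 0.5 else "rotate",
    lambda a: "ignore_audio",
)
```

For example, `step({"vision": 0.9, "audio": 0.1}, {"audio_threshold": 0.3}, [listen, explore])` selects the hypothetical `explore` controller because the audio reading is below the threshold.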

    ACOUSTIC MODEL TRAINING USING CORRECTED TERMS

    Publication No.: US20230274729A1

    Publication Date: 2023-08-31

    Application No.: US18312587

    Filing Date: 2023-05-04

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for speech recognition. One of the methods includes receiving first audio data corresponding to an utterance; obtaining a first transcription of the first audio data; receiving data indicating (i) a selection of one or more terms of the first transcription and (ii) one or more replacement terms; determining that one or more of the replacement terms are classified as a correction of one or more of the selected terms; in response to determining that the one or more of the replacement terms are classified as a correction of the one or more of the selected terms, obtaining a first portion of the first audio data that corresponds to one or more terms of the first transcription; and using the first portion of the first audio data that is associated with the one or more terms of the first transcription to train an acoustic model for recognizing the one or more of the replacement terms.
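The steps in this abstract can be sketched as a small routine that turns a user correction into a training pair. This is an illustration under assumptions: the abstract does not say how a replacement is classified as a correction, so the string-similarity test below (using `difflib.SequenceMatcher` with an arbitrary 0.5 threshold) is a stand-in, and the word-level alignment format `(word, start, end)` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Alignment = List[Tuple[str, int, int]]  # hypothetical: (word, start sample, end sample)


def is_correction(selected: str, replacement: str, threshold: float = 0.5) -> bool:
    """Stand-in classifier: treat similar-looking terms as a misrecognition fix."""
    ratio = SequenceMatcher(None, selected.lower(), replacement.lower()).ratio()
    return ratio >= threshold


def training_example(
    audio: List[int],
    alignment: Alignment,
    selected_index: int,
    replacement: str,
) -> Optional[Tuple[List[int], str]]:
    """Return (audio portion, replacement term) if the edit looks like a correction."""
    word, start, end = alignment[selected_index]
    if not is_correction(word, replacement):
        return None  # likely a rewording rather than a recognition error
    # The audio portion that produced the wrong term is paired with the right one.
    return audio[start:end], replacement
```

Pairs returned by `training_example` would then be fed to whatever acoustic model training procedure is in use; that step is outside the scope of this sketch.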
