Learning to Segment via Cut-and-Paste

    Publication No.: US20210256707A1

    Publication date: 2021-08-19

    Application No.: US17252663

    Filing date: 2019-07-10

    Applicant: Google LLC

    Abstract: Example aspects of the present disclosure are directed to systems and methods that enable weakly-supervised learning of instance segmentation by applying a cut-and-paste technique to training of a generator model included in a generative adversarial network. In particular, the present disclosure provides a weakly-supervised approach to object instance segmentation. In some implementations, starting with known or predicted object bounding boxes, a generator model can learn to generate object masks by playing a game of cut-and-paste in an adversarial learning setup.
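
As a rough illustration of the cut-and-paste step, the sketch below composites an object crop onto a new background using a soft mask. The blending rule `mask * object + (1 - mask) * background`, the function name `paste`, and the toy 2x2 grayscale data are assumptions made for illustration. The abstract does not say how the paste is computed, and the adversarial training of the generator and discriminator is omitted here.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def paste(obj: Image, mask: Image, background: Image) -> Image:
    """Blend obj onto background, weighting each pixel by mask (0..1).

    Assumed blending rule (not stated in the abstract):
        composite = mask * obj + (1 - mask) * background
    """
    return [
        [m * o + (1.0 - m) * b for o, m, b in zip(obj_row, mask_row, bg_row)]
        for obj_row, mask_row, bg_row in zip(obj, mask, background)
    ]


# Hypothetical toy data: a crop taken from inside a bounding box, a mask
# such as a generator might predict, and a new background patch.
obj = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
background = [[0.0, 0.0], [0.0, 0.0]]

composite = paste(obj, mask, background)
```

In an adversarial setup of the kind the abstract describes, a discriminator would presumably judge whether `composite` looks like a real image, which gives the generator a training signal for its masks.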

    Robust Direct Speech-to-Speech Translation
    Type: Invention publication

    Publication No.: US20240273311A1

    Publication date: 2024-08-15

    Application No.: US18626745

    Filing date: 2024-04-04

    Applicant: Google LLC

    CPC classification number: G06F40/58 G10L13/02 G10L13/10 G10L19/16

    Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the different second language.
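
The data flow between the four components named in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the function `translate_speech`, the type aliases, and all `toy_*` stand-ins are hypothetical, and the sketch collapses attention into a single context vector for the whole utterance, which is a simplifying assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def translate_speech(
    speech: Sequence[Vector],
    encoder: Callable[[Sequence[Vector]], List[Vector]],
    attention: Callable[[List[Vector]], Vector],
    decoder: Callable[[Vector], List[str]],
    synthesizer: Callable[[Vector, List[str]], List[float]],
) -> List[float]:
    """Wire the components in the order the abstract describes."""
    hidden = encoder(speech)        # hidden feature representation
    context = attention(hidden)     # context vector attending to hidden
    phonemes = decoder(context)     # phoneme representation of the translation
    # The synthesizer receives both the context vector and the phonemes.
    return synthesizer(context, phonemes)


# Hypothetical stand-ins so the sketch runs; real components would be
# trained neural networks.
def toy_encoder(frames: Sequence[Vector]) -> List[Vector]:
    return [[2.0 * x for x in frame] for frame in frames]


def toy_attention(hidden: List[Vector]) -> Vector:
    # Uniform attention: average the hidden vectors over time.
    return [sum(column) / len(hidden) for column in zip(*hidden)]


def toy_decoder(context: Vector) -> List[str]:
    return ["o", "l", "a"]


def toy_synthesizer(context: Vector, phonemes: List[str]) -> List[float]:
    # One output sample per phoneme, derived from the context vector.
    return [context[0]] * len(phonemes)


output = translate_speech(
    [[1.0, 2.0], [3.0, 4.0]],
    toy_encoder,
    toy_attention,
    toy_decoder,
    toy_synthesizer,
)
```

Passing the context vector to both the decoder and the synthesizer mirrors the abstract's wording: the synthesizer is conditioned on the attended source information as well as the predicted phonemes.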

    Robust direct speech-to-speech translation

    Publication No.: US11960852B2

    Publication date: 2024-04-16

    Application No.: US17644351

    Filing date: 2021-12-15

    Applicant: Google LLC

    CPC classification number: G06F40/58 G10L13/02 G10L13/10 G10L19/16

    Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the different second language.

    Robust Direct Speech-to-Speech Translation

    Publication No.: US20230013777A1

    Publication date: 2023-01-19

    Application No.: US17644351

    Filing date: 2021-12-15

    Applicant: Google LLC

    Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the different second language.
