-
Publication Number: US20210256707A1
Publication Date: 2021-08-19
Application Number: US17252663
Application Date: 2019-07-10
Applicant: Google LLC
Inventor: Matthew Alun Brown, Jonathan Chung-Kuan Huang, Tal Remez
Abstract: Example aspects of the present disclosure are directed to systems and methods that enable weakly-supervised learning of instance segmentation by applying a cut-and-paste technique to training of a generator model included in a generative adversarial network. In particular, the present disclosure provides a weakly-supervised approach to object instance segmentation. In some implementations, starting with known or predicted object bounding boxes, a generator model can learn to generate object masks by playing a game of cut-and-paste in an adversarial learning setup.
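As a rough illustration of the adversarial cut-and-paste game the abstract describes, the sketch below pairs a mask generator with a real-versus-pasted discriminator. It is a minimal sketch in plain PyTorch, assuming simple image tensors; the function and module names (cut_and_paste, adversarial_step, generator, discriminator) are illustrative stand-ins, not from the patent.

```python
import torch
import torch.nn.functional as F

def cut_and_paste(crop, mask, background):
    """Composite the object cut from `crop` (via `mask`) onto `background`."""
    return mask * crop + (1.0 - mask) * background

def adversarial_step(generator, discriminator, crop, background, g_opt, d_opt):
    # The generator predicts a soft object mask for the box-cropped image.
    mask = torch.sigmoid(generator(crop))            # (B, 1, H, W), values in [0, 1]
    pasted = cut_and_paste(crop, mask, background)   # fake composite

    # Discriminator: real image crops vs. cut-and-paste composites.
    real_logits = discriminator(crop)
    fake_logits = discriminator(pasted.detach())
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: a mask is "good" when its composite fools the discriminator.
    g_logits = discriminator(pasted)
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return g_loss.item(), d_loss.item()
```

The only supervision signal is whether the composite fools the discriminator, which is what makes the approach weakly supervised: no ground-truth masks are required, only bounding boxes to define the crops.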
-
Publication Number: US12073844B2
Publication Date: 2024-08-27
Application Number: US17601042
Application Date: 2020-10-01
Applicant: Google LLC
Inventor: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
IPC: G10L21/0208, G10L17/00, G10L21/0272, G10L25/57
CPC classification number: G10L21/0208, G10L17/00, G10L21/0272, G10L25/57, G10L2021/02087
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device; in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of that speaker in the current view, and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device; receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device; and, in response, generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
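The claimed method is essentially a device-side control flow: an indication of visible speakers arrives, a per-speaker isolated signal is generated, and each signal is forwarded to a coupled listening device. The sketch below illustrates that flow under assumed interfaces; VisibleSpeaker, separate, and send are hypothetical stand-ins, not APIs from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class VisibleSpeaker:
    speaker_id: str
    face_track: object  # visual features for this speaker from the camera's current view

def on_speaker_indication(
    speakers: Iterable[VisibleSpeaker],
    mixed_audio: object,                            # microphone mixture with overlapping speech
    separate: Callable[[object, object], object],   # audio-visual separation model
    send: Callable[[str, object], None],            # link to the coupled listening device
) -> None:
    """Handle an indication that `speakers` are visible in the current view:
    isolate each speaker's speech from the mixture and forward it onward."""
    for speaker in speakers:
        # Condition the separation model on the speaker's visual track so the
        # isolated signal contains only that speaker's speech.
        isolated = separate(mixed_audio, speaker.face_track)
        send(speaker.speaker_id, isolated)
```

The same handler would run for both the first and second indications, as the set of speakers visible to the camera changes over time.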
-
Publication Number: US11853892B2
Publication Date: 2023-12-26
Application Number: US17252663
Application Date: 2019-07-10
Applicant: Google LLC
Inventor: Matthew Alun Brown, Jonathan Chung-Kuan Huang, Tal Remez
CPC classification number: G06N3/084, G06N3/045, G06T7/11, G06T7/194, G06T11/20, G06V10/764, G06V10/82, G06T2207/20081, G06T2207/20084, G06T2210/12
Abstract: Example aspects of the present disclosure are directed to systems and methods that enable weakly-supervised learning of instance segmentation by applying a cut-and-paste technique to training of a generator model included in a generative adversarial network. In particular, the present disclosure provides a weakly-supervised approach to object instance segmentation. In some implementations, starting with known or predicted object bounding boxes, a generator model can learn to generate object masks by playing a game of cut-and-paste in an adversarial learning setup.
-
Publication Number: US20230267942A1
Publication Date: 2023-08-24
Application Number: US17601042
Application Date: 2020-10-01
Applicant: Google LLC
Inventor: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
IPC: G10L21/0208, G10L25/57
CPC classification number: G10L21/0208, G10L25/57, G10L2021/02087
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device; in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of that speaker in the current view, and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device; receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device; and, in response, generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
-
Publication Number: US20240428816A1
Publication Date: 2024-12-26
Application Number: US18797400
Application Date: 2024-08-07
Applicant: Google LLC
Inventor: Anatoly Efros, Noam Etzion-Rosenberg, Tal Remez, Oran Lang, Inbar Mosseri, Israel Or Weinstein, Benjamin Schlesinger, Michael Rubinstein, Ariel Ephrat, Yukun Zhu, Stella Laurenzo, Amit Pitaru, Yossi Matias
IPC: G10L21/0208, G10L17/00, G10L21/0272, G10L25/57
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: receiving, by a user device, a first indication of one or more first speakers visible in a current view recorded by a camera of the user device; in response, generating a respective isolated speech signal for each of the one or more first speakers that isolates speech of that speaker in the current view, and sending the isolated speech signals for each of the one or more first speakers to a listening device operatively coupled to the user device; receiving, by the user device, a second indication of one or more second speakers visible in the current view recorded by the camera of the user device; and, in response, generating and sending a respective isolated speech signal for each of the one or more second speakers to the listening device.
-
Publication Number: US20240273311A1
Publication Date: 2024-08-15
Application Number: US18626745
Application Date: 2024-04-04
Applicant: Google LLC
Inventor: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden feature representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second, different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the second language.
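The abstract's encoder/attention/decoder/synthesizer composition can be sketched as a single module. The following is a minimal, hypothetical sketch in PyTorch: the layer choices and sizes are assumptions, and the decoder queries (which in practice would come from an autoregressive decoding loop) are passed in as a tensor for simplicity.

```python
import torch
import torch.nn as nn

class DirectS2ST(nn.Module):
    """Encoder -> attention -> phoneme decoder, with a synthesizer that
    consumes both the context vectors and the predicted phonemes."""

    def __init__(self, feat_dim: int = 80, hidden: int = 256, n_phonemes: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.Linear(hidden, n_phonemes)
        self.synthesizer = nn.LSTM(hidden + n_phonemes, feat_dim, batch_first=True)

    def forward(self, src_speech: torch.Tensor, queries: torch.Tensor):
        # Encode source-language speech features into a hidden representation.
        hidden, _ = self.encoder(src_speech)                  # (B, T, H)
        # Context vectors attend over the encoder's hidden representation.
        context, _ = self.attention(queries, hidden, hidden)  # (B, U, H)
        # Predict target-language phoneme logits from the context.
        phoneme_logits = self.decoder(context)                # (B, U, P)
        # Synthesize translated speech features from context + phonemes.
        translated, _ = self.synthesizer(
            torch.cat([context, phoneme_logits], dim=-1))     # (B, U, feat_dim)
        return translated, phoneme_logits
```

The key structural point matches the abstract: the synthesizer receives both the context vector and the predicted phoneme representation, so translated speech is generated directly rather than from an intermediate text transcript.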
-
Publication Number: US11960852B2
Publication Date: 2024-04-16
Application Number: US17644351
Application Date: 2021-12-15
Applicant: Google LLC
Inventor: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden feature representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second, different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the second language.
-
Publication Number: US20230013777A1
Publication Date: 2023-01-19
Application Number: US17644351
Application Date: 2021-12-15
Applicant: Google LLC
Inventor: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
Abstract: A direct speech-to-speech translation (S2ST) model includes an encoder configured to receive an input speech representation that corresponds to an utterance spoken by a source speaker in a first language and encode the input speech representation into a hidden feature representation. The S2ST model also includes an attention module configured to generate a context vector that attends to the hidden feature representation encoded by the encoder. The S2ST model also includes a decoder configured to receive the context vector generated by the attention module and predict a phoneme representation that corresponds to a translation of the utterance in a second, different language. The S2ST model also includes a synthesizer configured to receive the context vector and the phoneme representation and generate a translated synthesized speech representation that corresponds to a translation of the utterance spoken in the second language.