-
Publication No.: US20200380952A1
Publication Date: 2020-12-03
Application No.: US16855042
Application Date: 2020-04-22
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
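The sketch below is a minimal illustration of the conditioning described in this abstract: a toy TTS model consumes text token IDs together with a speaker embedding and emits audio feature frames. The class name, layer choices, and dimensions are assumptions for illustration, not the patented architecture.

```python
# Minimal sketch of conditioning a TTS model on a speaker embedding; names and
# sizes are illustrative assumptions, not the architecture claimed in the patent.
import torch
import torch.nn as nn

class SpeakerConditionedTTS(nn.Module):
    def __init__(self, vocab_size=256, text_dim=128, spk_dim=64, mel_dim=80):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, text_dim)
        self.encoder = nn.LSTM(text_dim, text_dim, batch_first=True)
        # The decoder consumes encoder states concatenated with the speaker embedding.
        self.decoder = nn.LSTM(text_dim + spk_dim, 256, batch_first=True)
        self.mel_proj = nn.Linear(256, mel_dim)

    def forward(self, token_ids, speaker_embedding):
        # token_ids: (batch, text_len); speaker_embedding: (batch, spk_dim)
        enc, _ = self.encoder(self.text_embed(token_ids))
        # Broadcast the speaker embedding across every encoder timestep so the
        # decoder can carry the target speaker's voice characteristics.
        spk = speaker_embedding.unsqueeze(1).expand(-1, enc.size(1), -1)
        dec, _ = self.decoder(torch.cat([enc, spk], dim=-1))
        return self.mel_proj(dec)  # (batch, text_len, mel_dim) audio feature frames

model = SpeakerConditionedTTS()
tokens = torch.randint(0, 256, (2, 30))   # text in the first language
spk = torch.randn(2, 64)                  # embedding of a second-language native speaker
mels = model(tokens, spk)
print(mels.shape)                         # torch.Size([2, 30, 80])
```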
-
Publication No.: US20200380215A1
Publication Date: 2020-12-03
Application No.: US16834342
Application Date: 2020-03-30
Applicant: Google LLC
Inventor: Anjuli Patricia Kannan , Tara N. Sainath , Yonghui Wu , Ankur Bapna , Arindrima Datta
Abstract: A method of transcribing speech using a multilingual end-to-end (E2E) speech recognition model includes receiving audio data for an utterance spoken in a particular native language, obtaining a language vector identifying the particular language, and processing, using the multilingual E2E speech recognition model, the language vector and acoustic features derived from the audio data to generate a transcription for the utterance. The multilingual E2E speech recognition model includes a plurality of language-specific adaptor modules that include one or more adaptor modules specific to the particular native language and one or more other adaptor modules specific to at least one other native language different than the particular native language. The method also includes providing the transcription for output.
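As a rough sketch of the adapter idea in this abstract, the code below routes shared encoder features through a per-language bottleneck module selected by a language identifier. The residual adapter form, layer sizes, and language names are assumptions, not the claimed design.

```python
# Hedged sketch of language-specific adapter modules on top of a shared encoder,
# selected by a language identifier; sizes and the residual form are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small bottleneck applied to shared encoder features."""
    def __init__(self, dim=256, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual adapter

class MultilingualEncoder(nn.Module):
    def __init__(self, feat_dim=80, dim=256, languages=("en", "hi", "ta")):
        super().__init__()
        self.shared = nn.LSTM(feat_dim, dim, batch_first=True)
        # One adapter module per supported native language.
        self.adapters = nn.ModuleDict({lang: Adapter(dim) for lang in languages})

    def forward(self, acoustic_features, language):
        shared_out, _ = self.shared(acoustic_features)
        return self.adapters[language](shared_out)

encoder = MultilingualEncoder()
feats = torch.randn(4, 200, 80)        # acoustic features derived from the audio data
hidden = encoder(feats, language="hi") # route through the language-specific adapter
print(hidden.shape)                    # torch.Size([4, 200, 256])
```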
-
Publication No.: US10573293B2
Publication Date: 2020-02-25
Application No.: US16447862
Application Date: 2019-06-20
Applicant: Google LLC
Inventor: Samuel Bengio , Yuxuan Wang , Zongheng Yang , Zhifeng Chen , Yonghui Wu , Ioannis Agiomyrgiannakis , Ron J. Weiss , Navdeep Jaitly , Ryan M. Rifkin , Robert Andrew James Clark , Quoc V. Le , Russell J. Ryan , Ying Xiao
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
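A toy version of the pipeline this abstract describes appears below: a subsystem converts raw characters to IDs and feeds them to a sequence-to-sequence recurrent network that outputs spectrogram frames. The vocabulary handling, single-layer GRUs, and dimensions are illustrative assumptions.

```python
# Illustrative sketch: a subsystem feeds a character sequence to a seq2seq
# recurrent network that emits a spectrogram; all sizes are assumptions.
import torch
import torch.nn as nn

class CharToSpectrogram(nn.Module):
    def __init__(self, n_chars=100, char_dim=128, hidden=256, n_spec_bins=80):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim)
        self.encoder = nn.GRU(char_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.to_spec = nn.Linear(hidden, n_spec_bins)

    def forward(self, char_ids):
        enc, _ = self.encoder(self.embed(char_ids))
        dec, _ = self.decoder(enc)
        return self.to_spec(dec)      # (batch, time, n_spec_bins) spectrogram frames

def synthesize(text: str, model: CharToSpectrogram) -> torch.Tensor:
    """Subsystem: map raw characters to IDs and run the seq2seq network."""
    char_ids = torch.tensor([[min(ord(c), 99) for c in text]])  # clamp to toy vocab
    return model(char_ids)

spec = synthesize("hello world", CharToSpectrogram())
print(spec.shape)   # torch.Size([1, 11, 80])
```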
-
Publication No.: US20190188566A1
Publication Date: 2019-06-20
Application No.: US16328207
Application Date: 2017-08-25
Applicant: Google LLC
Inventor: Michael Schuster , Samuel Bengio , Navdeep Jaitly , Zhifeng Chen , Dale Eric Schuurmans , Mohammad Norouzi , Yonghui Wu
Abstract: A method includes obtaining data identifying a machine learning model to be trained to perform a machine learning task, the machine learning model being configured to receive an input example and to process the input example in accordance with current values of a plurality of model parameters to generate a model output for the input example; obtaining initial training data for training the machine learning model, the initial training data comprising a plurality of training examples and, for each training example, a ground truth output that should be generated by the machine learning model by processing the training example; generating modified training data from the initial training data; and training the machine learning model on the modified training data.
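The skeleton below mirrors the loop this abstract outlines: take (example, ground-truth) pairs, derive modified training data from them, and train on the modified set. The particular modification shown (randomly perturbing target strings) is only an illustrative stand-in for whatever transformation the patent actually claims, and the helper names are hypothetical.

```python
# Generic sketch of the described loop: initial data -> modified data -> training.
# The perturbation used here is an illustrative placeholder, not the claimed method.
import random

def generate_modified_training_data(initial_data, perturb_prob=0.1):
    """Produce a derived dataset from the initial examples and ground-truth outputs."""
    modified = []
    for example, ground_truth in initial_data:
        target = [
            t if random.random() > perturb_prob else random.choice("abcdefgh")
            for t in ground_truth
        ]
        modified.append((example, "".join(target)))
    return modified

def train(model_update_fn, data, epochs=3):
    """Run the update function once per (example, target) pair per epoch."""
    for _ in range(epochs):
        for example, target in data:
            model_update_fn(example, target)

initial = [("input one", "abc"), ("input two", "deff")]
modified = generate_modified_training_data(initial)
train(lambda x, y: None, modified)   # plug in a real gradient-step function here
```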
-
Publication No.: US20180210895A1
Publication Date: 2018-07-26
Application No.: US15926726
Application Date: 2018-03-20
Applicant: Google LLC
Inventor: Yonghui Wu , Michael E. Flaster , Randall G. Keller , Paul Haahr
IPC: G06F17/30 , G06F17/21 , G06F3/0484 , G06F17/27
CPC classification number: G06F17/30247 , G06F3/04842 , G06F17/211 , G06F17/212 , G06F17/2785 , G06F17/30011 , G06F17/30047 , G06F17/30253 , G06F17/30289 , G06F17/30876
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating descriptive text for images. In one aspect, a method includes identifying a set of seed descriptors for an image in a document that is hosted on a website. For each seed descriptor, structure information is generated that specifies a structure of the document with respect to the image and the seed descriptor. One or more templates are generated for each seed descriptor using the structure information for the seed descriptor. Each template can include image location information, document structure information, image feature information, and a generative rule that generates descriptive text for other images in other documents. Descriptive text for other images is generated using the templates and the other documents. The descriptive text is associated with the images.
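The toy sketch below illustrates the template idea in this abstract: record where a seed descriptor sits relative to an image in one document, then reuse that structural position as a generative rule to describe images in other documents. The dictionary layout and function names are assumptions made for illustration.

```python
# Toy sketch: build a template from one document's structure around an image,
# then apply its generative rule to another document. Field names are assumptions.
def build_template(document, image_id, seed_descriptor):
    """Capture document structure around the image and the seed descriptor."""
    position = document["text_blocks"].index(seed_descriptor)
    return {
        "image_location": document["images"][image_id],   # image location information
        "descriptor_offset": position,                     # document structure information
        "rule": lambda doc: doc["text_blocks"][position],  # generative rule
    }

def describe(other_document, template):
    """Apply the generative rule to produce descriptive text for another image."""
    try:
        return template["rule"](other_document)
    except IndexError:
        return None

doc_a = {"images": {"img1": "top"}, "text_blocks": ["Golden Gate Bridge", "caption", "footer"]}
doc_b = {"images": {"img9": "top"}, "text_blocks": ["Brooklyn Bridge", "caption", "footer"]}

template = build_template(doc_a, "img1", "Golden Gate Bridge")
print(describe(doc_b, template))   # -> "Brooklyn Bridge"
```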
-
Publication No.: US20250053444A1
Publication Date: 2025-02-13
Application No.: US18814371
Application Date: 2024-08-23
Applicant: Google LLC
Inventor: Jeffrey Adgate Dean , Sudip Roy , Michael Acheson Isard , Aakanksha Chowdhery , Brennan Saeta , Chandramohan Amyangot Thekkath , Daniel William Hurt , Hyeontaek Lim , Laurent El Shafey , Parker Edward Schuh , Paul Ronald Barham , Ruoming Pang , Ryan Sepassi , Sanjay Ghemawat , Yonghui Wu
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators. One of the systems comprises a plurality of accelerator islands, each hardware accelerator island comprising a respective plurality of hardware devices that include a plurality of hardware accelerators and a corresponding host for each of the plurality of hardware accelerators; and a respective scheduler for each of the accelerator islands that is configured to schedule workloads across the plurality of accelerators and corresponding hosts in the accelerator island, wherein the system is configured to: receive data representing a machine learning workload; and assign a respective portion of the machine learning workload to each of the plurality of accelerator islands for scheduling by the respective scheduler for the accelerator island.
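A small sketch of the structure this abstract describes: the system splits a workload across accelerator islands, and each island's own scheduler places its share onto that island's accelerators. The round-robin placement policy and the class layout are illustrative choices, not the claimed scheduling method.

```python
# Hedged sketch: per-island schedulers place slices of a machine learning workload
# onto their own accelerators; the round-robin policy is an illustrative assumption.
from dataclasses import dataclass, field

@dataclass
class Island:
    name: str
    accelerators: list
    assignments: dict = field(default_factory=dict)

    def schedule(self, tasks):
        """Per-island scheduler: spread this island's tasks over its accelerators."""
        for i, task in enumerate(tasks):
            device = self.accelerators[i % len(self.accelerators)]
            self.assignments.setdefault(device, []).append(task)

def assign_workload(islands, workload):
    """System-level step: give each island a contiguous slice of the workload."""
    per_island = len(workload) // len(islands)
    for i, island in enumerate(islands):
        start = i * per_island
        end = None if i == len(islands) - 1 else (i + 1) * per_island
        island.schedule(workload[start:end])

islands = [
    Island("island-0", ["tpu0", "tpu1"]),
    Island("island-1", ["tpu2", "tpu3"]),
]
assign_workload(islands, [f"layer-{i}" for i in range(8)])
for island in islands:
    print(island.name, island.assignments)
```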
-
Publication No.: US20240420686A1
Publication Date: 2024-12-19
Application No.: US18815200
Application Date: 2024-08-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar , Zhifeng Chen , Bo Li , Chung-Cheng Chiu , Kanury Kanishka Rao , Yonghui Wu , Ron J. Weiss , Navdeep Jaitly , Michiel A. U. Bacchiani , Tara N. Sainath , Jan Kazimierz Chorowski , Anjuli Patricia Kannan , Ekaterina Gonina , Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
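The rough sketch below follows the encoder / attender / decoder flow described above: the attender builds a context vector over the encoder output, and the decoder turns it into speech recognition scores over word elements. The attention mechanism, single-step decoder loop, and fixed output length are simplified stand-ins.

```python
# Rough sketch of the encoder -> attender -> decoder flow; all components and
# sizes are simplified assumptions, not the patented model.
import torch
import torch.nn as nn

class Seq2SeqASR(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attender = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.decoder = nn.LSTMCell(hidden, hidden)
        self.scores = nn.Linear(hidden, vocab)   # scores over word elements

    def forward(self, features, steps=20):
        enc, _ = self.encoder(features)          # acoustic features -> encodings
        h = torch.zeros(features.size(0), enc.size(-1))
        c = torch.zeros_like(h)
        outputs = []
        for _ in range(steps):
            # Attend over the encoder output to build a context vector.
            context, _ = self.attender(h.unsqueeze(1), enc, enc)
            h, c = self.decoder(context.squeeze(1), (h, c))
            outputs.append(self.scores(h))       # speech recognition scores
        return torch.stack(outputs, dim=1)       # (batch, steps, vocab)

asr = Seq2SeqASR()
logits = asr(torch.randn(2, 300, 80))
transcription_ids = logits.argmax(dim=-1)        # select word elements by score
print(transcription_ids.shape)                   # torch.Size([2, 20])
```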
-
Publication No.: US20240404238A1
Publication Date: 2024-12-05
Application No.: US18698997
Application Date: 2022-10-05
Applicant: Google LLC
Inventor: Jiahui Yu , Vijay Vasudevan , Alexander Yeong-Shiuh Ku , Yonghui Wu , Jason Michael Baldridge , Yuanzhong Xu , Jing Yu Koh , Thang Minh Luong , Gunjan Baid , Zirui Wang , Han Zhang , Xin Li
IPC: G06V10/28 , G06F40/284 , G06V10/764 , G06V10/766 , G06V10/82
Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
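The snippet below sketches only the vector-quantization step at the heart of this abstract: continuous patch embeddings are snapped to their nearest entries in a learned codebook, yielding discrete image tokens. The ViT encoder and the autoregressive Transformer stage are omitted, and the codebook size and dimensions are illustrative assumptions.

```python
# Minimal sketch of nearest-neighbor vector quantization against a learned
# codebook; sizes are illustrative, and the encoder/decoder stages are omitted.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, codebook_size=1024, dim=32):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)   # learned code vectors

    def forward(self, patch_embeddings):
        # patch_embeddings: (batch, num_patches, dim) from a ViT-style encoder.
        # Squared L2 distance between every patch embedding and every codebook entry.
        diff = patch_embeddings.unsqueeze(2) - self.codebook.weight[None, None]
        distances = (diff ** 2).sum(dim=-1)                # (batch, patches, codebook)
        tokens = distances.argmin(dim=-1)                  # discrete image tokens
        quantized = self.codebook(tokens)                  # nearest code vectors
        return tokens, quantized

vq = VectorQuantizer()
patches = torch.randn(2, 64, 32)        # e.g. an 8x8 grid of patch embeddings
tokens, quantized = vq(patches)
print(tokens.shape, quantized.shape)    # torch.Size([2, 64]) torch.Size([2, 64, 32])
# The rasterized token grid can then be modeled autoregressively by a Transformer.
```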
-
Publication No.: US20240339106A1
Publication Date: 2024-10-10
Application No.: US18746809
Application Date: 2024-06-18
Applicant: Google LLC
Inventor: Ye Jia , Byungha Chun , Yu Zhang , Jonathan Shen , Yonghui Wu
IPC: G10L13/08 , G06F40/263 , G06F40/279 , G06N3/08 , G10L13/047
CPC classification number: G10L13/086 , G06F40/263 , G06F40/279 , G06N3/08 , G10L13/047
Abstract: A method includes receiving a text input including a sequence of words represented as an input encoder embedding. The input encoder embedding includes a plurality of tokens, with the plurality of tokens including a first set of grapheme tokens representing the text input as respective graphemes and a second set of phoneme tokens representing the text input as respective phonemes. The method also includes, for each respective phoneme token of the second set of phoneme tokens: identifying a respective word of the sequence of words corresponding to the respective phoneme token and determining a respective grapheme token representing the respective word of the sequence of words corresponding to the respective phoneme token. The method also includes generating an output encoder embedding based on a relationship between each respective phoneme token and the corresponding grapheme token determined to represent a same respective word as the respective phoneme token.
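A toy sketch of the grapheme/phoneme relationship this abstract describes: each phoneme token is mapped back to the grapheme token representing the same word, and the two embeddings are combined into an output encoder embedding. The word-index bookkeeping and the additive combination are assumptions made for illustration.

```python
# Toy sketch: relate each phoneme token to the grapheme token of the same word
# and form a combined encoder embedding; the alignment scheme is an assumption.
import torch
import torch.nn as nn

grapheme_vocab, phoneme_vocab, dim = 200, 80, 64
grapheme_embed = nn.Embedding(grapheme_vocab, dim)
phoneme_embed = nn.Embedding(phoneme_vocab, dim)

# One grapheme token per word, several phoneme tokens per word, plus a map
# saying which word each phoneme token came from.
grapheme_tokens = torch.tensor([[11, 42, 7]])              # 3 words
phoneme_tokens  = torch.tensor([[3, 5, 9, 2, 14, 6, 1]])   # 7 phoneme tokens
phoneme_word_ix = torch.tensor([[0, 0, 1, 1, 1, 2, 2]])    # word index per phoneme token

# For each phoneme token, look up the grapheme token representing the same word.
aligned_graphemes = torch.gather(grapheme_tokens, 1, phoneme_word_ix)

# Output encoder embedding built from the phoneme/grapheme relationship.
output_embedding = phoneme_embed(phoneme_tokens) + grapheme_embed(aligned_graphemes)
print(output_embedding.shape)   # torch.Size([1, 7, 64])
```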
-
Publication No.: US12087273B2
Publication Date: 2024-09-10
Application No.: US18161217
Application Date: 2023-01-30
Applicant: Google LLC
Inventor: Yu Zhang , Ron J. Weiss , Byungha Chun , Yonghui Wu , Zhifeng Chen , Russell John Wyatt Skerry-Ryan , Ye Jia , Andrew M. Rosenberg , Bhuvana Ramabhadran
IPC: G10L21/00 , G10L13/00 , G10L13/047
CPC classification number: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.