-
Publication No.: US11774596B2
Publication date: 2023-10-03
Application No.: US17901224
Application date: 2022-09-01
Applicant: Google LLC
Inventor: Jonathon Shlens, Vijay Vasudevan, Jiquan Ngiam, Wei Han, Zhifeng Chen, Brandon Chauloon Yang, Benjamin James Caine, Zhengdong Zhang, Christoph Sprunk, Ouais Alsharif, Junhua Mao, Chen Wu
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing data generated by a sensing system that rotationally senses an environment. In one aspect, a method comprises partitioning a predetermined period of time into a plurality of sub-periods, wherein the predetermined period of time is a period of time for which data generated by the sensing system constitutes a complete rotational sensing of the environment; for each sub-period: receiving current data generated by the sensing system during the sub-period and characterizing a respective partial scene of the environment; processing the current data using an object detection neural network to generate a current object detection output that is specific to the respective partial scene of the environment.
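The partitioning described in this abstract can be illustrated with a minimal sketch. The function names, the `(timestamp, payload)` point layout, and the stand-in `detector` callable are assumptions made for illustration; the abstract does not specify them, and a real system would call an object detection neural network in place of the detector.

```python
from typing import Any, Callable, List, Sequence, Tuple


def partition_period(period_s: float, num_sub_periods: int) -> List[Tuple[float, float]]:
    """Split one full-rotation period into equal, contiguous sub-periods."""
    if period_s <= 0 or num_sub_periods <= 0:
        raise ValueError("period and sub-period count must be positive")
    step = period_s / num_sub_periods
    return [(i * step, (i + 1) * step) for i in range(num_sub_periods)]


def detect_per_sub_period(
    points: Sequence[Tuple[float, Any]],
    sub_periods: Sequence[Tuple[float, float]],
    detector: Callable[[List[Tuple[float, Any]]], Any],
) -> List[Any]:
    """Run the (hypothetical) detector once per sub-period on that partial scene."""
    outputs = []
    for start, end in sub_periods:
        # Keep only the points sensed during this sub-period.
        partial_scene = [p for p in points if start <= p[0] < end]
        outputs.append(detector(partial_scene))
    return outputs
```

Each detector call sees only the data from one sub-period, so an output for a partial scene is available before the full rotation completes.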
-
Publication No.: US20220130374A1
Publication date: 2022-04-28
Application No.: US17572238
Application date: 2022-01-10
Applicant: Google LLC
Inventor: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
-
Publication No.: US20220121945A1
Publication date: 2022-04-21
Application No.: US17567740
Application date: 2022-01-03
Applicant: Google LLC
Inventor: Zhifeng Chen, Yanping Huang, Youlong Cheng, HyoukJoong Lee, Dehao Chen, Jiquan Ngiam
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.
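The micro-batching step in this abstract can be sketched on a single device. This is a simplified illustration under stated assumptions: the one-parameter model `y = w * x`, the squared-error loss, and all function names are hypothetical, and the sketch omits the assignment of composite layers to separate computing devices that the abstract describes.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (input x, target y)


def split_into_micro_batches(
    mini_batch: Sequence[Example], num_micro_batches: int
) -> List[List[Example]]:
    """Partition a mini-batch into equally sized micro-batches."""
    if num_micro_batches <= 0 or len(mini_batch) % num_micro_batches != 0:
        raise ValueError("mini-batch size must be divisible by the micro-batch count")
    size = len(mini_batch) // num_micro_batches
    return [list(mini_batch[i:i + size]) for i in range(0, len(mini_batch), size)]


def accumulated_gradient(
    w: float, mini_batch: Sequence[Example], num_micro_batches: int
) -> float:
    """Mean squared-error gradient for y = w * x, accumulated over micro-batches."""
    micro_batches = split_into_micro_batches(mini_batch, num_micro_batches)
    # Forward pass: compute output activations for every micro-batch first.
    activations = [[w * x for x, _ in mb] for mb in micro_batches]
    # Backward pass: compute and accumulate gradients for every micro-batch.
    total = 0.0
    for mb, outs in zip(micro_batches, activations):
        total += sum(2.0 * (out - y) * x for (x, y), out in zip(mb, outs))
    return total / len(mini_batch)
```

Because the gradients are summed before averaging, the result matches the full mini-batch gradient regardless of how many micro-batches are used.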
-
Publication No.: US11145293B2
Publication date: 2021-10-12
Application No.: US16516390
Application date: 2019-07-19
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Abstract: Methods, systems, and apparatus, including computer-readable media, for performing speech recognition using sequence-to-sequence models. An automated speech recognition (ASR) system receives audio data for an utterance and provides features indicative of acoustic characteristics of the utterance as input to an encoder. The system processes an output of the encoder using an attender to generate a context vector and generates speech recognition scores using the context vector and a decoder trained using a training process that selects at least one input to the decoder with a predetermined probability. An input to the decoder during training is selected between input data based on a known value for an element in a training example, and input data based on an output of the decoder for the element in the training example. A transcription is generated for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
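The training-time input selection described in this abstract, which resembles the technique often called scheduled sampling, can be sketched as follows. The function name, the token representation, and the `sampling_prob` parameter are assumptions for illustration; the abstract states only that the choice is made with a predetermined probability.

```python
import random
from typing import List, Sequence


def choose_decoder_inputs(
    ground_truth: Sequence[str],
    model_outputs: Sequence[str],
    sampling_prob: float,
    rng: random.Random,
) -> List[str]:
    """Pick, per element, the known value or the decoder's own earlier output.

    With probability `sampling_prob` the decoder output is used; otherwise
    the known (ground-truth) value from the training example is used.
    """
    if len(ground_truth) != len(model_outputs):
        raise ValueError("sequences must have the same length")
    if not 0.0 <= sampling_prob <= 1.0:
        raise ValueError("sampling_prob must be in [0, 1]")
    chosen = []
    for truth, predicted in zip(ground_truth, model_outputs):
        use_prediction = rng.random() < sampling_prob
        chosen.append(predicted if use_prediction else truth)
    return chosen
```

Setting `sampling_prob` to 0.0 reduces this to always feeding the known values, while 1.0 always feeds the decoder's own outputs.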
-
Publication No.: US11138392B2
Publication date: 2021-10-05
Application No.: US16521780
Application date: 2019-07-25
Applicant: Google LLC
Inventor: Zhifeng Chen, Macduff Richard Hughes, Yonghui Wu, Michael Schuster, Xu Chen, Llion Owen Jones, Niki J. Parmar, George Foster, Orhan Firat, Ankur Bapna, Wolfgang Macherey, Melvin Jose Johnson Premkumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
-
Publication No.: US10971170B2
Publication date: 2021-04-06
Application No.: US16058640
Application date: 2018-08-08
Applicant: Google LLC
Inventor: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
-
Publication No.: US20200034436A1
Publication date: 2020-01-30
Application No.: US16521780
Application date: 2019-07-25
Applicant: Google LLC
Inventor: Zhifeng Chen, Macduff Richard Hughes, Yonghui Wu, Michael Schuster, Xu Chen, Llion Owen Jones, Niki J. Parmar, George Foster, Orhan Firat, Ankur Bapna, Wolfgang Macherey, Melvin Jose Johnson Premkumar
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for machine translation using neural networks. In some implementations, a text in one language is translated into a second language using a neural network model. The model can include an encoder neural network comprising a plurality of bidirectional recurrent neural network layers. The encoding vectors are processed using a multi-headed attention module configured to generate multiple attention context vectors for each encoding vector. A decoder neural network generates a sequence of decoder output vectors using the attention context vectors. The decoder output vectors can represent distributions over various language elements of the second language, allowing a translation of the text into the second language to be determined based on the sequence of decoder output vectors.
-
Publication No.: US20240420686A1
Publication date: 2024-12-19
Application No.: US18815200
Application date: 2024-08-26
Applicant: Google LLC
Inventor: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the automated speech recognition (ASR) system.
-
Publication No.: US20240311405A1
Publication date: 2024-09-19
Application No.: US18337316
Application date: 2023-06-19
Applicant: GOOGLE LLC
Inventor: Seungyeon Kim, Ankit Singh Rawat, Wittawat Jitkrittum, Hari Narasimhan, Sashank Reddi, Neha Gupta, Srinadh Bhojanapalli, Aditya Menon, Manzil Zaheer, Tal Schuster, Sanjiv Kumar, Toby Boyd, Zhifeng Chen, Emanuel Taropa, Vikram Kasivajhula, Trevor Strohman, Martin Baeuml, Leif Schelin, Yanping Huang
IPC: G06F16/332
CPC classification number: G06F16/3329
Abstract: Implementations disclose selecting, in response to receiving a request and from among multiple candidate generative models (e.g., multiple candidate large language models (LLMs)) with differing computational efficiencies, a particular generative model to utilize in generating a response to the request. Those implementations reduce latency and/or conserve computational resource(s) through selection, for various requests, of a more computationally efficient generative model for utilization in lieu of a less computationally efficient generative model. Further, those implementations seek to achieve such benefits, through utilization of more computationally efficient generative models, while also still selectively utilizing less computationally efficient generative models for certain requests to mitigate occurrences of a generated response being inaccurate and/or under-specified. This, in turn, can mitigate occurrences of computational and/or network inefficiencies that result from a user issuing a follow-up request to cure the inaccuracies and/or under-specification of a generated response.
-
Publication No.: US12087273B2
Publication date: 2024-09-10
Application No.: US18161217
Application date: 2023-01-30
Applicant: Google LLC
Inventor: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
IPC: G10L21/00, G10L13/00, G10L13/047
CPC classification number: G10L13/047
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.