Patent search ap:("GOOGLE LLC") AND inv:"Hyun Jin Park"

1.

Invention publication
END-TO-END STREAMING KEYWORD SPOTTING [Under examination, published]

Publication number: US20240177708A1

Publication date: 2024-05-30

Application number: US18432282

Application date: 2024-02-05

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/22, G10L15/08

CPC classification number: G10L15/16, G10L15/02, G10L15/063, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.
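The two-stage neuron described in this abstract can be illustrated with a minimal sketch. The class name `SVDFNeuron`, the weight values, and the use of a plain weighted sum for each filtering stage are assumptions made for illustration; the abstract does not specify them.

```python
from collections import deque


class SVDFNeuron:
    """Illustrative neuron: per-frame filter, fixed-size memory, filter over memory."""

    def __init__(self, feature_weights, time_weights):
        # Stage-1 weights act on the features of one input frame.
        self.feature_weights = list(feature_weights)
        # Stage-2 weights act on the memory slots (oldest first).
        self.time_weights = list(time_weights)
        # Memory starts at zero and keeps only the most recent stage-1 outputs.
        size = len(self.time_weights)
        self.memory = deque([0.0] * size, maxlen=size)

    def step(self, frame):
        # Stage 1: filter the audio features of this single frame.
        filtered = sum(w * x for w, x in zip(self.feature_weights, frame))
        self.memory.append(filtered)  # the oldest entry drops out
        # Stage 2: filter everything currently residing in memory.
        return sum(w * m for w, m in zip(self.time_weights, self.memory))


def satisfies_threshold(score, threshold):
    """Hypothetical check of a probability score against a detection threshold."""
    return score >= threshold


neuron = SVDFNeuron(feature_weights=[1.0, 0.5], time_weights=[0.25, 0.25, 0.5])
frames = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
outputs = [neuron.step(frame) for frame in frames]
```

Because the memory is bounded, each streaming step costs the same amount of work regardless of how much audio has already been processed.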

2.

Invention grant
End-to-end streaming keyword spotting [In force]

Publication number: US11929064B2

Publication date: 2024-03-12

Application number: US18151540

Application date: 2023-01-09

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/08, G10L15/22

CPC classification number: G10L15/16, G10L15/02, G10L15/063, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

3.

Invention grant
End-to-end streaming keyword spotting [In force]

Publication number: US11056101B2

Publication date: 2021-07-06

Application number: US16709191

Application date: 2019-12-10

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/22, G10L15/08

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.
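The training steps in this abstract (smooth the logits, derive a max pooling loss, combine the losses of encoder and decoder) can be sketched as follows. The trailing moving average, the sigmoid mapping from logit to probability, the treatment of negative examples, and all function names are assumptions for illustration; the abstract does not fix these details.

```python
import math


def smooth(logits, window=3):
    """Trailing moving average over per-frame logits (assumed smoothing method)."""
    out = []
    for i in range(len(logits)):
        chunk = logits[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def max_pooling_loss(logits, contains_hotword, window=3, eps=1e-12):
    probs = [sigmoid(z) for z in smooth(logits, window)]
    if contains_hotword:
        # Positive example: only the most confident frame contributes.
        return -math.log(max(max(probs), eps))
    # Negative example (assumption): every frame should stay close to zero.
    return -sum(math.log(max(1.0 - p, eps)) for p in probs) / len(probs)


def total_loss(encoder_logits, decoder_logits, contains_hotword):
    """Sum of the per-module losses, used here as the joint training objective."""
    return (max_pooling_loss(encoder_logits, contains_hotword)
            + max_pooling_loss(decoder_logits, contains_hotword))
```

For a positive example, `max_pooling_loss([0.0, 3.0, 6.0], True)` depends only on the largest smoothed logit, so the network is free to place its peak anywhere inside the hotword.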

4.

Invention grant
End-to-end streaming keyword spotting [In force]

Publication number: US11682385B2

Publication date: 2023-06-20

Application number: US17348422

Application date: 2021-06-15

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/22, G10L15/08

CPC classification number: G10L15/16, G10L15/02, G10L15/063, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

5.

Invention application
Reinforcement Learning with Information Retrieval Feedback [In force]

Publication number: US20250013915A1

Publication date: 2025-01-09

Application number: US18348687

Application date: 2023-07-07

Applicant: Google LLC

Inventor: Hyun Jin Park, Dongseong Hwang, Chang Wan Ryu

IPC: G06N20/00

Abstract: In one example aspect, the present disclosure provides an example computer-implemented method for generating feedback signals for training a machine-learned agent model. The example method can include obtaining an output of a machine-learned agent model, the output including a next state feature generated by the machine-learned agent model based on a sequence of preceding states. The example method can include processing, using a machine-learned reward model, the output and the sequence of preceding states to generate a quality indicator indicating a quality of the next state feature in view of the preceding states. The machine-learned reward model could be trained by retrieving reference data from a reference data source and computing one or more quality indicators in view of a respective training input and output(s), and the reference data. The example method can include outputting the quality indicator to a model trainer for updating the machine-learned agent model.

6.

Invention publication
End-to-End Streaming Keyword Spotting [Under examination, published]

Publication number: US20240242711A1

Publication date: 2024-07-18

Application number: US18619156

Application date: 2024-03-27

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/22, G10L15/08

CPC classification number: G10L15/16, G10L15/02, G10L15/063, G10L15/22, G10L2015/025, G10L2015/088, G10L2015/223

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

7.

Invention grant
End-to-end streaming keyword spotting [In force]

Publication number: US11557282B2

Publication date: 2023-01-17

Application number: US17155068

Application date: 2021-01-21

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara, Hyun Jin Park

IPC: G10L15/16, G10L15/02, G10L15/06, G10L15/22, G10L15/08

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

8.

Invention application
NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING [In force]

Publication number: US20220284891A1

Publication date: 2022-09-08

Application number: US17190779

Application date: 2021-03-03

Applicant: GOOGLE LLC

Inventor: Hyun Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

IPC: G10L15/22, G10L15/06, G10L15/08, G06K9/62, G10L21/0208

Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
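The masking and soft-label comparison described in this abstract can be sketched as below. Representing the audio as a list of frames by frequency bins, zeroing the masked regions, and using binary cross-entropy as the comparison are all assumptions for illustration, as are the function names; the teacher and student models themselves are left out and represented only by their output probabilities.

```python
import math


def mask_spectrogram(spec, time_mask=None, freq_mask=None):
    """Return a copy of spec (frames x frequency bins) with ranges zeroed.

    time_mask and freq_mask are (start, width) tuples, or None for no mask.
    """
    masked = [row[:] for row in spec]  # leave the base instance untouched
    if time_mask:
        start, width = time_mask
        for t in range(start, min(start + width, len(masked))):
            masked[t] = [0.0] * len(masked[t])
    if freq_mask:
        start, width = freq_mask
        for row in masked:
            for f in range(start, min(start + width, len(row))):
                row[f] = 0.0
    return masked


def soft_label_loss(teacher_prob, student_prob, eps=1e-12):
    """Cross-entropy between the teacher's soft label and the student's prediction."""
    s = min(max(student_prob, eps), 1.0 - eps)
    return -(teacher_prob * math.log(s) + (1.0 - teacher_prob) * math.log(1.0 - s))


base = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
augmented = mask_spectrogram(base, time_mask=(1, 1), freq_mask=(0, 1))
```

In a training loop, both models would process `augmented`, and the student's parameters would be updated to reduce `soft_label_loss` against the teacher's output.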

9.

Invention grant
Mixing heterogeneous loss types to improve accuracy of keyword spotting [In force]

Publication number: US12125476B2

Publication date: 2024-10-22

Application number: US17652801

Application date: 2022-02-28

Applicant: Google LLC

Inventor: Hyun Jin Park, Alex Seungryong Park, Ignacio Lopez Moreno

IPC: G10L15/16, G06N3/08, G10L15/02, G10L15/06, G10L15/22, G06N3/0455, G10L15/08

CPC classification number: G10L15/16, G06N3/08, G10L15/02, G10L15/063, G10L15/22, G06N3/0455, G10L2015/025, G10L2015/088

Abstract: A method for training a neural network includes receiving a training input audio sequence including a sequence of input frames defining a hotword that initiates a wake-up process on a user device. The method further includes obtaining a first label and a second label for the training input audio sequence. The method includes generating, using a memorized neural network and the training input audio sequence, an output indicating a likelihood the training input audio sequence includes the hotword. The method further includes determining a first loss based on the first label and the output. The method includes determining a second loss based on the second label and the output. The method further includes optimizing the memorized neural network based on the first loss and the second loss associated with the training input audio sequence.
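A rough sketch of combining two losses computed from the same output against two different labels follows. The abstract does not say which loss types or labels are used; the choice of a frame-level label sequence for the first loss, a single utterance-level label for the second, and the mixing weight `alpha` are all hypothetical.

```python
import math


def _bce(prob, label, eps=1e-12):
    """Binary cross-entropy for one probability and one 0/1 label."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def frame_level_loss(probs, frame_labels):
    """First loss (assumed): mean cross-entropy against per-frame labels."""
    return sum(_bce(p, y) for p, y in zip(probs, frame_labels)) / len(probs)


def utterance_level_loss(probs, utterance_label):
    """Second loss (assumed): cross-entropy of the peak frame against one label."""
    return _bce(max(probs), utterance_label)


def mixed_loss(probs, frame_labels, utterance_label, alpha=0.5):
    """Weighted combination of the two losses for one training sequence."""
    first = frame_level_loss(probs, frame_labels)
    second = utterance_level_loss(probs, utterance_label)
    return alpha * first + (1.0 - alpha) * second


probs = [0.1, 0.9, 0.2]
loss = mixed_loss(probs, frame_labels=[0, 1, 0], utterance_label=1)
```

Setting `alpha` to 1.0 or 0.0 recovers either loss on its own, which makes it easy to compare the mixed objective against each single-loss baseline.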

10.

Invention grant
Noisy student teacher training for robust keyword spotting [In force]

Publication number: US12027162B2

Publication date: 2024-07-02

Application number: US17190779

Application date: 2021-03-03

Applicant: GOOGLE LLC

Inventor: Hyun Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

IPC: G10L15/22, G06F18/24, G10L15/06, G10L15/08, G10L21/0208

CPC classification number: G10L15/22, G06F18/24, G10L15/063, G10L15/08, G10L21/0208, G10L2015/088, G10L2015/223, G10L2021/02082, G10L2021/02087

Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
