Patent search ap:("Google LLC") AND inv:"Pai Zhu" Page 1

1.

Patent application
TARGET SPEAKER KEYWORD SPOTTING (status: in force)

Publication No.: US20250078840A1

Publication date: 2025-03-06

Application No.: US18812338

Filing date: 2024-08-22

Applicant: Google LLC

Inventor: Pai Zhu, Beltrán Labrador Serrano, Guanlong Zhao, Angelo Alfredo Scorza Scarpati, Quan Wang, Alex Seungryong Park, Ignacio Lopez Moreno

IPC: G10L17/02, G10L17/04, G10L17/22

Abstract: A method includes receiving audio data corresponding to an utterance spoken by a particular user and captured in streaming audio by a user device. The method also includes performing speaker identification on the audio data to identify an identity of the particular user that spoke the utterance. The method also includes obtaining a keyword detection model personalized for the particular user based on the identity of the particular user that spoke the utterance. The keyword detection model is conditioned on speaker characteristic information associated with the particular user to adapt the keyword detection model to detect a presence of a keyword in audio for the particular user. The method also includes determining that the utterance includes the keyword using the keyword detection model personalized for the particular user.
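The flow in this abstract (identify the speaker, obtain a detector conditioned on that speaker, then decide whether the keyword is present) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the cosine-similarity speaker lookup, the per-feature scaling used as "conditioning", the logistic scorer, and all function names are hypothetical. For simplicity the speaker embedding and the acoustic feature vector are assumed to have the same length.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_speaker(utt_emb, enrolled):
    """Return the enrolled user id whose embedding is closest to the utterance."""
    return max(enrolled, key=lambda uid: cosine(utt_emb, enrolled[uid]))


def make_personalized_detector(speaker_emb, base_weights, threshold=0.5):
    """Build a keyword detector conditioned on one speaker.

    Hypothetical conditioning: each base weight is scaled by (1 + speaker_emb[i]).
    """
    weights = [w * (1.0 + s) for w, s in zip(base_weights, speaker_emb)]

    def detect(features):
        logit = sum(w * f for w, f in zip(weights, features))
        score = 1.0 / (1.0 + math.exp(-logit))
        return score >= threshold

    return detect


def spot_keyword(utt_emb, features, enrolled, base_weights):
    """Speaker identification followed by speaker-conditioned keyword detection."""
    uid = identify_speaker(utt_emb, enrolled)
    detector = make_personalized_detector(enrolled[uid], base_weights)
    return uid, detector(features)


if __name__ == "__main__":
    enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}  # made-up embeddings
    print(spot_keyword([0.9, 0.1], [1.0, -1.0], enrolled, [1.0, 1.0]))
```

In this toy setup the same acoustic features can produce different decisions for different enrolled users, which is the point of conditioning the detector on the identified speaker.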

2.

Patent application
NOISY STUDENT TEACHER TRAINING FOR ROBUST KEYWORD SPOTTING (status: in force)

Publication No.: US20220284891A1

Publication date: 2022-09-08

Application No.: US17190779

Filing date: 2021-03-03

Applicant: GOOGLE LLC

Inventor: Hyun Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

IPC: G10L15/22, G10L15/06, G10L15/08, G06K9/62, G10L21/0208

Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
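The training step described in this abstract (mask time frames and frequency bins, let the teacher produce a soft label on the augmented audio, then update the student toward that label) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the models are hypothetical logistic scorers over time-averaged spectrogram bins, the loss is binary cross-entropy against the soft label, and every name and hyperparameter is invented for the example.

```python
import math
import random


def spec_augment(spec, rng, max_time=2, max_freq=2):
    """Return a copy of spec ([time][freq]) with one run of time frames
    and one run of frequency bins zeroed out."""
    out = [row[:] for row in spec]
    n_t, n_f = len(out), len(out[0])
    t_w = rng.randint(1, max_time)
    t0 = rng.randint(0, n_t - t_w)
    f_w = rng.randint(1, max_freq)
    f0 = rng.randint(0, n_f - f_w)
    for t in range(t0, t0 + t_w):      # time mask
        out[t] = [0.0] * n_f
    for row in out:                    # frequency mask
        for f in range(f0, f0 + f_w):
            row[f] = 0.0
    return out


def pool(spec):
    """Average each frequency bin over time."""
    n_t = len(spec)
    return [sum(row[f] for row in spec) / n_t for f in range(len(spec[0]))]


def predict(weights, spec):
    """Toy KWS model: sigmoid of a linear score over pooled bins."""
    logit = sum(w * x for w, x in zip(weights, pool(spec)))
    return 1.0 / (1.0 + math.exp(-logit))


def student_step(student_w, teacher_w, spec, rng, lr=0.5):
    """One update: augment, get the teacher's soft label, move the student toward it."""
    aug = spec_augment(spec, rng)
    soft = predict(teacher_w, aug)     # teacher soft label on augmented audio
    pred = predict(student_w, aug)     # student output on the same audio
    loss = -(soft * math.log(pred) + (1.0 - soft) * math.log(1.0 - pred))
    x = pool(aug)
    # Gradient of the cross-entropy w.r.t. the weights is (pred - soft) * x.
    new_w = [w - lr * (pred - soft) * xi for w, xi in zip(student_w, x)]
    return new_w, loss


if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[2.0, 0.0, 1.0, 1.0] for _ in range(4)]   # made-up spectrogram
    teacher_w = [1.0, -1.0, 0.5, 0.0]
    student_w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(200):
        student_w, _ = student_step(student_w, teacher_w, spec, rng)
    print(predict(teacher_w, spec), predict(student_w, spec))
```

Because the teacher labels the augmented input rather than the clean one, the student never needs ground-truth labels for the masked data; only the teacher's output is compared with the student's prediction.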

3.

Granted patent
Noisy student teacher training for robust keyword spotting (status: in force)

Publication No.: US12027162B2

Publication date: 2024-07-02

Application No.: US17190779

Filing date: 2021-03-03

Applicant: GOOGLE LLC

Inventor: Hyun Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

IPC: G10L15/22, G06F18/24, G10L15/06, G10L15/08, G10L21/0208

CPC classification number: G10L15/22, G06F18/24, G10L15/063, G10L15/08, G10L21/0208, G10L2015/088, G10L2015/223, G10L2021/02082, G10L2021/02087

Abstract: Teacher-student learning can be used to train a keyword spotting (KWS) model using augmented training instance(s). Various implementations include aggressively augmenting (e.g., using spectral augmentation) base audio data to generate augmented audio data, where one or more portions of the base instance of audio data can be masked in the augmented instance of audio data (e.g., one or more time frames can be masked, one or more frequencies can be masked, etc.). Many implementations include processing augmented audio data using a KWS teacher model to generate a soft label, and processing the augmented audio data using a KWS student model to generate predicted output. One or more portions of the KWS student model can be updated based on a comparison of the soft label and the generated predicted output.
