Patent search ap:("Google LLC") AND inv:"Guanlong Zhao" Page 1

1.

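Invention application
TARGET SPEAKER KEYWORD SPOTTING (Granted)

Publication (announcement) number: US20250078840A1

Publication (announcement) date: 2025-03-06

Application number: US18812338

Application date: 2024-08-22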

Applicant: Google LLC

Inventor: Pai Zhu, Beltrán Labrador Serrano, Guanlong Zhao, Angelo Alfredo Scorza Scarpati, Quan Wang, Alex Seungryong Park, Ignacio Lopez Moreno

IPC: G10L17/02 , G10L17/04 , G10L17/22

Abstract: A method includes receiving audio data corresponding to an utterance spoken by a particular user and captured in streaming audio by a user device. The method also includes performing speaker identification on the audio data to identify an identity of the particular user that spoke the utterance. The method also includes obtaining a keyword detection model personalized for the particular user based on the identity of the particular user that spoke the utterance. The keyword detection model is conditioned on speaker characteristic information associated with the particular user to adapt the keyword detection model to detect a presence of a keyword in audio for the particular user. The method also includes determining that the utterance includes the keyword using the keyword detection model personalized for the particular user.
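The flow described in this abstract (identify the speaker, obtain a keyword detector adapted to that speaker, then decide whether the keyword is present) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: speaker identification is assumed to be cosine similarity against enrolled embeddings, "personalization" is assumed to mean passing the enrolled speaker embedding as a conditioning input to a shared model, and `min_sim` and `threshold` are placeholder values.

```python
import math
from typing import Callable, Dict, Optional, Sequence

Embedding = Sequence[float]
# Hypothetical model signature: (audio features, speaker embedding) -> keyword score in [0, 1].
KeywordModel = Callable[[Sequence[float], Embedding], float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embeddings; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_speaker(
    utterance_emb: Embedding,
    enrolled: Dict[str, Embedding],
    min_sim: float = 0.7,  # assumed acceptance threshold
) -> Optional[str]:
    """Return the enrolled speaker id most similar to the utterance, or None."""
    best_id, best_sim = None, min_sim
    for speaker_id, profile in enrolled.items():
        sim = cosine(utterance_emb, profile)
        if sim >= best_sim:
            best_id, best_sim = speaker_id, sim
    return best_id


def detect_keyword(
    audio: Sequence[float],
    utterance_emb: Embedding,
    enrolled: Dict[str, Embedding],
    model: KeywordModel,
    threshold: float = 0.5,  # assumed decision threshold
) -> bool:
    """Identify the speaker, condition the model on their profile, and threshold the score."""
    speaker_id = identify_speaker(utterance_emb, enrolled)
    if speaker_id is None:
        return False  # assumption: unknown speakers never trigger the keyword
    score = model(audio, enrolled[speaker_id])
    return score >= threshold
```

In this sketch a single shared model is conditioned per speaker; a design that stores a separate model per user would replace the `enrolled[speaker_id]` lookup with a model lookup.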

2.

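Invention publication
EVALUATION-BASED SPEAKER CHANGE DETECTION EVALUATION METRICS (Under examination, published)

Publication (announcement) number: US20240135934A1

Publication (announcement) date: 2024-04-25

Application number: US18483492

Application date: 2023-10-09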

Applicant: Google LLC

Inventor: Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Jason Pelecanos

IPC: G10L17/06 , G10L17/02 , G10L17/04

CPC classification number: G10L17/06 , G10L17/02 , G10L17/04

Abstract: A method includes obtaining a multi-utterance training sample that includes audio data characterizing utterances spoken by two or more different speakers and obtaining ground-truth speaker change intervals indicating time intervals in the audio data where speaker changes among the two or more different speakers occur. The method also includes processing the audio data to generate a sequence of predicted speaker change tokens using a sequence transduction model. For each corresponding predicted speaker change token, the method includes labeling the corresponding predicted speaker change token as correct when the predicted speaker change token overlaps with one of the ground-truth speaker change intervals. The method also includes determining a precision metric of the sequence transduction model based on a number of the predicted speaker change tokens labeled as correct and a total number of the predicted speaker change tokens in the sequence of predicted speaker change tokens.
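The precision metric described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each predicted speaker change token is assumed to carry a single timestamp in seconds, "overlaps" is taken to mean that the timestamp falls inside a ground-truth interval (endpoints inclusive), and precision for an empty prediction sequence is assumed to be 0.0. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) of a ground-truth speaker change


def label_predictions(
    pred_times: Sequence[float], gt_intervals: Sequence[Interval]
) -> List[bool]:
    """Label each predicted token correct if it falls inside any ground-truth interval."""
    return [
        any(start <= t <= end for start, end in gt_intervals)
        for t in pred_times
    ]


def precision(pred_times: Sequence[float], gt_intervals: Sequence[Interval]) -> float:
    """Number of correct predicted tokens divided by the total number of predicted tokens."""
    labels = label_predictions(pred_times, gt_intervals)
    if not labels:
        return 0.0  # assumption: no predictions yields zero precision
    return sum(labels) / len(labels)


# Usage: four predicted tokens, two of which land inside a ground-truth interval.
predicted = [1.0, 2.5, 4.0, 7.2]
ground_truth = [(0.8, 1.2), (3.9, 4.3)]
print(precision(predicted, ground_truth))  # 0.5
```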
