Publication number: US20230317082A1
Publication date: 2023-10-05
Application number: US17710137
Filing date: 2022-03-31
Applicant: GOOGLE LLC
Inventor: Om Dipakbhai Thakkar, Hakim Sidahmed, W. Ronny Huang, Rajiv Mathews, Françoise Beaufays, Florian Tramèr
CPC classification number: G10L15/26, G10L15/063, G10L13/02
Abstract: An unintentional memorization measure can be used to determine whether an automatic speech recognition (ASR) model has unintentionally memorized one or more phrases during training of the ASR model. Various implementations include generating one or more candidate transcripts based on the vocabulary of the ASR model. For example, the system can generate a candidate transcript by appending a token of the vocabulary to a previous candidate transcript. Various implementations include processing the candidate transcript using a speech synthesis model to generate synthesized speech audio data that includes synthesized speech of the candidate transcript. Additionally or alternatively, the synthesized speech audio data can be processed using the ASR model to generate ASR output. Various implementations can include generating a loss based on comparing the ASR output and the candidate transcript.
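The loop described in the abstract (extend a candidate transcript with a vocabulary token, synthesize it, run it through the ASR model, and compare the output with the candidate) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `extend_candidates`, `score_candidates`, `toy_tts`, and `toy_asr` are hypothetical names, the "audio" is just a string, and token-level edit distance is an assumed stand-in for the loss, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Transcript = List[str]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (assumed stand-in for the loss)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def extend_candidates(
    previous: List[Transcript], vocabulary: Sequence[str]
) -> List[Transcript]:
    """Build new candidates by appending each vocabulary token to each previous candidate."""
    return [cand + [tok] for cand in previous for tok in vocabulary]


def score_candidates(
    candidates: List[Transcript],
    synthesize: Callable[[Transcript], str],
    recognize: Callable[[str], Transcript],
) -> List[Tuple[Transcript, int]]:
    """Synthesize each candidate, run ASR on the result, and compute a loss per candidate."""
    scored = []
    for cand in candidates:
        audio = synthesize(cand)       # speech synthesis model (stubbed below)
        hypothesis = recognize(audio)  # ASR model under test (stubbed below)
        scored.append((cand, edit_distance(hypothesis, cand)))
    return sorted(scored, key=lambda pair: pair[1])  # lowest loss first


# Hypothetical stand-ins so the sketch runs without real models.
def toy_tts(tokens: Transcript) -> str:
    return " ".join(tokens)  # pretend this string is audio data


def toy_asr(audio: str) -> Transcript:
    # Toy recognizer that always mishears "cat" as "cap".
    return ["cap" if word == "cat" else word for word in audio.split()]


vocabulary = ["the", "cat", "sat"]
candidates = extend_candidates([["the"]], vocabulary)
scored = score_candidates(candidates, toy_tts, toy_asr)
for transcript, loss in scored:
    print(" ".join(transcript), loss)
```

In a real setting `toy_tts` and `toy_asr` would be replaced by the speech synthesis model and the ASR model being audited. How the per-candidate loss feeds into the unintentional memorization measure is not detailed in the abstract, so the sketch stops at the loss.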