-
公开(公告)号:US12198681B1
公开(公告)日:2025-01-14
申请号:US17937297
申请日:2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara , Srikanth Ronanki , Sravan Babu Bodapati , Jeffrey John Farris , Katrin Kirchhoff , Vivek Govindan , Yide Zou , Mohit Narendra Gupta , Silviu Mihai Burz
Abstract: Techniques for personalized batch and streaming speech-to-text transcription of audio reduce the error rate of automatic speech recognition (ASR) systems in transcribing rare and out-of-vocabulary words. The techniques achieve personalization of connectionist temporal classification (CT) models by using adaptive boosting to perform biasing at the level of sub-words. In addition to boosting, the techniques encompass a phone alignment network to bias sub-word predictions towards rare long-tail words and out-of-vocabulary words. A technical benefit of the techniques is that the accuracy of speech-to-text transcription of rare and out-of-vocabulary words in a custom vocabulary by automatic speech recognition (ASR) system can be improved without having to train the ASR system on the custom vocabulary. Instead, the techniques allow the same ASR system trained on a base vocabulary to realize the accuracy improvements for different custom vocabularies spanning different domains.