-
Publication No.: US11551695B1
Publication Date: 2023-01-10
Application No.: US15931455
Filing Date: 2020-05-13
Applicant: Amazon Technologies, Inc.
Inventor: Vivek Govindan, Varun Sembium Varadarajan, Christian Egon Berkhoff Dossow, Himalay Mohanlal Joriwal, Sai Madhuri Bhavirisetty, Abhinav Kumar, Orestis Lykouropoulos, Akshay Nalwaya, Rahul Gupta, Sravan Babu Bodapati, Liangwei Guo, Julian E. S. Salazar, Yibin Wang, K P N V D S Siva Rama, Calvin Xuan Li, Mohit Narendra Gupta, Asem Rustum, Katrin Kirchhoff, Pu Zhao
Abstract: A transcription service may receive a request from a developer to build a custom speech-to-text model for a specific domain of speech. The custom speech-to-text model for the specific domain may replace a general speech-to-text model or add to a set of one or more speech-to-text models available for transcribing speech. The transcription service may receive a training data set and instructions representing tasks. The transcription service may determine respective schedules for executing the instructions based at least in part on dependencies between the tasks. The transcription service may execute the instructions according to the respective schedules to train a speech-to-text model for the specific domain using the training data set. The transcription service may deploy the trained speech-to-text model as part of a network-accessible service for an end user to convert audio in the specific domain into text.
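Illustrative sketch (not taken from the patent): the abstract describes scheduling instructions according to dependencies between tasks. One common way to do that is a topological sort that groups tasks into batches whose prerequisites are already complete. The task names and the `schedule_batches` helper below are hypothetical, chosen only to make the idea concrete; the patent does not disclose this code.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_batches(dependencies):
    """Group tasks into batches; every task in a batch has all of its
    prerequisites in earlier batches, so a batch could run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the tasks form a cycle
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()        # tasks whose prerequisites are done
        batches.append(sorted(ready))     # sort only for stable display
        sorter.done(*ready)               # mark them finished
    return batches


# Hypothetical training pipeline: each task maps to the tasks it depends on.
tasks = {
    "normalize_text": set(),
    "extract_features": set(),
    "train_model": {"normalize_text", "extract_features"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}

batches = schedule_batches(tasks)
for step, batch in enumerate(batches, start=1):
    print(step, batch)
```

In this sketch the two preprocessing tasks land in the first batch because neither depends on the other, while training, evaluation, and deployment follow one at a time.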
-
Publication No.: US12198681B1
Publication Date: 2025-01-14
Application No.: US17937297
Filing Date: 2022-09-30
Applicant: Amazon Technologies, Inc.
Inventor: Monica Lakshmi Sunkara, Srikanth Ronanki, Sravan Babu Bodapati, Jeffrey John Farris, Katrin Kirchhoff, Vivek Govindan, Yide Zou, Mohit Narendra Gupta, Silviu Mihai Burz
Abstract: Techniques for personalized batch and streaming speech-to-text transcription of audio reduce the error rate of automatic speech recognition (ASR) systems in transcribing rare and out-of-vocabulary words. The techniques achieve personalization of connectionist temporal classification (CTC) models by using adaptive boosting to perform biasing at the level of sub-words. In addition to boosting, the techniques encompass a phone alignment network to bias sub-word predictions towards rare long-tail words and out-of-vocabulary words. A technical benefit of the techniques is that the accuracy of speech-to-text transcription of rare and out-of-vocabulary words in a custom vocabulary by an ASR system can be improved without having to train the ASR system on the custom vocabulary. Instead, the techniques allow the same ASR system trained on a base vocabulary to realize the accuracy improvements for different custom vocabularies spanning different domains.
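Illustrative sketch (not taken from the patent): the abstract describes biasing a CTC model at the sub-word level without retraining. The simplified example below adds a fixed score bonus to sub-word tokens from a custom vocabulary and then applies greedy CTC decoding (collapse repeated tokens, drop blanks). The fixed `bonus`, the token ids, and the `boost_and_decode` helper are assumptions for illustration only; the patent's adaptive boosting and phone alignment network are not reproduced here.

```python
BLANK = 0  # assumed id of the CTC blank token


def boost_and_decode(log_probs, boosted_ids, bonus):
    """Greedy CTC decode after adding `bonus` to the per-frame scores of
    sub-word tokens listed in `boosted_ids`."""
    path = []
    for frame in log_probs:
        scores = [
            score + bonus if token in boosted_ids else score
            for token, score in enumerate(frame)
        ]
        # Pick the highest-scoring token for this frame.
        path.append(max(range(len(scores)), key=scores.__getitem__))

    decoded, previous = [], None
    for token in path:
        # CTC collapse rule: skip repeats of the previous frame and blanks.
        if token != previous and token != BLANK:
            decoded.append(token)
        previous = token
    return decoded


# Hypothetical per-frame log-probabilities over tokens [blank, 1, 2].
frames = [
    [-2.0, -0.5, -1.0],
    [-2.0, -0.6, -0.9],
    [-0.1, -3.0, -3.0],
    [-2.0, -0.7, -1.0],
]

baseline = boost_and_decode(frames, boosted_ids=set(), bonus=0.0)
biased = boost_and_decode(frames, boosted_ids={2}, bonus=1.0)
print(baseline, biased)
```

With no boosting the decoder emits token 1 twice; boosting token 2 flips the non-blank frames to token 2, while the model weights stay untouched.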
-