Method and apparatus for open-vocabulary end-to-end speech recognition
Abstract:
A speech recognition system includes an input device to receive voice sounds, one or more processors, and one or more storage devices storing parameters and program modules including instructions that cause the one or more processors to perform operations. The operations include extracting an acoustic feature sequence from audio waveform data converted from the voice sounds; encoding the acoustic feature sequence into a hidden vector sequence using an encoder network having encoder network parameters; predicting first output label sequence probabilities by feeding the hidden vector sequence to a decoder network having decoder network parameters; predicting second output label sequence probabilities by a hybrid network using character-based language models (LMs) and word-level LMs; and searching, using a label sequence search module, for an output label sequence having the highest sequence probability by combining the first and second output label sequence probabilities provided by the decoder network and the hybrid network.
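The final operation combines the decoder's probabilities with the hybrid LM's probabilities and searches for the highest-scoring label sequence. One common way to do this is a beam search over a log-linear combination of the two scores. The following is a minimal sketch under stated assumptions, not the claimed implementation. The `Scorer` interface, `joint_beam_search`, the `lm_weight` interpolation weight, the `LOG_FLOOR` penalty, and the toy scorers are all hypothetical. The hybrid character/word LM is abstracted as a single next-label scorer, so this sketch does not model how the two LM levels are mixed.

```python
import math
from typing import Callable, Dict, List, Tuple

Labels = Tuple[str, ...]
# Hypothetical interface: a scorer maps a label prefix to the log-probabilities
# of each possible next label (including the end-of-sequence label).
Scorer = Callable[[Labels], Dict[str, float]]

EOS = "<eos>"
LOG_FLOOR = math.log(1e-10)  # assumed penalty for labels the LM does not score


def joint_beam_search(decoder: Scorer, lm: Scorer, lm_weight: float = 0.5,
                      beam: int = 4, max_len: int = 20) -> Tuple[Labels, float]:
    """Return the best label sequence and its combined log score.

    Each step adds: decoder log-prob + lm_weight * LM log-prob (log-linear
    combination). Only the `beam` best partial hypotheses are kept per step.
    """
    hyps: List[Tuple[Labels, float]] = [((), 0.0)]
    finished: List[Tuple[Labels, float]] = []
    for _ in range(max_len):
        candidates: List[Tuple[Labels, float]] = []
        for prefix, score in hyps:
            lm_scores = lm(prefix)
            for label, dec_lp in decoder(prefix).items():
                step = dec_lp + lm_weight * lm_scores.get(label, LOG_FLOOR)
                if label == EOS:
                    finished.append((prefix, score + step))  # complete hypothesis
                else:
                    candidates.append((prefix + (label,), score + step))
        if not candidates:
            break
        candidates.sort(key=lambda h: h[1], reverse=True)
        hyps = candidates[:beam]  # prune to the beam width
    # Fall back to unfinished hypotheses if none reached EOS within max_len.
    pool = finished if finished else hyps
    return max(pool, key=lambda h: h[1])


def toy_decoder(prefix: Labels) -> Dict[str, float]:
    # Toy acoustic evidence: "b" looks slightly more likely than "a".
    if not prefix:
        return {"a": math.log(0.4), "b": math.log(0.6)}
    return {EOS: 0.0}


def toy_lm(prefix: Labels) -> Dict[str, float]:
    # Toy language model: strongly prefers "a".
    if not prefix:
        return {"a": math.log(0.9), "b": math.log(0.1)}
    return {EOS: 0.0}


if __name__ == "__main__":
    print(joint_beam_search(toy_decoder, toy_lm, lm_weight=0.0)[0])  # ('b',)
    print(joint_beam_search(toy_decoder, toy_lm, lm_weight=1.0)[0])  # ('a',)
```

With `lm_weight=0.0`, only the decoder counts, so "b" wins. With `lm_weight=1.0`, the LM's strong preference outweighs the small acoustic margin, and "a" wins. Scoring in the log domain turns the product of probabilities into a sum, which avoids numerical underflow on long sequences.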