Abstract:
Techniques are disclosed for client-side analysis of audio samples to identify one or more characteristics associated with captured audio. The client-side analysis may then allow a user device, e.g., a smart phone, laptop computer, in-car infotainment system, and so on, to provide the one or more identified characteristics as configuration data to a voice recognition service at or shortly after connecting to it. In turn, the voice recognition service may load one or more recognition components, e.g., language models and/or application modules/engines, based on the received configuration data. Thus, latency may be reduced because the voice recognition engine has “hints” that allow components to be loaded without necessarily having to process audio samples first. This reduction in latency may shorten processing time relative to voice recognition systems that perform context recognition/classification exclusively on the server side.
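The exchange described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the disclosure: the `environment` characteristic, the RMS threshold used to derive it, the `COMPONENT_REGISTRY` mapping, and the `RecognitionService` class are stand-ins chosen only to show the flow. The client derives a hint from the audio, and the service loads components from that hint alone, before it processes any audio samples.

```python
import math

# Hypothetical registry: which component the service loads for each hint.
COMPONENT_REGISTRY = {
    ("environment", "noisy"): "component_for_noisy_audio",
    ("environment", "quiet"): "component_for_quiet_audio",
}


def rms(samples):
    """Root-mean-square amplitude of a list of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def analyze_on_client(samples, noisy_threshold=0.3):
    """Stand-in for client-side analysis: derive one characteristic.

    Assumption: a simple RMS threshold replaces whatever analysis a real
    device would run; the threshold value is arbitrary.
    """
    label = "noisy" if rms(samples) > noisy_threshold else "quiet"
    return {"environment": label}


class RecognitionService:
    """Toy stand-in for the server-side voice recognition service."""

    def __init__(self):
        self.loaded = []

    def connect(self, config):
        """Load components based only on the received configuration data."""
        for hint in config.items():
            component = COMPONENT_REGISTRY.get(hint)
            if component is not None and component not in self.loaded:
                self.loaded.append(component)
        return list(self.loaded)


# Usage: the hint is sent at connection time, before any audio is uploaded.
config = analyze_on_client([0.5, -0.5, 0.5, -0.5])
service = RecognitionService()
service.connect(config)
```

The design point the sketch tries to capture is ordering: `connect` never touches audio samples, so component loading can overlap with, or precede, the audio stream.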
Abstract:
Techniques for implementing neural networks in speech recognition systems are discussed. Such techniques may include frame skipping with approximated skip frames and/or distances on demand, such that only those outputs needed by a speech decoder are provided, either by the neural network or by approximation techniques.
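One possible reading of frame skipping combined with distances on demand is sketched below in Python. This is an illustration under stated assumptions, not the disclosed method: the `OnDemandScorer` class, its `hidden_fn` and `out_weights` parameters, and the choice to approximate a skip frame by reusing the hidden activations cached from the most recent fully evaluated frame are all hypothetical. The hidden layers run only on every `skip`-th frame, and on every frame only the output units the decoder asks for are computed.

```python
class OnDemandScorer:
    """Toy acoustic scorer with frame skipping and on-demand outputs."""

    def __init__(self, hidden_fn, out_weights, skip=2):
        self.hidden_fn = hidden_fn      # features -> hidden activations
        self.out_weights = out_weights  # one weight row per output unit
        self.skip = skip                # evaluate hidden layers every skip-th frame
        self.cached_hidden = None
        self.hidden_evals = 0           # counts full hidden-layer evaluations

    def start_frame(self, t, features):
        """Run the hidden layers on non-skip frames; otherwise keep the cache."""
        if self.cached_hidden is None or t % self.skip == 0:
            self.cached_hidden = self.hidden_fn(features)
            self.hidden_evals += 1
        # On a skip frame the cached activations act as the approximation.

    def score(self, output_ids):
        """Compute only the output units requested by the decoder."""
        if self.cached_hidden is None:
            raise RuntimeError("start_frame must be called before score")
        h = self.cached_hidden
        return {
            i: sum(w * x for w, x in zip(self.out_weights[i], h))
            for i in output_ids
        }


# Usage with a stand-in "network": the hidden layer just doubles the features.
scorer = OnDemandScorer(
    hidden_fn=lambda f: [2 * x for x in f],
    out_weights=[[1, 0], [0, 1], [1, 1]],
    skip=2,
)
scorer.start_frame(0, [1, 2])    # full evaluation
first = scorer.score([2])        # decoder needs only output 2
scorer.start_frame(1, [3, 4])    # skip frame: cache reused
second = scorer.score([0])       # a different output, still computed on demand
```

Caching hidden activations rather than final outputs is what lets a skip frame serve an output unit that was never requested on the preceding evaluated frame; reusing or interpolating final outputs would be an alternative approximation with different trade-offs.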