Abstract:
Systems and methods for assisting automatic speech recognition (ASR) are provided. An example method includes generating, by a mobile device, a plurality of instantiations of a speech component in a captured audio signal, each instantiation supporting a particular hypothesis regarding the speech component. At least two of the instantiations are then sent to a remote ASR engine. According to various embodiments, the remote ASR engine is configured to recognize at least one word based on the at least two instantiations and a user context. This recognition can include selecting one of the instantiations of the speech component from the plurality of instantiations. The plurality of instantiations may be generated by applying noise suppression to the captured audio signal with different degrees of aggressiveness. In some embodiments, the plurality of instantiations is generated by synthesizing the speech component from synthetic speech parameters obtained by a spectral analysis of the captured audio signal.
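The device-side flow described above can be illustrated with a minimal sketch. It assumes a toy energy gate in place of a real noise suppressor, and every name in it (`Instantiation`, `suppress_noise`, `generate_instantiations`, `pick_two_for_upload`, the `noise_floor` and `levels` parameters) is hypothetical and not taken from the source. The policy of uploading the least and most aggressive versions is likewise only one possible choice.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Instantiation:
    """One candidate version of the speech component (hypothetical type)."""
    aggressiveness: float  # 0.0 = no suppression, 1.0 = strongest suppression
    samples: List[float]


def suppress_noise(samples: Sequence[float], noise_floor: float,
                   aggressiveness: float, frame_len: int = 4) -> List[float]:
    """Toy suppressor: scale frames whose RMS is at or below noise_floor.

    This energy gate is a stand-in chosen for brevity; it is not the
    suppression algorithm of the source document.
    """
    out: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        # Quiet frames are attenuated; louder frames pass through unchanged.
        gain = 1.0 - aggressiveness if rms <= noise_floor else 1.0
        out.extend(x * gain for x in frame)
    return out


def generate_instantiations(samples: Sequence[float], noise_floor: float,
                            levels: Sequence[float] = (0.0, 0.5, 1.0)
                            ) -> List[Instantiation]:
    """Produce one instantiation per aggressiveness level."""
    return [Instantiation(a, suppress_noise(samples, noise_floor, a))
            for a in levels]


def pick_two_for_upload(instantiations: Sequence[Instantiation]
                        ) -> List[Instantiation]:
    """Assumed policy: send the least and the most aggressive versions."""
    if len(instantiations) < 2:
        raise ValueError("need at least two instantiations")
    ordered = sorted(instantiations, key=lambda inst: inst.aggressiveness)
    return [ordered[0], ordered[-1]]
```

In this sketch the remote ASR engine would receive the two returned instantiations and choose between them using the user context; that server-side selection is left out because the abstract does not specify how it is performed.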