Abstract:
Methods and apparatus for processing microphone signals with a speech enhancement module to generate an audio stream signal that includes first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information. Based on the first metadata, processing transitions from a silence state to a maybe-speech state, in which data is buffered. Based on the second metadata, it then transitions to a speech state, in which speech recognition is performed.
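The endpointing behaviour can be pictured as a three-state machine. The sketch below is illustrative only and rests on several assumptions not stated in the abstract. Each frame is assumed to carry the two metadata items as booleans (`first_meta`, `second_meta`). A maybe-speech episode that never receives the second metadata is assumed to be discarded as a false alarm. The return from the speech state to silence when `first_meta` drops is also an assumption. All names (`Frame`, `Endpointer`, `to_recognizer`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class EndpointState(Enum):
    SILENCE = auto()
    MAYBE_SPEECH = auto()
    SPEECH = auto()


@dataclass
class Frame:
    samples: list       # audio samples for one frame (placeholder)
    first_meta: bool    # hypothetical: first metadata, e.g. "possible speech"
    second_meta: bool   # hypothetical: second metadata, e.g. "confirmed speech"


@dataclass
class Endpointer:
    state: EndpointState = EndpointState.SILENCE
    buffer: list = field(default_factory=list)         # held in maybe-speech
    to_recognizer: list = field(default_factory=list)  # frames passed to ASR

    def process(self, frame: Frame) -> EndpointState:
        if self.state is EndpointState.SILENCE:
            # First metadata moves silence -> maybe-speech; start buffering.
            if frame.first_meta:
                self.state = EndpointState.MAYBE_SPEECH
                self.buffer.append(frame)
        elif self.state is EndpointState.MAYBE_SPEECH:
            if frame.second_meta:
                # Second metadata confirms speech: flush the buffer so the
                # speech onset reaches the recognizer, then keep feeding it.
                self.state = EndpointState.SPEECH
                self.to_recognizer.extend(self.buffer)
                self.buffer.clear()
                self.to_recognizer.append(frame)
            elif frame.first_meta:
                self.buffer.append(frame)
            else:
                # Assumption: treat as a false alarm and drop buffered data.
                self.buffer.clear()
                self.state = EndpointState.SILENCE
        else:  # SPEECH
            if frame.first_meta:
                self.to_recognizer.append(frame)
            else:
                # Assumption: end of speech when first metadata drops.
                self.state = EndpointState.SILENCE
        return self.state
```

Buffering during the maybe-speech state lets the recognizer receive the start of an utterance even though recognition only begins once the second metadata arrives.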
Abstract:
There is provided a speech dialog system that includes a first microphone, a second microphone, a processor and a memory. The first microphone captures first audio from a first spatial zone, and produces a first audio signal. The second microphone captures second audio from a second spatial zone, and produces a second audio signal. The processor receives the first audio signal and the second audio signal, and the memory contains instructions that control the processor to perform operations of a speech enhancement module, an automatic speech recognition module, and a speech dialog module that performs a zone-dedicated speech dialog.
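One way to read "zone-dedicated speech dialog" is that each spatial zone keeps its own dialog state, and recognized speech is routed to the dialog of the zone it came from. The sketch below assumes this reading. The enhancement module is replaced by a simple stand-in that picks the zone whose microphone signal has the highest mean energy. The recognizer is an injected callable. The zone names and all identifiers (`enhance`, `ZoneDialog`, `SpeechDialogSystem`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


def frame_energy(signal: Sequence[float]) -> float:
    """Mean squared amplitude of one block of samples."""
    return sum(x * x for x in signal) / len(signal) if signal else 0.0


def enhance(signals: Dict[str, Sequence[float]]) -> Tuple[str, Sequence[float]]:
    """Hypothetical stand-in for speech enhancement: select the zone whose
    microphone carries the most energy and pass its signal on."""
    zone = max(signals, key=lambda z: frame_energy(signals[z]))
    return zone, signals[zone]


@dataclass
class ZoneDialog:
    zone: str
    history: List[str] = field(default_factory=list)  # per-zone dialog state

    def handle(self, text: str) -> str:
        self.history.append(text)
        return f"[{self.zone}] turn {len(self.history)}: {text}"


@dataclass
class SpeechDialogSystem:
    asr: Callable[[Sequence[float]], str]  # hypothetical recognizer
    dialogs: Dict[str, ZoneDialog] = field(default_factory=dict)

    def step(self, signals: Dict[str, Sequence[float]]) -> str:
        zone, enhanced = enhance(signals)
        text = self.asr(enhanced)
        # Each zone gets its own dialog instance, created on first use.
        dialog = self.dialogs.setdefault(zone, ZoneDialog(zone))
        return dialog.handle(text)
```

For example, with a "driver" zone and a "passenger" zone, turns from the driver accumulate in one history and turns from the passenger in another, so one speaker's dialog does not interrupt the other's.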