Abstract:
Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, selected indexed document, and weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level.
Abstract:
Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.