摘要:
Input is received from at least two different input sources. Information from these sources are combined together to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined together to select a result.
摘要:
Speech recognition such as command and control speech recognition generally use a context free grammar to constrain the decoding process. Word or subword background model are constructed to repopulate dynamic hypothesis space, especially when word spareness is at issue. The background models can be later used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence. The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
摘要:
A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.
摘要:
Multiple input modalities are selectively used by a user or process to prune a word graph. Pruning initiates rescoring in order to generate a new word graph with a revised best path.
摘要:
Speech recognition such as command and control speech recognition generally use a context free grammar to constrain the decoding process. Word or subword background model are constructed to repopulate dynamic hypothesis space, especially when word spareness is at issue. The background models can be later used in speech recognition. During speech recognition, background and conventional context free grammar decoding are used to measure confidence. The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.
摘要:
Input is received from at least two different input sources. Information from these sources are combined together to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined together to select a result.
摘要:
A reliable full covariance matrix estimation algorithm for pattern unit's state output distribution in pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of pattern unit's state output distribution are estimated based on all the related nodes in the tree.
摘要:
Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state pair by state pair comparisons.
摘要:
A computationally efficient and robust pitch detection and tracking system and related methods are presented. According to certain exemplary implementations a method is presented comprising identifying an initial set of pitch period candidates using a first estimation algorithm, filtering the initial set of candidates and passing the filtered candidates through a second, more accurate pitch estimation algorithm to generate a final set of pitch period candidates from which the most likely pitch value is selected.
摘要:
Systems and methods for analyzing the content of audio/video files using speech recognition and data mining technologies are provided. As it can generally be assumed that a user's interest is highly correlated with an audio/video clip or television program the user may be watching, methods and systems for utilizing the results of speech recognition and data mining technology implementation to retrieve relevant advertising content for display are also provided.