摘要:
Generally stated a method and an accompanying apparatus provides for a voice recognition system (300) with programmable front end processing unit (400). The front end processing unit (400) requests and receives different configuration files at different times for processing voice data in the voice recognition system (300). The configuration files are communicated to the front end unit via a communication link (310) for configuring the front end processing unit (400). A microprocessor may provide the front end configuration files on the communication link at different times.
摘要:
A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
摘要:
A voice recognition rejection scheme for capturing an utterance includes the steps accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. The utterance is rejected if a third predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. One of the one or more other comparison results may advantageously be a next-closest comparison result for the utterance and another store word. The first, second, and third predefined relationships may advantageously be linear relationships.
摘要:
A method and apparatus providing a user interface within a phone that responds to a limited vocabulary of user trained voice commands. The interface allows users to perform all phone handset dialing functions using voice commands. Additionally, users will be able to create and modify entries within a voice recognition phonebook, whereby a number within the voice recognition phonebook can be called by saying the name associated with the number. The user interface provides a combination of voice and LCD displayed user prompts and responses to voice input. The interface responds to user voice commands and performs the command functions based upon matches to previously user trained voice command vocabulary words stored in memory.
摘要:
A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values. Next, a plurality of cluster merge parameters is calculated, wherein each of the cluster merge parameters is associated with a pair of adjacent clusters in the frequency domain signal. A minimum cluster merge parameter is selected from the plurality of cluster merge parameters. A merged cluster is then formed by canceling a cluster boundary between the clusters associated with the minimum merge parameter and assigning a merged variance value to the merged cluster, wherein the merged variance value is representative of the variance values assigned to the clusters associated with the minimum merge parameter. The process is repeated in order to form a plurality of merged clusters, and the segmented speech signal is formed in accordance with the plurality of merged clusters.
摘要:
The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) application. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images are described herein. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.
摘要:
The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) application. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images are described herein. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.
摘要:
An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
摘要:
The example techniques of this disclosure are directed to generating a stereoscopic view from an application designed to generate a mono view. For example, the techniques may modify source code of a vertex shader to cause the modified vertex shader, when executed, to generate graphics content for the images of the stereoscopic view. As another example, the techniques may modify a command that defines a viewport for the mono view to commands that define the viewports for the images of the stereoscopic view.
摘要:
Embodiments include methods and systems which determine pixel displacement between frames based on a respective weighting-value for each pixel or a group of pixels. The weighting-values provide an indication as to which pixels are more pertinent to optical flow computations. Computational resources and effort can be focused on pixels with higher weights, which are generally more pertinent to optical flow determinations.