摘要:
A method and system for speech recognition combines different types of engines in order to recognize user-defined digits and control words, predefined digits and control words, and nametags. Speaker-independent engines are combined with speaker-dependent engines. A Hidden Markov Model (HMM) engine is combined with Dynamic Time Warping (DTW) engines.
摘要:
A voice recognition rejection scheme for capturing an utterance includes the steps accepting the utterance, applying an N-best algorithm to the utterance, or rejecting the utterance. The utterance is accepted if a first predefined relationship exists between one or more closest comparison results for the utterance with respect to a stored word and one or more differences between the one or more closest comparison results and one or more other comparison results between the utterance and one or more other stored words. An N-best algorithm is applied to the utterance if a second predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. The utterance is rejected if a third predefined relationship exists between the one or more closest comparison results and the one or more differences between the one or more closest comparison results and the one or more other comparison results. One of the one or more other comparison results may advantageously be a next-closest comparison result for the utterance and another store word. The first, second, and third predefined relationships may advantageously be linear relationships.
摘要:
An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
摘要:
Generally stated a method and an accompanying apparatus provides for a voice recognition system (300) with programmable front end processing unit (400). The front end processing unit (400) requests and receives different configuration files at different times for processing voice data in the voice recognition system (300). The configuration files are communicated to the front end unit via a communication link (310) for configuring the front end processing unit (400). A microprocessor may provide the front end configuration files on the communication link at different times.
摘要:
A method and apparatus providing a user interface within a phone that responds to a limited vocabulary of user trained voice commands. The interface allows users to perform all phone handset dialing functions using voice commands. Additionally, users will be able to create and modify entries within a voice recognition phonebook, whereby a number within the voice recognition phonebook can be called by saying the name associated with the number. The user interface provides a combination of voice and LCD displayed user prompts and responses to voice input. The interface responds to user voice commands and performs the command functions based upon matches to previously user trained voice command vocabulary words stored in memory.
摘要:
The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) application. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images are described herein. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.
摘要:
The techniques of this disclosure are directed to the feedback-based stereoscopic display of three-dimensional images, such as may be used for video telephony (VT) and human-machine interface (HMI) application. According to one example, a region of interest (ROI) of stereoscopically captured images may be automatically determined based on determining disparity for at least one pixel of the captured images are described herein. According to another example, a zero disparity plane (ZDP) for the presentation of a 3D representation of stereoscopically captured images may be determined based on an identified ROI. According to this example, the ROI may be automatically identified, or identified based on receipt of user input identifying the ROI.
摘要:
An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
摘要:
In an image/video encoding and decoding system employing an artifact evaluator a method and/or apparatus to process video blocks comprising a decoder operable to synthesize an un-filtered reconstructed video block or frame and an artifact filter operable to receive the un-filtered reconstructed video block or frame, which generates a filtered reconstructed video block or frame. A memory buffer operable to store either the filtered reconstructed video block or frame or the un-filtered reconstructed video block or frame, and an artifact evaluator operable to update the memory buffer after evaluating and determining which of the filtered video block or frame, or the un-filtered video block or frame yields better image/video quality.
摘要:
In an image/video encoding and decoding system employing an artifact evaluator a method and/or apparatus to process video blocks comprising a decoder operable to synthesize an un-filtered reconstructed video block or frame and an artifact filter operable to receive the un-filtered reconstructed video block or frame, which generates a filtered reconstructed video block or frame. A memory buffer operable to store either the filtered reconstructed video block or frame or the un-filtered reconstructed video block or frame, and an artifact evaluator operable to update the memory buffer after evaluating and determining which of the filtered video block or frame, or the un-filtered video block or frame yields better image/video quality.