Abstract:
The acoustic speech signal is decomposed into wavelets arranged in an asymmetrical tree data structure, from which individual nodes may be selected to best extract the local features needed to model specific classes of sound units. The output of the wavelet packet transformation is smoothed through integration and compressed by a non-linearity prior to discrete cosine transformation. The resulting subband features, such as cepstral coefficients, may then be used to construct the speech recognizer's speech models. Using the local feature information extracted in this manner allows a single recognizer to be optimized for several different classes of sound units, thereby eliminating the need for parallel path recognizers.
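The pipeline described above (asymmetric wavelet packet tree, integration, compression, discrete cosine transformation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Haar filter, the mean-energy integration, the logarithm as the non-linearity, and all function names are choices made for this sketch and are not taken from the abstract.

```python
import math

def haar_split(x):
    """One Haar analysis step: return the (low-pass, high-pass) halves of x."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def packet_leaves(x, tree):
    """Decompose x along an asymmetric tree and return the leaf nodes.

    tree is None for a leaf, or a (low_subtree, high_subtree) pair,
    so different branches may be split to different depths.
    Leaves are returned in tree order (low branch first).
    """
    if tree is None:
        return [x]
    low, high = haar_split(x)
    return packet_leaves(low, tree[0]) + packet_leaves(high, tree[1])

def subband_cepstra(frame, tree, n_coeffs):
    """Subband features: leaf energies -> log compression -> DCT-II."""
    leaves = packet_leaves(frame, tree)
    # Integration (assumed here to be the mean energy of each selected node).
    energies = [sum(c * c for c in leaf) / len(leaf) for leaf in leaves]
    # Compression (assumed here to be a logarithm; the floor avoids log(0)).
    logs = [math.log(max(e, 1e-10)) for e in energies]
    # Unnormalized DCT-II over the compressed subband energies.
    n = len(logs)
    return [
        sum(logs[k] * math.cos(math.pi * q * (k + 0.5) / n) for k in range(n))
        for q in range(n_coeffs)
    ]

# Hypothetical tree: only the low branch is split further, three levels deep.
frame = [math.sin(0.3 * i) for i in range(16)]
tree = (((None, None), None), None)
features = subband_cepstra(frame, tree, 3)
```

Choosing a different tree per class of sound unit changes which subbands are resolved finely, which is the sense in which nodes are "selected" in this sketch.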
Abstract:
A noise robustness method operates jointly in a signal domain and a model domain. For example, energy is added in the signal domain for frequency bands where the actual noise level of an incoming signal is lower than the noise level used to train the models, thus obtaining a compensated signal. Also, energy is added in the model domain for frequency bands where the noise level of the incoming signal or the compensated signal is higher than the noise level used to train the models. Moreover, energy is never removed, which avoids the higher sensitivity of energy removal to estimation errors.
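The per-band rule described above might look like the following minimal sketch. It assumes linear band energies, additive noise, and per-band noise estimates that are already available; the function name `compensate` and its arguments are hypothetical.

```python
def compensate(signal_bands, model_bands, incoming_noise, training_noise):
    """Return (compensated_signal, compensated_model) band energies.

    All inputs are per-band linear energies of equal length.
    Energy is only ever added, never removed.
    """
    comp_signal, comp_model = [], []
    for s, m, n_in, n_tr in zip(signal_bands, model_bands,
                                incoming_noise, training_noise):
        if n_in < n_tr:
            # Incoming band is cleaner than the training condition:
            # add energy in the signal domain.
            s = s + (n_tr - n_in)
        elif n_in > n_tr:
            # Incoming band is noisier than the training condition:
            # add energy in the model domain.
            m = m + (n_in - n_tr)
        comp_signal.append(s)
        comp_model.append(m)
    return comp_signal, comp_model

# Three bands: cleaner than training, noisier than training, matched.
sig, mod = compensate([5.0, 5.0, 5.0], [4.0, 4.0, 4.0],
                      [1.0, 3.0, 2.0], [2.0, 2.0, 2.0])
```

Because both branches only add energy, an overestimated noise level never subtracts speech energy from the signal or the model.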
Abstract:
A system and method for identifying a user of a handheld device are disclosed herein. The device implementing the method and system may attempt to identify a user based on signals that are incidental to the user's handling of the device. The signals are generated by a variety of sensors dispersed along the periphery of, or within, the housing. The sensors may include touch sensors, inertial sensors, acoustic sensors, pulse oximeters, and a touchpad. Based on the sensors and corresponding signals, identification information is generated. The identification information is used to identify the user of the handheld device. The handheld device may implement various statistical learning and data mining techniques to increase the robustness of the system. The device may also authenticate the user based on the user drawing a circle or other shape.
Abstract:
Model compression is combined with model compensation. Model compression is needed in embedded ASR to reduce the size and the computational complexity of the models. Model compensation is used to adapt in real time to changing noise environments. The present invention allows for the design of smaller ASR engines (memory consumption reduced to as little as one-sixth) with reduced impact on recognition accuracy and/or robustness to noise.
Abstract:
Linear approximation of the background noise is applied after feature extraction and prior to speaker adaptation to allow the speaker adaptation system to adapt the speech models to the enrolling user without distortion from background noise. The linear approximation is applied in the feature domain, such as in the cepstral domain. Any adaptation technique that is commutative in the feature domain may be used.
Abstract:
An embedded device for playing media files is capable of generating a play list of media files based on input speech from a user. It includes an indexer generating a plurality of speech recognition grammars. According to one aspect of the invention, the indexer generates speech recognition grammars based on contents of a media file header of the media file. According to another aspect of the invention, the indexer generates speech recognition grammars based on categories in a file path for retrieving the media file to a user location. When a speech recognizer receives input speech from a user while in a selection mode, a media file selector compares that input speech to the plurality of speech recognition grammars, thereby selecting the media file.
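As a rough illustration of the indexing idea, the sketch below builds a phrase table from file path categories and header fields, then selects files by exact phrase match. It is a simplification under stated assumptions: the speech recognizer is replaced by already-recognized text, a real grammar is reduced to a dictionary of phrases, and the names `build_grammars` and `select`, the header fields, and the file paths are all hypothetical.

```python
import re

def normalize(text):
    """Lowercase and keep only alphanumeric words, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def build_grammars(media_files):
    """Map each speakable phrase to the set of file paths it selects.

    media_files maps a '/'-separated path to a dict of header fields.
    Phrases come from the directory names in the path (the categories)
    and from the header values.
    """
    grammars = {}
    for path, header in media_files.items():
        categories = path.strip("/").split("/")[:-1]
        phrases = [normalize(c) for c in categories]
        phrases += [normalize(v) for v in header.values()]
        for phrase in phrases:
            if phrase:
                grammars.setdefault(phrase, set()).add(path)
    return grammars

def select(grammars, recognized_text):
    """Return the sorted list of files whose phrases match the recognized text."""
    return sorted(grammars.get(normalize(recognized_text), set()))

files = {
    "/music/jazz/track01.mp3": {"artist": "Miles Davis", "title": "So What"},
    "/music/rock/track02.mp3": {"artist": "Queen", "title": "Bohemian Rhapsody"},
}
grammars = build_grammars(files)
playlist = select(grammars, "Jazz")
```

A category shared by several files, such as the top-level directory here, selects all of them, which is how a single utterance can yield a play list rather than one file.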
Abstract:
Dynamically constructed grammar constraints and frequency-based or statistics-based constraints are used to constrain the speech recognizer and to optionally rescore the output to improve recognition accuracy. The recognition system is well adapted for hands-free operation of portable devices, such as for voice dialing operations.
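One way to picture the rescoring step for voice dialing is sketched below. It assumes the recognizer returns an N-best list of (name, log score) pairs, that the dynamic grammar is simply the current contact list, and that the frequency constraint is a smoothed call-count prior; the function `rescore`, its `weight` parameter, and the sample data are hypothetical.

```python
import math

def rescore(nbest, call_counts, weight=1.0):
    """Rescore an N-best list with a frequency-based prior.

    nbest: list of (name, recognizer_log_score) pairs.
    call_counts: dict mapping each contact name to how often it was dialed.
    Hypotheses outside the contact list are dropped (grammar constraint).
    """
    total = sum(call_counts.values())
    rescored = []
    for name, score in nbest:
        if name not in call_counts:
            continue  # not allowed by the dynamically built grammar
        # Add-one smoothing so rarely called contacts keep a nonzero prior.
        prior = (call_counts[name] + 1) / (total + len(call_counts))
        rescored.append((name, score + weight * math.log(prior)))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

nbest = [("john smith", -10.0), ("joan smith", -9.8), ("jon smythe", -9.5)]
counts = {"john smith": 40, "joan smith": 2}
ranked = rescore(nbest, counts)
```

In this example the frequently dialed contact overtakes an acoustically slightly better hypothesis, and the hypothesis that is not in the contact list is removed entirely.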
Abstract:
A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
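The "close temporal relation" between speech and capture could be approximated as in the sketch below, which attaches each recognized utterance to the nearest capture in time. The five-second window, the function name `tag_by_time`, and the data layout are assumptions made for illustration only.

```python
def tag_by_time(captures, utterances, max_gap=5.0):
    """Attach recognized text to the capture closest in time.

    captures: list of (media_id, timestamp_seconds).
    utterances: list of (recognized_text, timestamp_seconds).
    An utterance is used only if it falls within max_gap seconds
    of its nearest capture; otherwise it is ignored.
    """
    if not captures:
        return {}
    tags = {media_id: [] for media_id, _ in captures}
    for text, t_speech in utterances:
        media_id, t_capture = min(captures, key=lambda c: abs(c[1] - t_speech))
        if abs(t_capture - t_speech) <= max_gap:
            tags[media_id].append(text)
    return tags

captures = [("img1", 10.0), ("img2", 60.0)]
utterances = [("birthday party", 12.0), ("cake", 58.5), ("unrelated", 30.0)]
tags = tag_by_time(captures, utterances)
```

The same pairing rule could associate a stored speech sample (an annotation) with a capture instead of recognized text.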
Abstract:
An e-mail message process is provided for use with a personal digital assistant. It allows input speech to be converted to text using a focused language model, which is downloaded over a cellular phone connection from an Internet server that provides the focused language model based upon a topic for the intended e-mail message. The text generated from the input speech can be summarized by the e-mail message processor and can be edited by the user. The generated e-mail message can then be transmitted, again via cellular connection, to an Internet e-mail server for delivery to a recipient.
Abstract:
A navigation apparatus is disclosed which may be used by law enforcement personnel for rapid intervention at a location while adding safety and reliability to the process. The apparatus includes a computer system having an operating system, memory and a user interface. The system further includes a positioning system, such as a GPS system, for determining the position of a vehicle. The positioning system communicates with the operating system. An information database, communicating with the operating system, contains data related to routing information concerning routes for travel by the vehicle. The routing information includes safety information concerning route safety in the traveling region accessible by the vehicle. The apparatus further includes a routing system in communication with the operating system that determines a route based at least in part on the routing information. Driving directions and call information are provided multi-modally to provide the officer with critical information in an efficient and timely fashion.
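Routing that takes safety information into account can be illustrated with a shortest-path search whose edge cost combines travel time and a risk score. This is a sketch only: the abstract does not specify the routing algorithm, so Dijkstra's algorithm, the additive cost, the `risk_weight` parameter, and the small road graph are all assumptions.

```python
import heapq

def route(graph, start, goal, risk_weight=1.0):
    """Find the lowest-cost path, where cost = minutes + risk_weight * risk.

    graph maps a node to a list of (neighbor, travel_minutes, risk) edges.
    Returns (total_cost, path) or None if the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    settled = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled and settled[node] <= cost:
            continue  # already reached this node at least as cheaply
        settled[node] = cost
        for neighbor, minutes, risk in graph.get(node, []):
            step = minutes + risk_weight * risk
            heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None

# Hypothetical road graph: the A-B segment is fast but risky.
roads = {
    "A": [("B", 2.0, 5.0), ("C", 4.0, 0.0)],
    "B": [("D", 2.0, 0.0)],
    "C": [("D", 3.0, 0.0)],
}
fastest = route(roads, "A", "D", risk_weight=0.0)
safer = route(roads, "A", "D", risk_weight=1.0)
```

Setting `risk_weight` to zero recovers a purely time-based route, while larger values trade travel time for safer segments.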