摘要:
A computer-implemented method and apparatus is provided for processing a spoken request from a user. A speech recognizer converts the spoken request into a digital format. A frame data structure associates semantic components of the digitized spoken request with predetermined slots. The slots are indicative of data which are used to achieve a predetermined goal. A speech understanding module which is connected to the speech recognizer and to the frame data structure determines semantic components of the spoken request. The slots are populated based upon the determined semantic components. A dialog manager which is connected to the speech understanding module may determine at least one slot which is unpopulated based upon the determined semantic components and in a preferred embodiment may provide confirmation of the populated slots. A computer generated-request is formulated in order for the user to provide data related to the unpopulated slot. The method and apparatus are well-suited (but not limited) to use in a hand-held speech translation device.
摘要:
The input speech is segmented using plural grammar networks, including a network that includes a filler model designed to represent noise or extraneous speech. Recognition processing results in plural lists of candidates, each list containing the N-best candidates generated. The lists are then separately aligned with the dictionary of valid names to generate two lists of valid names. The final recognition pass combines these two lists of names into a dynamic grammar and this dynamic grammar may be used to find the best candidate name using Viterbi recognition. A telephone call routing application based on the recognition system selects the best candidate name corresponding to the name spelled by the user, whether the user pronounces the name prior to spelling, or not.
摘要:
The call routing device plugs into existing extensions of the office telephone network or PBX system and acts as a "virtual" operator, prompting incoming callers to spell the name of the desired recipient. The speech recognizer uses a multipass procedure employing Hidden Markov Models and dynamic programming. The N-best hypotheses are propagated between passes, allowing the more computationally costly routines to be reserved until the final pass, when the size of the search space is significantly reduced. The routing device prompts the user to confirm that the selected name is correct, whereupon the device signals the telephone network to automatically switch the incoming call to the telephone extension of the selected recipient.
摘要:
The speech recognizer incorporates a language model that reduces the number of acoustic pattern matching sequences that must be performed by the recognizer. The language model is based on knowledge of a pre-defined set of syntactically defined content and includes a data structure that organizes the content according to acoustic confusability. A spelled name recognition system based on the recognizer employs a language model based on classes of letters that the recognizer frequently confuses for one another. The language model data structure is optionally an N-gram data structure, a tree data structure, or an incrementally configured network that is built during a training sequence. The incrementally configured network has nodes that are selected based on acoustic distance from a predetermined lexicon.
摘要:
A system and method for identifying a user of a handheld device is herein disclosed. The device implementing the method and system may attempt to identify a user based on signals that are incidental to a user's handling of the device. The signals are generated by a variety of sensors dispersed along the periphery or within the housing. The sensors range may include touch sensors, inertial sensors, acoustic sensors, pulse oximiters, and a touchpad. Based on the sensors and corresponding signals, identification information is generated. The identification information is used to identify the user of the handheld device. The handheld device may implement various statistical learning and data mining techniques to increase the robustness of the system. The device may also authenticate the user based on the user drawing a circle, or other shape.
摘要:
A library of mouth shapes is created by separating speaker-dependent and speaker independent variability. Preferably, speaker dependent variability is modeled by a speaker space while the speaker independent variability (i.e. context dependency), is modeled by a set of normalized mouth shapes that need be built only once. Given a small amount of data from a new speaker, it is possible to construct a corresponding mouth shape library by estimating a point in speaker space that maximizes the likelihood of adaptation data and by combining speaker dependent and speaker independent variability. Creation of talking heads is simplified because creation of a library of mouth shapes is enabled with only a few mouth shape instances. To build the speaker space, a context independent mouth shape parametric representation is obtained. Then a supervector containing the set of context-independent mouth shapes is formed for each speaker included in the speaker space. Dimensionality reduction is used to find the areas of the speaker space.
摘要:
A robotic nursing system for use with a patient comprises a nursing robot having at least one patient condition sensor, a transmitter, and a receiver mounted therein. A display device for displays data sensed by the patient condition sensor. The display device includes a receiver in communication with the nursing robot. The nursing robot senses patient physiological conditions using the patient condition sensor and transmits the physiological conditions to the display device using the transmitter. The display device then displays the physiological conditions for review by a user. One or another or both. The nursing robot also transmits the physiological conditions to a patient database for storage.
摘要:
An e-mail message process is provided for use with a personal digital assistant which allows for the use of input speech messaging which is converted to text using a focused language model which is downloaded by a cellular phone connection to an Internet server which provides the focused language model based upon a topic for the intended e-mail message. The text that is generated from the input speech method can be summarized by the e-mail message processor and can be edited by the user. The generated e-mail message can then be transmitted again via cellular connection to an Internet e-mail server for transmitting the e-mail message to a recipient.
摘要:
A reduced dimensionality eigenvoice analytical technique is used during training to develop context-dependent acoustic models for allophones. Re-estimation processes are performed to more strongly separate speaker-dependent and speaker-independent components of the speech model. The eigenvoice technique is also used during run time upon the speech of a new speaker. The technique removes individual speaker idiosyncrasies, to produce more universally applicable and robust allophone models. In one embodiment the eigenvoice technique is used to identify the centroid of each speaker, which may then be “subtracted out” of the recognition equation.
摘要:
A navigation apparatus is disclosed which may be used by law enforcement personnel for rapid intervention to a location while adding safety and reliability to the process. The apparatus includes a computer system, having an operating system, memory and a user interface. The system further includes a positioning system, such as a GPS system for determining the position of a vehicle. The positioning system communicates with the operating system. An information database, communicating with the operating system, contains data related to routing information concerning routes for travel by the vehicle. The routing information includes safety information concerning route safety in the traveling region accessible by the vehicle. The apparatus further includes a routing system in communication with the operating system that determines a route based at least in part on the routing information. Driving directions and call information are provided multi-modally to provide the officer with critical information in an efficient and timely fashion.