Abstract:
An automated hotel attendant is provided for coordinating room-to-room calling over a telephone switching system that supports multiple telephone extensions. A hotel registration system receives and stores the spelled names of hotel guests and assigns each guest an associated telephone extension. A lexicon training system is connected to the hotel registration system for generating pronunciations for each spelled name by converting the characters that spell those names into word-phoneme data. This word-phoneme data is in turn stored in a lexicon that is used by a speech recognition system. In particular, a phoneticizer in conjunction with a Hidden Markov Model (HMM) based model trainer serves as the basis for the lexicon training system, such that one or several HMMs associated with each guest name are stored in the lexicon. An automated attendant is coupled to the speech recognition system for converting a spoken name of a hotel guest, entered from one of the telephone extensions, into a predefined hotel guest name that can be used to retrieve the assigned telephone extension from the hotel registration system. The automated attendant then causes the telephone switching system to call the requested telephone extension in response to the entry of the spoken name.
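The lookup and dialing steps described above can be pictured with a minimal sketch. It assumes the speech recognizer has already produced a guest name as text; `HotelRegistry`, `route_call`, the extension numbering, and the `dial` callback are hypothetical names chosen for illustration, and the phoneticizer and HMM training are not modeled.

```python
class HotelRegistry:
    """Toy registration system: stores spelled guest names and assigns extensions."""

    def __init__(self, first_extension=100):
        # Assumption: extensions are handed out sequentially from a starting number.
        self._next_extension = first_extension
        self._extension_by_name = {}

    def check_in(self, spelled_name):
        """Store the guest's spelled name and return the extension assigned to it."""
        extension = self._next_extension
        self._next_extension += 1
        self._extension_by_name[spelled_name.strip().lower()] = extension
        return extension

    def extension_for(self, guest_name):
        """Return the extension assigned to a guest name, or None if unknown."""
        return self._extension_by_name.get(guest_name.strip().lower())


def route_call(recognized_name, registry, dial):
    """Look up the recognized guest name and ask the switch to call the extension.

    `dial` stands in for the telephone switching system; it is called with the
    extension number. Returns True if a call was placed, False otherwise.
    """
    extension = registry.extension_for(recognized_name)
    if extension is None:
        return False
    dial(extension)
    return True
```

In use, a guest checked in as "Maria Lopez" would be reachable by passing the recognized text "maria lopez" to `route_call`, since the sketch normalizes case and surrounding whitespace before the lookup.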
Abstract:
Supervised adaptation speech is supplied to the recognizer, and the recognizer generates the N-best transcriptions of the adaptation speech. These transcriptions include the one transcription known to be correct, based on a priori knowledge of the adaptation speech, and the remaining transcriptions known to be incorrect. The system applies a weight to each transcription: a positive weight to the correct transcription and negative weights to the incorrect transcriptions. These weights have the effect of moving the incorrect transcriptions away from the correct one, rendering the recognition system more discriminative for the new speaker's speaking characteristics. Weights applied to the incorrect solutions are based on the respective likelihood scores generated by the recognizer. The sum of all weights (positive and negative) is a positive number, which ensures that the system will converge.
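One possible weighting scheme that satisfies the stated constraints is sketched below. The abstract does not give a formula, so the specifics are assumptions: the recognizer scores are taken to be log-likelihoods, the negative weights are distributed by a softmax over those scores, and `rho` is a hypothetical factor below 1 that keeps the total positive.

```python
import math


def nbest_weights(incorrect_log_likelihoods, rho=0.5):
    """Return (correct_weight, incorrect_weights) for one N-best list.

    The correct transcription gets weight +1. Each incorrect transcription
    gets a negative weight proportional to its softmax-normalized likelihood,
    scaled by rho. Because the negative weights sum to -rho and rho < 1,
    the total of all weights is 1 - rho, which is positive.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1) so that the weights sum to a positive number")
    if not incorrect_log_likelihoods:
        return 1.0, []

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(incorrect_log_likelihoods)
    exps = [math.exp(score - top) for score in incorrect_log_likelihoods]
    total = sum(exps)

    incorrect_weights = [-rho * e / total for e in exps]
    return 1.0, incorrect_weights
```

Under this choice, the incorrect transcription that the recognizer finds most likely receives the largest negative weight, so adaptation pushes hardest against the most confusable competitor.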
Abstract:
The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
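The scoring idea can be illustrated with a toy mixed tree. This is a sketch under simplifying assumptions that are not in the abstract: each letter is aligned with exactly one phoneme, a single tree is shared by all positions, a pronunciation's score is the sum of log probabilities read from the leaves, and the tree, its probabilities, and the names `Node`, `Leaf`, `score`, and `rank` are invented for illustration.

```python
import math


class Leaf:
    """Leaf node: probabilities of phonemes for the current position."""

    def __init__(self, probs):
        self.probs = probs


class Node:
    """Internal node: a yes-no question about nearby letters or phonemes."""

    def __init__(self, question, yes, no):
        self.question = question  # callable(letters, phonemes, i) -> bool
        self.yes = yes
        self.no = no


def leaf_for(tree, letters, phonemes, i):
    """Walk the tree for position i by answering each question in turn."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(letters, phonemes, i) else tree.no
    return tree


def score(tree, letters, phonemes, floor=1e-6):
    """Log-probability score of one candidate pronunciation (higher is better)."""
    total = 0.0
    for i, phoneme in enumerate(phonemes):
        leaf = leaf_for(tree, letters, phonemes, i)
        total += math.log(leaf.probs.get(phoneme, floor))
    return total


def rank(tree, letters, candidates):
    """Order candidate pronunciations from most to least probable."""
    return sorted(candidates, key=lambda p: score(tree, letters, p), reverse=True)


# Toy tree: a letter question first, then a question about the adjacent phoneme.
FRONT_VOWELS = {"ih", "eh", "iy"}
toy_tree = Node(
    lambda letters, phonemes, i: letters[i] == "c",
    Node(
        lambda letters, phonemes, i: i + 1 < len(phonemes) and phonemes[i + 1] in FRONT_VOWELS,
        Leaf({"s": 0.9, "k": 0.1}),
        Leaf({"k": 0.9, "s": 0.1}),
    ),
    Leaf({"ae": 0.5, "t": 0.5}),
)

ranked = rank(toy_tree, "cat", [["s", "ae", "t"], ["k", "ae", "t"]])
```

Here the letter "c" is followed by a phoneme that is not a front vowel, so the toy tree favors "k" and the pronunciation k-ae-t is ranked first.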
Abstract:
A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
Abstract:
A portable device increases user access to equipment utilizing a communications interface providing communication with the equipment in accordance with various, combinable embodiments. In one embodiment, a speech generator generates speech based on commands relating to equipment operation, which may be received from the equipment via the communications interface. A selection mechanism allows the user to select commands and thereby operate the equipment. In another embodiment, a command navigator navigates commands based on user input by shifting focus between commands, communicates a command having the focus to the speech generator, and allows the user to select a command. In a further embodiment, a phoneticizer converts the commands and/or predetermined navigation and selection options into a dynamic speech lexicon, and a speech recognizer uses the lexicon to recognize a user navigation input and/or user selection of a command. Speaker verification can also be used to enhance security using a speech biometric.
Abstract:
An improved method is provided for enrolling with a resource security system. The method includes: providing an access code to a system user; accessing the resource security system using the access code; prompting the user to input a biometric feature which identifies the user; capturing a biometric feature associated with the user; and associating the captured biometric feature with the identity of the user for subsequent verification. The method further includes subsequently granting access to the secured resource based on biometric feature data input by the user.
Abstract:
A method for improving recognition results of a speech recognizer uses supplementary information to confirm recognition results. A user inputs speech to a speech recognizer. The speech recognizer resides on a mobile device or on a server at a remote location. The speech recognizer determines a recognition result based on the input speech. A confidence measure is calculated for the recognition result. If the confidence measure is below a threshold, the user is prompted for supplementary data. The supplementary data is determined dynamically based on ambiguities between the input speech and the recognition result, wherein the supplementary data will distinguish the input speech over potential incorrect results. The supplementary data may be a subset of alphanumeric characters that comprise the input speech, or other data associated with a desired result, such as an area code or location. The user may provide the supplementary data verbally, or manually using a keypad, touchpad, touchscreen, or stylus pen.
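The confirmation flow can be sketched as follows for the case where the supplementary data is a subset of characters. The sketch assumes the recognizer returns competing candidates as (text, confidence) pairs; `distinguishing_positions`, `confirm_result`, and the `ask_user` callback are hypothetical names, and the ambiguity test (character positions where the candidates disagree) is one simple interpretation rather than the method's definition.

```python
def distinguishing_positions(texts):
    """Character positions at which the candidate strings do not all agree."""
    width = max(len(t) for t in texts)
    # Pad shorter candidates with spaces so every position can be compared.
    padded = [t.lower().ljust(width) for t in texts]
    return [i for i in range(width) if len({p[i] for p in padded}) > 1]


def confirm_result(candidates, threshold, ask_user):
    """Return the confirmed text, or None if the ambiguity is not resolved.

    candidates: list of (text, confidence) pairs from the recognizer.
    ask_user:   callable that receives the ambiguous positions and returns a
                dict {position: character} for any subset of them, entered by
                voice or on a keypad.
    """
    best_text, best_confidence = max(candidates, key=lambda c: c[1])
    if best_confidence >= threshold:
        return best_text  # confident enough, no prompt needed

    texts = [text for text, _ in candidates]
    answers = ask_user(distinguishing_positions(texts))

    # Keep only candidates consistent with every character the user supplied.
    matches = [
        t for t in texts
        if all(t[i:i + 1].lower() == ch.lower() for i, ch in answers.items())
    ]
    return matches[0] if len(matches) == 1 else None
```

For the candidates "Austin" and "Boston", the positions that differ are the first, second, and fifth characters, so asking for the first letter alone is enough to separate them.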
Abstract:
A wearable, computerized apparatus for use with law enforcement has an evidence collector adapted to collect evidentiary information of a type collected according to law enforcement procedures and useful for identification of a suspect. It further has a safety monitor adapted to collect safety information relating to well-being of an officer. A wireless communications link communicates the evidentiary information and the safety information to a centralized component of a distributed communications system to assist in identifying suspects and dispatching assistance.
Abstract:
Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.