Abstract:
An electronic device (300) for speech dialog includes functions that receive (305, 105) a speech phrase comprising a request phrase that includes an instantiated variable (215), generate (335, 115) pitch and voicing characteristics (315) of the instantiated variable, and perform voice recognition (319, 125) of the instantiated variable to determine a most likely set of acoustic states (235). The electronic device may generate (335, 140) a synthesized value of the instantiated variable using the most likely set of acoustic states and the pitch and voicing characteristics of the instantiated variable. To disambiguate (425, 430) a newly received instantiated variable, the electronic device may use a table of previously entered variable values that have been determined to be unique, in which each value is associated with the most likely set of acoustic states and the pitch and voicing characteristics determined when that value was received.
Abstract:
An electronic device (300) for speech dialog includes functions that receive (305, 105) a speech phrase comprising a request phrase that includes an instantiated variable (215), generate (335, 115) pitch and voicing characteristics (315) of the instantiated variable, and perform speech recognition (319, 125) of the instantiated variable to determine a most likely set of acoustic states (235). The electronic device may generate (335, 140) a synthesized value of the instantiated variable using the most likely set of acoustic states and the pitch and voicing characteristics of the instantiated variable. To disambiguate (425, 430) a newly received instantiated variable, the electronic device may use a table of previously entered variable values that have been determined to be unique, in which each value is associated with the most likely set of acoustic states and the pitch and voicing characteristics determined when that value was received.
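The table-based disambiguation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the StoredValue record, the edit-distance comparison of acoustic state sequences, the Euclidean distance over a fixed-length pitch/voicing summary vector, and the pitch_weight parameter are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredValue:
    """A previously entered value determined to be unique (hypothetical record)."""
    text: str
    states: list          # most likely acoustic state sequence recorded at entry time
    pitch_voicing: list   # assumed fixed-length pitch/voicing summary vector


def edit_distance(a, b):
    """Levenshtein distance between two sequences of acoustic state IDs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def disambiguate(states, pitch_voicing, table, pitch_weight=1.0):
    """Return the stored entry jointly closest in acoustic states and pitch/voicing."""
    def score(entry):
        d_states = edit_distance(states, entry.states)
        d_pitch = math.dist(pitch_voicing, entry.pitch_voicing)  # Python 3.8+
        return d_states + pitch_weight * d_pitch
    return min(table, key=score)


table = [
    StoredValue("Jon", [1, 2, 3], [120.0, 0.8]),
    StoredValue("John", [1, 2, 4], [180.0, 0.6]),
]
print(disambiguate([1, 2, 4], [178.0, 0.6], table).text)  # John
```

In this sketch the two hypothetical entries differ by a single acoustic state, so the pitch/voicing term does most of the separating; the abstract does not say how the two kinds of evidence should be weighted.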
Abstract:
A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame into a set of narrowband feature vectors using a narrow passband, couples the narrowband feature vectors to a speech model, and determines whether the audio signal is a wideband signal. When the audio signal is determined to be a wideband signal, a band energy parameter is generated for each frame for each of one or more passbands that lie outside the narrow passband, and the one or more band energy parameters are coupled to the speech model.
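The narrowband/wideband feature path described above can be sketched with the Python standard library. Every concrete number here is an assumption, not from the abstract: a 16 kHz sample rate, 256-sample unwindowed frames, a 300 to 3400 Hz narrow passband, two hypothetical extra passbands, and a 1% energy-ratio threshold for the wideband decision. The narrowband features are simple sub-band log energies standing in for real front-end features such as cepstra, and the DFT is a naive O(N^2) loop to avoid third-party dependencies.

```python
import cmath
import math

SAMPLE_RATE = 16000        # assumption: 16 kHz input
FRAME_LEN = 256            # assumption: 16 ms non-overlapping frames, no window
NARROW_BAND = (300, 3400)  # assumption: telephone-style narrow passband, Hz
WIDE_BANDS = [(3400, 5500), (5500, 8000)]  # hypothetical passbands outside it
WIDEBAND_RATIO = 0.01      # hypothetical: min fraction of energy above 3.4 kHz


def frames(signal):
    """Split the signal into consecutive, non-overlapping frames."""
    return [signal[i:i + FRAME_LEN]
            for i in range(0, len(signal) - FRAME_LEN + 1, FRAME_LEN)]


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (stdlib only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def band_energy(spec, lo, hi):
    """Total power in DFT bins whose centre frequency lies in [lo, hi)."""
    hz_per_bin = SAMPLE_RATE / FRAME_LEN
    return sum(p for k, p in enumerate(spec) if lo <= k * hz_per_bin < hi)


def log_energy(spec, lo, hi):
    return math.log(band_energy(spec, lo, hi) + 1e-10)  # floor avoids log(0)


def narrowband_features(spec, n_sub=4):
    """Log energies of equal-width sub-bands inside the narrow passband."""
    lo, hi = NARROW_BAND
    width = (hi - lo) / n_sub
    return [log_energy(spec, lo + i * width, lo + (i + 1) * width)
            for i in range(n_sub)]


def is_wideband(specs):
    """Call the signal wideband if enough energy lies above the narrow passband."""
    above = sum(band_energy(s, NARROW_BAND[1], SAMPLE_RATE / 2 + 1) for s in specs)
    total = sum(sum(s) for s in specs) + 1e-10
    return above / total > WIDEBAND_RATIO


def extract_features(signal):
    specs = [power_spectrum(f) for f in frames(signal)]
    feats = [narrowband_features(s) for s in specs]  # always computed
    if is_wideband(specs):
        # Append one band-energy parameter per extra passband, per frame.
        for vec, s in zip(feats, specs):
            vec.extend(log_energy(s, lo, hi) for lo, hi in WIDE_BANDS)
    return feats
```

With these assumptions, a pure 1 kHz tone yields 4 parameters per frame, while adding a 6 kHz component makes the signal wideband and yields 6 per frame. Keeping the narrowband vector unchanged and only appending extra parameters means the same speech model input layout works for both kinds of signal.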
Abstract:
The invention provides an automated speech recognition system (100) based on a Hidden Markov Model (132) that dynamically adapts to changing background noise. The system detects long pauses in speech and, for each pause, processes the background noise during the pause to extract a feature vector that characterizes that noise, identifies the Gaussian mixture component of the noise states that most closely matches the extracted feature vector, and updates the mean of that component so that it more closely matches the extracted feature vector and, consequently, the current noise environment. Alternatively, the same process can be applied to refine the Gaussian mixtures associated with other emitting states of the Hidden Markov Model.
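The pause-driven noise adaptation loop described above can be sketched as follows. This is a minimal illustration with hypothetical parameters: pauses are detected as runs of at least min_frames frames whose energy falls below threshold, the pause is summarized by its mean feature vector, the closest component is chosen by Euclidean distance to its mean (equivalent to choosing the most likely component only when components share equal weights and identical spherical covariances), and the mean is moved toward the noise vector by an assumed interpolation factor alpha.

```python
import math


def detect_long_pauses(energies, threshold, min_frames):
    """Return (start, end) frame ranges whose energy stays below threshold
    for at least min_frames consecutive frames."""
    pauses, start = [], None
    for i, e in enumerate(list(energies) + [math.inf]):  # sentinel closes a final run
        if e < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                pauses.append((start, i))
            start = None
    return pauses


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def adapt_noise_means(features, energies, noise_means,
                      threshold=0.1, min_frames=3, alpha=0.2):
    """Update (in place) the noise-state Gaussian means using each long pause."""
    for start, end in detect_long_pauses(energies, threshold, min_frames):
        noise_vec = mean_vector(features[start:end])
        # Pick the component whose mean is nearest the extracted noise vector.
        k = min(range(len(noise_means)),
                key=lambda j: math.dist(noise_means[j], noise_vec))
        # Move that mean a fraction alpha of the way toward the noise vector.
        noise_means[k] = [(1 - alpha) * m + alpha * x
                          for m, x in zip(noise_means[k], noise_vec)]
    return noise_means
```

A small alpha makes the model track slowly drifting noise while limiting the damage from a pause that was misdetected; the abstract does not specify the update rule, so the linear interpolation here is only one plausible choice.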
Abstract:
Acoustic phones, preferably drawn (12) from a plurality of spoken languages, are provided (11). A hierarchically organized polyphone network (20) organizes views of these phones at varying resolutions and phone categorizations as a function, at least in part, of phonetic similarity (14) and at least one language-independent phonological factor (15). In a preferred approach, a unique transcription system represents the phones using only standard, printable ASCII characters, none of which is a special character (such as a character that has command significance to common script interpreters like the UNIX command line).
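The shell-safe transcription constraint can be illustrated with a small validator. Schemes such as X-SAMPA use characters like a backslash, a backtick, or a brace, which a shell would interpret; the sketch below rejects any symbol containing such characters. The SHELL_SPECIAL set is an assumed, conservative list, and the PHONE_SYMBOLS mapping is hypothetical, not the transcription system the abstract refers to.

```python
import string

# Conservative, assumed set of characters with command significance to
# common Unix shells (globbing, quoting, redirection, job control, etc.).
SHELL_SPECIAL = set("$&*()|;<>\\'\"`?[]{}~!#%^=")

# Hypothetical IPA-to-ASCII phone symbols, not the patent's actual inventory.
PHONE_SYMBOLS = {"ʃ": "sh", "ŋ": "ng", "θ": "th", "ə": "ax", "ʔ": "q"}


def is_safe_symbol(symbol):
    """True if every character is printable, non-whitespace ASCII and not shell-special."""
    return bool(symbol) and all(
        c in string.printable and not c.isspace() and c not in SHELL_SPECIAL
        for c in symbol)


def transcribe(ipa_phones):
    """Map phones to ASCII symbols, rejecting anything unmapped or unsafe."""
    out = [PHONE_SYMBOLS.get(p, p) for p in ipa_phones]
    bad = [s for s in out if not is_safe_symbol(s)]
    if bad:
        raise ValueError(f"unsafe or unmapped phone symbols: {bad}")
    return " ".join(out)


print(transcribe(["ʃ", "i", "p"]))  # sh i p
```

Because every emitted symbol passes is_safe_symbol, a transcription can be passed as an unquoted command-line argument or embedded in a script without escaping.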
Abstract:
A method (10) and system (200) for personalized voice dialogue can include tracking (12) a user's use of voice dialogue states or transitions and progressively offering (16) the user more efficient voice dialogue transitions or states, such as transitions or states having progressively fewer words. Tracking dialogue states or transitions can include tracking (14) repeated use of those states or transitions. The user can be prompted to create a new transition or state, and the prompting (18) and the user's confirmation and verification (20) of the new transition or state can be done using SCXML (State Chart XML). The method can further include instantiating (21) the new transition or state with voice tags or words and performing (22) speech recognition using the new transition or state. The method can again determine (23) whether the new transition or state is a repeat transition or state.
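The tracking-and-shortcut cycle described above can be sketched as a small class. The repeat threshold, the state names, and the rule that a voice tag becomes the SCXML event name (with spaces replaced by underscores, because the SCXML event attribute is a space-separated list of event descriptors) are assumptions for illustration. Only the standard-library ElementTree module is used to emit the transition element.

```python
import xml.etree.ElementTree as ET
from collections import Counter

REPEAT_THRESHOLD = 3  # hypothetical: offer a shortcut after three uses


class DialogueTracker:
    def __init__(self):
        self.counts = Counter()   # (from_state, to_state) -> number of uses
        self.shortcuts = {}       # voice tag -> target state

    def record(self, from_state, to_state):
        """Track one transition; True when it has just become a repeat transition."""
        self.counts[(from_state, to_state)] += 1
        return self.counts[(from_state, to_state)] == REPEAT_THRESHOLD

    def add_shortcut(self, voice_tag, target):
        """Instantiate a user-confirmed shortcut and return it as an SCXML
        <transition> element string."""
        self.shortcuts[voice_tag] = target
        event = voice_tag.replace(" ", "_")  # event attribute is space-separated
        el = ET.Element("transition", {"event": event, "target": target})
        return ET.tostring(el, encoding="unicode")

    def recognize(self, utterance):
        """Resolve an utterance to a target state via user-created shortcuts."""
        return self.shortcuts.get(utterance)


tracker = DialogueTracker()
for _ in range(3):
    offer = tracker.record("main_menu", "dial_home")
if offer:
    print(tracker.add_shortcut("call home", "dial_home"))
```

Returning True exactly once, when the count reaches the threshold, avoids prompting the user repeatedly for the same transition; a real system might also re-check the new shortcut for repeated use, as step (23) suggests.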
Abstract:
A wireless transmitter (201) transmits (102) a message intended for at least one wireless personal communications device (202). The message comprises content (203) configured and arranged to at least attempt to prompt a particular operability configuration of the wireless personal communications device that conforms to the social standards of a given local venue (204). Such content can vary with the application setting. Relevant examples include, but are not limited to, information indicating the degree to which the operability configuration is required (as opposed to voluntary or merely suggested), information identifying at least one particular capability of the wireless personal communications device to which the operability configuration pertains, and/or information specifying a time frame during which the operability configuration applies.
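The message content listed above (degree of requirement, affected capability, and applicable time frame) maps naturally onto a small record type. The field names, the capability and setting strings, and the device policy in apply_message (apply required configurations automatically, others only on user acceptance) are all hypothetical; the abstract only says the content attempts to prompt the configuration.

```python
from dataclasses import dataclass
from datetime import time
from enum import Enum


class Requirement(Enum):
    REQUIRED = "required"
    VOLUNTARY = "voluntary"
    SUGGESTED = "suggested"


@dataclass(frozen=True)
class VenueConfigMessage:
    venue_id: str
    capability: str        # capability the configuration pertains to, e.g. "ringer"
    setting: str           # requested setting for that capability, e.g. "silent"
    requirement: Requirement
    start: time            # time frame during which the configuration applies
    end: time

    def applies_at(self, now):
        """True if `now` falls in [start, end); the window may wrap past midnight."""
        if self.start <= self.end:
            return self.start <= now < self.end
        return now >= self.start or now < self.end


def apply_message(msg, settings, now, user_accepts=False):
    """Hypothetical device policy: apply required configurations automatically,
    and voluntary or suggested ones only if the user accepts them."""
    if not msg.applies_at(now):
        return settings
    if msg.requirement is Requirement.REQUIRED or user_accepts:
        return {**settings, msg.capability: msg.setting}
    return settings
```

For example, a theater could broadcast a REQUIRED "ringer: silent" message for 19:00 to 23:00, while a library might send a SUGGESTED "ringer: vibrate" message whose window wraps overnight.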
Abstract:
A Higher Order Command Dialog System (HOCS) (250) for enabling voice control of a user interface is provided. The HOCS can record (302) a sequence of action steps that a user performs while navigating a menu system to perform a task, prompt (304) the user to create a Higher Order Command (HOC) for the task, and associate (306) the sequence of action steps with the HOC for performing the task. The HOC can include multi-modal inputs (120/260) and can prompt the user for non-specific additional information (124) required to perform the task. The HOCS can store the HOC as a voice tag or a user-input command.
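The record, associate, and replay flow of an HOC can be sketched as below. The class and method names, the string action identifiers, and the needs_input flag marking steps that require non-specific additional information at run time are hypothetical; the ask callback stands in for whatever (possibly multi-modal) prompt the system would use.

```python
class HigherOrderCommandSystem:
    def __init__(self):
        self.recording = None
        self.commands = {}  # voice tag -> list of (action, needs_input) steps

    def start_recording(self):
        self.recording = []

    def record_step(self, action, needs_input=False):
        """Record one menu-navigation step; needs_input marks a step whose
        value changes on each use (e.g. the recipient of a message)."""
        if self.recording is not None:
            self.recording.append((action, needs_input))

    def save_as(self, voice_tag):
        """Associate the recorded steps with a Higher Order Command."""
        if self.recording is None:
            raise RuntimeError("no recording in progress")
        self.commands[voice_tag] = self.recording
        self.recording = None

    def execute(self, voice_tag, ask):
        """Replay the steps, calling ask(action) for steps that need new input."""
        performed = []
        for action, needs_input in self.commands[voice_tag]:
            performed.append(f"{action}={ask(action)}" if needs_input else action)
        return performed


hocs = HigherOrderCommandSystem()
hocs.start_recording()
hocs.record_step("open_messages")
hocs.record_step("new_message")
hocs.record_step("recipient", needs_input=True)
hocs.save_as("text someone")
print(hocs.execute("text someone", lambda action: "Alice"))
```

Separating fixed navigation steps from input-dependent ones is what lets a single recorded command be reused for different recipients or values.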