Abstract:
The present invention extends to methods, systems, and computer program products for interpreting expressions having potentially ambiguous meanings in different domains. Multi-domain natural language understanding systems can support a variety of different types of clients. Expressions can be interpreted across multiple domains. Weights can be assigned to domains. Weights can be client-specific or expression-specific so that a chosen interpretation is more likely correct for the type of client or for its context. Stored weight sets can be chosen according to identifying information carried as metadata with expressions, or weight sets can be carried directly as metadata. Domains can additionally or alternatively be ranked in ordered lists or comparative domain pairs to favor some domains over others as appropriate for client type or client context.
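As a rough sketch of the weighting scheme this abstract describes, the Python below scores candidate interpretations across domains and applies a client-specific weight set; the `Interpretation` class, `WEIGHT_SETS` table, and the multiplicative scoring are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of client-specific domain weighting for ambiguous
# expressions. Names and the scoring scheme are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Interpretation:
    domain: str      # e.g. "music", "navigation"
    text: str        # the expression as understood in this domain
    score: float     # raw parser confidence, 0.0 .. 1.0

# Hypothetical stored weight sets, keyed by client type. A car client
# favors navigation; a smart speaker favors music.
WEIGHT_SETS = {
    "car":     {"music": 0.8, "navigation": 1.2},
    "speaker": {"music": 1.3, "navigation": 0.7},
}

def choose_interpretation(candidates, client_type):
    """Pick the interpretation whose weighted score is highest."""
    weights = WEIGHT_SETS.get(client_type, {})
    return max(candidates,
               key=lambda c: c.score * weights.get(c.domain, 1.0))

# "Play Thunder Road": plausible in both music and navigation domains.
candidates = [
    Interpretation("music", "play the song 'Thunder Road'", 0.70),
    Interpretation("navigation", "navigate to Thunder Road", 0.65),
]
print(choose_interpretation(candidates, "car").domain)      # navigation
print(choose_interpretation(candidates, "speaker").domain)  # music
```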
Abstract:
A dual mode speech recognition system sends speech to two or more speech recognizers. If a first recognition result is received whose recognition score exceeds a high threshold, the first result is selected without waiting for another result. If the score is below a low threshold, the first result is ignored. At intermediate recognition scores, a timeout duration is dynamically determined as a function of the recognition score; the timeout duration determines how long the system will wait for another result. Many functions of the recognition score are possible, but timeout durations generally decrease as scores increase. If a second recognition result is received before the timeout occurs, a comparison based on recognition scores determines whether the first result or the second result is the basis for creating a response.
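The threshold-and-timeout logic lends itself to a short sketch. The threshold values, the linear timeout function, and the `wait_for_second` callback below are all assumptions chosen for illustration; as the abstract notes, many functions of the recognition score are possible.

```python
# Illustrative sketch of the threshold-and-timeout logic.
HIGH = 0.90     # accept the first result immediately above this score
LOW  = 0.40     # discard the first result below this score
MAX_WAIT = 2.0  # seconds

def timeout_for(score):
    """Timeout shrinks linearly as the score approaches HIGH."""
    return MAX_WAIT * (HIGH - score) / (HIGH - LOW)

def handle_first_result(result, wait_for_second):
    """result: (transcription, score). wait_for_second(t) blocks up to
    t seconds and returns a second (transcription, score) or None."""
    text, score = result
    if score >= HIGH:
        return text                            # confident: don't wait
    if score < LOW:
        second = wait_for_second(MAX_WAIT)
        return second[0] if second else None   # first result is ignored
    second = wait_for_second(timeout_for(score))
    if second and second[1] > score:
        return second[0]                       # second recognizer wins
    return text
```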
Abstract:
An accurate thought map is created by recording many people's utterances of natural language expressions together with the location at which each expression was made. The expressions are input into a Natural Language Understanding system including a semantic parser, and the resulting interpretations are stored in a database with the geolocation of the speaker. Emotions, concepts, time, user identification, and other interesting information may also be detected and stored. Interpretations of related expressions may be linked in the database. The database may be indexed and filtered according to multiple aspects of interpretations, such as geolocation ranges, time ranges, or other criteria, and analyzed according to multiple algorithms. The analyzed results may be used to render map displays, determine effective locations for advertisements, preemptively fetch information for users of mobile devices, and predict the behavior of individuals and groups of people.
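A minimal sketch of the database side of this pipeline, assuming a simple SQLite schema: each interpretation is stored with speaker geolocation, emotion, and timestamp, then filtered by geolocation and time ranges. The schema and field names are hypothetical.

```python
# Hypothetical thought-map store: interpreted expressions indexed by
# geolocation and time, queried by range for later analysis.
import sqlite3, time

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE interpretations (
    user_id TEXT, domain TEXT, meaning TEXT, emotion TEXT,
    lat REAL, lon REAL, ts REAL)""")
db.execute("CREATE INDEX idx_geo_time ON interpretations (lat, lon, ts)")

def record(user_id, domain, meaning, emotion, lat, lon):
    db.execute("INSERT INTO interpretations VALUES (?,?,?,?,?,?,?)",
               (user_id, domain, meaning, emotion, lat, lon, time.time()))

def query_region(lat_min, lat_max, lon_min, lon_max, since):
    """Filter by geolocation range and time range for analysis or
    map rendering."""
    return db.execute(
        "SELECT domain, meaning, emotion FROM interpretations "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? AND ts >= ?",
        (lat_min, lat_max, lon_min, lon_max, since)).fetchall()

record("u1", "food", "wants tacos", "happy", 37.77, -122.42)
print(query_region(37.7, 37.8, -122.5, -122.4, 0))
```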
Abstract:
Software-based systems perform parametric speech synthesis. TTS voice parameters determine the generated speech audio. Voice parameters include gender, age, dialect, donor, arousal, authoritativeness, pitch, range, speech rate, volume, flutter, roughness, breath, frequencies, bandwidths, and relative amplitudes of formants and nasal sounds. The system chooses TTS parameters based on one or more of: user profile attributes including gender, age, and dialect; situational attributes such as location, noise level, and mood; natural language semantic attributes such as domain of conversation, expression type, dimensions of affect, word emphasis and sentence structure; and analysis of target speaker voices. The system chooses TTS parameters to improve listener satisfaction or other desired listener behavior. Choices may be made by specified algorithms defined by code developers, or by machine learning algorithms trained on labeled samples of system performance.
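One way to picture the "specified algorithms defined by code developers" is a rule table mapping user, situational, and semantic attributes to TTS parameter values. The parameter names and rules below are hypothetical stand-ins, not the patent's actual choices.

```python
# Hedged sketch of rule-based TTS parameter selection from user,
# situational, and semantic attributes. All rules are illustrative.
def choose_tts_params(profile, situation, semantics):
    params = {"pitch": 1.0, "rate": 1.0, "volume": 1.0,
              "breath": 0.1, "roughness": 0.0}
    if situation.get("noise_level", 0.0) > 0.5:
        params["volume"] = 1.4          # speak louder in noise
        params["rate"] = 0.9            # and slightly slower
    if semantics.get("domain") == "emergency":
        params["pitch"] = 1.1           # more urgent, authoritative
        params["rate"] = 1.1
    if profile.get("age", 30) > 65:
        params["rate"] = min(params["rate"], 0.85)  # slower delivery
    return params

print(choose_tts_params({"age": 70}, {"noise_level": 0.7},
                        {"domain": "weather"}))
```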
Abstract:
A system and method are presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine succeeds in transcribing the query, a client vocabulary is updated whenever the remote result includes information not already present in the client vocabulary.
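The selection and vocabulary-update flow can be sketched in a few lines. The `(transcription, confidence)` result tuples and the set-based client vocabulary are assumed representations, not the patent's data structures.

```python
# Sketch of the selection-and-vocabulary-update flow described above.
def dual_mode_recognize(local_result, remote_result, client_vocab):
    """Each result is (transcription, confidence), or None if that
    recognizer failed or missed the latency cutoff."""
    # Update the client vocabulary whenever the remote engine succeeds
    # and its transcription contains words the client doesn't know.
    if remote_result:
        for word in remote_result[0].split():
            if word not in client_vocab:
                client_vocab.add(word)
    if local_result and remote_result:
        return max(local_result, remote_result, key=lambda r: r[1])[0]
    for result in (local_result, remote_result):
        if result:
            return result[0]
    return None

vocab = {"play", "music"}
print(dual_mode_recognize(("play music", 0.8),
                          ("play muse", 0.9), vocab))  # "play muse"
print(vocab)  # now also contains "muse"
```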
Abstract:
The present invention extends to methods, systems, and computer program products for a natural language module store. In general, the invention can be used to manage natural language modules offered through a natural language module store. Natural language module (NLM) developers can post NLMs at an NLM store to make the NLMs available for use by others. Developers can select NLMs for inclusion in natural language interpreters (NLIs) containing (and possibly integrating the functionality of) one or more NLMs. Prior to selecting an NLM, a developer can search or browse NLMs to identify an appropriate NLM. Optionally, a developer can test an NLM in the NLM store prior to inclusion in an NLI. For example, multiple NLMs purporting to provide the same specified natural language functionality can be tested relative to one another prior to selection of one of the NLMs for inclusion in an NLI.
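To make the store's selection-and-testing workflow concrete, here is a hedged sketch in which NLMs share a common interface, an NLI chains selected modules, and rival modules are scored against a shared test set. All class and function names are invented for illustration; the patent does not specify this API.

```python
# Hypothetical NLM/NLI interfaces and comparative testing of rival NLMs.
class NLM:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler   # expression -> interpretation or None

    def interpret(self, expression):
        return self.handler(expression)

class NLI:
    """An interpreter built from one or more selected NLMs."""
    def __init__(self, modules):
        self.modules = modules

    def interpret(self, expression):
        for m in self.modules:
            result = m.interpret(expression)
            if result is not None:
                return result
        return None

def test_relative(nlms, test_cases):
    """Score rival NLMs purporting to cover the same functionality."""
    return {m.name: sum(m.interpret(q) == a for q, a in test_cases)
            for m in nlms}

weather = NLM("weather", lambda e: "weather_query" if "weather" in e else None)
music   = NLM("music",   lambda e: "music_query" if "play" in e else None)
nli = NLI([weather, music])
print(nli.interpret("play some jazz"))   # music_query
print(test_relative([weather, music],
                    [("play some jazz", "music_query")]))
```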
Abstract:
A system and method are provided for adding user characterization information to a user profile by analyzing a user's speech. User properties such as age, gender, accent, and English proficiency may be inferred by extracting and deriving features from user speech, without the user having to configure such information manually. A feature extraction module that receives audio signals as input extracts acoustic, phonetic, textual, linguistic, and semantic features. The module may be a system component independent of any particular vertical application, or it may be embedded in an application that accepts voice input and performs natural language understanding. A profile generation module receives the features extracted by the feature extraction module, uses classifiers to determine user property values based on the extracted and derived features, and stores these values in a user profile. The resulting profile variables may be globally available to other applications.
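The two-module pipeline might look like the sketch below: a feature extraction step feeds a profile generation step that applies classifiers. The feature names, placeholder values, and naive threshold classifiers are illustrative assumptions only.

```python
# Sketch of the feature-extraction / profile-generation pipeline.
def extract_features(audio):
    # A real module would compute acoustic, phonetic, textual,
    # linguistic, and semantic features; these values are placeholders.
    return {"mean_pitch_hz": 210.0, "speech_rate_wps": 2.4,
            "vocab_complexity": 0.6}

def generate_profile(features, profile):
    """Apply naive threshold 'classifiers' to infer property values
    and store them in the user profile."""
    profile["gender"] = "female" if features["mean_pitch_hz"] > 180 else "male"
    profile["proficiency"] = ("high" if features["vocab_complexity"] > 0.5
                              else "basic")
    return profile

profile = {}
generate_profile(extract_features(audio=None), profile)
print(profile)  # profile variables now available to other applications
```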
Abstract:
A method for matching a query against a broadcast stream includes receiving one or more broadcast streams and generating and storing an audio fingerprint of a selected portion of each received broadcast stream. A query is then received, from which the method generates an audio fingerprint. From that point, the method continues by identifying audio content from the query, using the query audio fingerprint and a database of indexed audio content. The method concludes by identifying the source of the query using the query audio fingerprint and the stored audio fingerprints. Embodiments of the method further include predictively caching audio fingerprint sequences and corresponding audio item identifiers from a server after storing audio fingerprints extracted from the broadcast stream, and using the predictively cached audio fingerprint sequences to identify an audio item within the audio signal based on at least some additional audio fingerprints of the audio signal.
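The method performs two lookups with the same query fingerprint: one against indexed audio content (what is playing) and one against stored broadcast fingerprints (which stream it came from). The toy sketch below uses an exact hash as a stand-in fingerprint; real systems use noise-robust spectral fingerprints with approximate matching.

```python
# Toy sketch of the two fingerprint lookups; the hash-based
# 'fingerprint' is a stand-in for a robust acoustic fingerprint.
import hashlib

def fingerprint(audio_bytes):
    return hashlib.sha1(audio_bytes).hexdigest()[:16]

content_index = {}    # fingerprint -> audio item id (songs, ads, ...)
broadcast_index = {}  # fingerprint -> broadcast stream id

def ingest_broadcast(stream_id, portion):
    broadcast_index[fingerprint(portion)] = stream_id

def identify(query_audio):
    fp = fingerprint(query_audio)
    item = content_index.get(fp)        # what audio item is playing
    source = broadcast_index.get(fp)    # which stream it came from
    return item, source

content_index[fingerprint(b"chorus")] = "song-123"
ingest_broadcast("KQED-FM", b"chorus")
print(identify(b"chorus"))  # ('song-123', 'KQED-FM')
```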
Abstract:
A method for processing a voice message in a computerized system. The method receives and records a speech utterance including a message portion and a communication portion. The method proceeds to parse the input to identify and separate the message portion and the communication portion. It then identifies communication parameters, including one or more destination mailboxes, from the communication portion, and it transmits the message portion to each destination mailbox as a voice message.
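As an illustration of separating the communication portion from the message portion, the sketch below uses a keyword-based parse of a transcribed utterance; the trigger phrase, regular expression, and mailbox structure are assumptions, since the abstract does not specify the parser.

```python
# Hypothetical parse of a transcribed utterance into communication
# parameters (recipients) and a message portion, then delivery.
import re

def parse_voice_message(transcript):
    """e.g. "send a message to alice and bob saying I'll be late"
    -> (['alice', 'bob'], "I'll be late")"""
    m = re.match(r"send a message to (?P<who>.+?) saying (?P<msg>.+)",
                 transcript, re.IGNORECASE)
    if not m:
        return None
    recipients = re.split(r"\s*(?:,|\band\b)\s*", m.group("who"))
    return recipients, m.group("msg")

def deliver(transcript, mailboxes):
    parsed = parse_voice_message(transcript)
    if parsed:
        recipients, message = parsed
        for r in recipients:
            mailboxes.setdefault(r, []).append(message)  # voice message

boxes = {}
deliver("send a message to alice and bob saying I'll be late", boxes)
print(boxes)  # {'alice': ["I'll be late"], 'bob': ["I'll be late"]}
```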