Abstract:
A system and method is presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
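To fix ideas, a minimal Python sketch of the arbitration logic described in this abstract is given below. The recognizer objects, their assumed `recognize()` return value of `(transcription, confidence)`, the word-level vocabulary update, and the thread-pool timeout handling are illustrative assumptions rather than the claimed implementation.

```python
import concurrent.futures as cf

def dual_mode_recognize(audio, local_recognizer, remote_engine,
                        client_vocabulary, latency_cutoff=2.0):
    """Return the accepted transcription, or None if neither source succeeds."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    futures = {
        "local": pool.submit(local_recognizer.recognize, audio),
        "remote": pool.submit(remote_engine.recognize, audio),
    }
    # Both recognizers share a single latency cutoff; late results are ignored.
    cf.wait(futures.values(), timeout=latency_cutoff)

    results = {}
    for name, future in futures.items():
        if future.done() and future.exception() is None:
            results[name] = future.result()   # assumed to be (transcription, confidence)
    pool.shutdown(wait=False)

    # When the remote engine succeeds, fold unseen words into the client vocabulary.
    if "remote" in results:
        transcription, _ = results["remote"]
        client_vocabulary.update(w for w in transcription.split()
                                 if w not in client_vocabulary)

    # Accept the higher-confidence result when both succeed; otherwise the lone success.
    if not results:
        return None
    return max(results.values(), key=lambda r: r[1])[0]
```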
Abstract:
A method for processing a voice message in a computerized system. The method receives and records a speech utterance including a message portion and a communication portion. The method proceeds to parse the input to identify and separate the message portion and the communication portion. It then identifies communication parameters, including one or more destination mailboxes, from the communication portion, and it transmits the message portion to each destination mailbox as a voice message.
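As one way to picture the parse-and-route step, here is a small Python sketch. It assumes the utterance has already been transcribed, that a cue phrase such as "send this to" separates the message portion from the communication portion, and that mailbox identifiers are separated by commas or "and"; the cue phrase and helper names are assumptions for illustration, not details taken from the abstract.

```python
import re
from dataclasses import dataclass

@dataclass
class ParsedVoiceMessage:
    message_portion: str          # the content to deliver as a voice message
    destination_mailboxes: list   # mailbox identifiers from the communication portion

def parse_voice_message(transcript: str) -> ParsedVoiceMessage:
    # Split on the (assumed) cue phrase separating message and communication portions.
    match = re.search(r"\bsend (?:this|it) to\b", transcript, re.IGNORECASE)
    if match is None:
        return ParsedVoiceMessage(transcript.strip(), [])
    message = transcript[:match.start()].strip()
    communication = transcript[match.end():]
    # Treat commas and "and" in the communication portion as separating mailboxes.
    mailboxes = [m.strip() for m in re.split(r",|\band\b", communication) if m.strip()]
    return ParsedVoiceMessage(message, mailboxes)

# Example: parse_voice_message("Meeting moved to three, send this to Alice and Bob")
# -> message portion "Meeting moved to three,", mailboxes ["Alice", "Bob"]
```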
Abstract:
A method for processing a natural language input to a computerized system. The method parses the input to identify a query portion and a communication portion of the input. The system then determines an answer to the query portion, including identifying communication parameters from the communication portion. Upon determining the answer, the system prepares an answer to the communication and transmits that answer. If the answer requires information from a remote source, the system creates a subsidiary query to obtain that information and then submits the subsidiary query to the remote source. The response to the subsidiary query is then used to compose the answer to the query. If the system concludes that the query portion does not require information from a remote source, it analyzes and answers the query locally.
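The overall control flow can be sketched in a few lines of Python. The cue phrase used to split the input, the dictionary lookup standing in for local analysis, and the callable stand-ins for the remote source and the transmitter are hypothetical placeholders, not the method itself.

```python
def process_input(nl_input, remote_source, send_message, local_facts):
    # Split the input into a query portion and a communication portion
    # (the cue phrase is an assumption for illustration).
    query_portion, _, communication_portion = nl_input.partition(" and send it to ")
    destination = communication_portion.strip() or None

    if query_portion in local_facts:
        # The query can be analyzed and answered locally.
        answer = local_facts[query_portion]
    else:
        # Otherwise, create a subsidiary query, submit it to the remote source,
        # and compose the answer from the subsidiary response.
        subsidiary_query = {"question": query_portion}
        answer = remote_source(subsidiary_query)

    # Transmit the prepared answer according to the communication parameters.
    send_message(answer, destination)
    return answer

# Example usage with toy stand-ins for the remote source and the transmitter:
facts = {"what time is it": "3 pm"}
remote = lambda q: f"remote answer to {q['question']!r}"
send = lambda answer, dest: print(f"to {dest}: {answer}")
process_input("what's the weather tomorrow and send it to Bob", remote, send, facts)
```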
Abstract:
A method and system for assisting a traveler. The method initiates a travel segment and interfaces with providers of data regarding geolocation, points of interest, or traffic. During execution of the method, the system monitors geolocation data. The system can predict a route of travel, basing that prediction on past geolocation data or directions from a geolocation provider. Also during execution of the method, the system can identify data of interest to a user, based on input initiated by the user. Further, the system can solicit input from the traveler, or it can perform tasks based on predefined criteria, such as locating and identifying points of interest between two geographic locations. During operation, the system can provide information to the traveler relating to data of interest. The system includes a client unit and a server unit, the client unit being mounted in an automotive vehicle or in a communications device.
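One of the tasks mentioned, locating points of interest between two geographic locations, can be illustrated with a short self-contained sketch. Treating "between" as lying within a fixed corridor around the straight segment joining the endpoints, the flat-earth projection, the 2 km threshold, and the made-up coordinates are all assumptions chosen only to keep the example small.

```python
import math

def pois_between(start, end, pois, max_km=2.0):
    """start/end are (lat, lon); pois is a list of (name, lat, lon)."""
    def to_xy(lat, lon):
        # Equirectangular projection around the start point (fine for short segments).
        k = 111.32  # approximate km per degree of latitude
        return ((lon - start[1]) * k * math.cos(math.radians(start[0])),
                (lat - start[0]) * k)

    ax, ay = to_xy(*start)
    bx, by = to_xy(*end)

    def dist_to_segment(px, py):
        dx, dy = bx - ax, by - ay
        if dx == dy == 0:
            return math.hypot(px - ax, py - ay)
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
        return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

    return [name for name, lat, lon in pois
            if dist_to_segment(*to_xy(lat, lon)) <= max_km]

# Example with invented coordinates: only "Cafe A" lies near the route.
print(pois_between((37.77, -122.42), (37.80, -122.27),
                   [("Cafe A", 37.78, -122.38), ("Cafe B", 37.60, -122.00)]))
```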
Abstract:
A method of a local recognition system controlling a host device to perform one or more operations is provided. The method includes receiving, by the local recognition system, a query; implementing, by the local recognition system, a local language context comprising a set of words, each described in terms of components smaller than the words; and performing speech recognition on the received query, using the local language context, to create a transcribed query. Further, the method includes controlling the host device in dependence upon the transcribed query.
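A toy illustration of such a local language context is sketched below, with each word described by phoneme-like units smaller than the word. The vocabulary, the phoneme strings, the greedy matcher, and the dictionary-of-callables host interface are assumptions made for illustration; they are not the recognizer described in the abstract.

```python
# Local language context: each word is described by components smaller than the word.
LOCAL_CONTEXT = {
    "play":  ["P", "L", "EY"],
    "pause": ["P", "AO", "Z"],
    "next":  ["N", "EH", "K", "S", "T"],
}

def transcribe(phonemes, context=LOCAL_CONTEXT):
    """Greedily map a recognized phoneme sequence onto words in the local context."""
    words, i = [], 0
    while i < len(phonemes):
        for word, parts in context.items():
            if phonemes[i:i + len(parts)] == parts:
                words.append(word)
                i += len(parts)
                break
        else:
            i += 1  # skip phonemes that match no word in the local context
    return " ".join(words)

def control_host(phonemes, host):
    # Control the host device in dependence upon the transcribed query.
    command = transcribe(phonemes)
    host.get(command, lambda: None)()

# Example: a host device exposing callable operations keyed by command word.
control_host(["P", "L", "EY"], {"play": lambda: print("playing")})
```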
Abstract:
A system and method for masking an identity of a speaker of natural language speech, such as speech clips to be labeled by humans in a system generating voice transcriptions for training an automatic speech recognition model. The natural language speech is morphed prior to being presented to the human for labeling. In one embodiment, morphing comprises pitch shifting the speech randomly either up or down, then frequency shifting the speech, then pitch shifting the speech in a direction opposite the first pitch shift. Labeling the morphed speech comprises one or more of transcribing the morphed speech, identifying a gender of the speaker, identifying an accent of the speaker, and identifying a noise type of the morphed speech.
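In rough Python terms, the embodiment's morphing pipeline might look like the sketch below, using librosa for pitch shifting and an analytic-signal frequency shift from scipy. The shift amounts and the library choices are assumptions for illustration, not parameters taken from the disclosure.

```python
import numpy as np
import librosa
from scipy.signal import hilbert

def morph_for_labeling(y, sr, semitones=3.0, freq_shift_hz=150.0, rng=None):
    rng = rng or np.random.default_rng()
    direction = rng.choice([-1.0, 1.0])   # pitch shift randomly either up or down

    # 1) Pitch shift in a random direction.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=direction * semitones)

    # 2) Frequency shift via the analytic signal (moves all components by a fixed Hz).
    t = np.arange(len(y)) / sr
    y = np.real(hilbert(y) * np.exp(2j * np.pi * freq_shift_hz * t)).astype(np.float32)

    # 3) Pitch shift in the direction opposite the first shift.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=-direction * semitones)
    return y

# Usage: y, sr = librosa.load("clip.wav", sr=16000); morphed = morph_for_labeling(y, sr)
```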
Abstract:
A virtual assistant processes natural language expressions according to grammar rules created by domain providers. The virtual assistant uniquely identifies each of a multiplicity of users and stores values of grammar slots filled by natural language expressions from each user. The virtual assistant stores histories of slot values and computes statistics from the history. The virtual assistant provider, or a classification client, provides values of attributes of users as labels for a machine learning classification algorithm. The algorithm processes the grammar slot values and labels to compute probability distributions for unknown attribute values of users. A network effect of users and domain grammars makes the virtual assistant useful and provides increasing amounts of data that improve classification accuracy and usefulness.
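As a concrete, if toy, illustration of the classification step, the sketch below counts grammar slot values per user as features and trains a scikit-learn classifier on provider-supplied labels; predict_proba then yields a probability distribution over an unknown user's attribute value. The slot names, the commuting-mode labels, and the choice of logistic regression are invented for the example.

```python
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Histories of (slot, value) pairs filled by each user's natural language expressions.
histories = {
    "user1": [("cuisine", "sushi"), ("city", "tokyo"), ("cuisine", "sushi")],
    "user2": [("cuisine", "bbq"), ("city", "austin")],
    "user3": [("cuisine", "sushi"), ("city", "tokyo")],
}
labels = {"user1": "train", "user2": "car"}   # known values of a commuting-mode attribute

def slot_features(history):
    # Statistics computed from the slot-value history: here, simple value counts.
    return Counter(f"{slot}={value}" for slot, value in history)

vec = DictVectorizer()
known_users = [u for u in histories if u in labels]
X = vec.fit_transform([slot_features(histories[u]) for u in known_users])
clf = LogisticRegression().fit(X, [labels[u] for u in known_users])

# Probability distribution over the unknown attribute value for user3.
X_unknown = vec.transform([slot_features(histories["user3"])])
print(dict(zip(clf.classes_, clf.predict_proba(X_unknown)[0])))
```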
Abstract:
The application provides an apparatus, platform, method and medium for intention importance inference. The apparatus includes an interface configured to receive user-related information; and a processor coupled to the interface and configured to: extract data related to different aspects of a user from the user-related information; generate a plurality of intention probes based on the data related to different aspects of the user, each intention probe comprising an intention and associated data items; infer an importance of each intention probe by calculating a score of each associated data item of the intention probe based on the data related to different aspects of the user; and provide information associated with the intention probe having the highest importance.
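A small sketch of the inference step is given below: each intention probe carries associated data items, each item is scored against the data extracted about the user, and the probe with the highest aggregate importance is surfaced. The overlap-based scoring rule and the example data are illustrative assumptions, not the apparatus's actual scoring.

```python
from dataclasses import dataclass, field

@dataclass
class IntentionProbe:
    intention: str
    data_items: dict = field(default_factory=dict)   # item name -> item value

def importance(probe, user_data):
    # Score each associated data item by whether it is supported by the data
    # extracted about the user, then aggregate into the probe's importance.
    return sum(1.0 for k, v in probe.data_items.items() if user_data.get(k) == v)

def most_important(probes, user_data):
    return max(probes, key=lambda p: importance(p, user_data))

user_data = {"location": "airport", "time": "morning", "calendar": "flight at 9"}
probes = [
    IntentionProbe("order coffee", {"time": "morning"}),
    IntentionProbe("check in for flight", {"location": "airport", "calendar": "flight at 9"}),
]
print(most_important(probes, user_data).intention)   # -> "check in for flight"
```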
Abstract:
A method and system for acoustic model conditioning on non-phoneme information features for optimized automatic speech recognition is provided. The method includes using an encoder model to encode a sound embedding from a known key phrase of speech and conditioning an acoustic model with the sound embedding to optimize its performance in inferring the probabilities of phonemes in the speech. The sound embedding can comprise non-phoneme information related to the key phrase and the following utterance. Further, the encoder model and the acoustic model can be neural networks that are jointly trained with audio data.
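A minimal PyTorch sketch of the idea follows: an encoder turns the key-phrase features into a sound embedding, and the acoustic model is conditioned on that embedding (here by concatenating it to every frame) when predicting per-frame phoneme probabilities, with both networks trained jointly on the same loss. The layer sizes, the conditioning-by-concatenation, and the toy training step are assumptions, not the disclosed architecture.

```python
import torch
import torch.nn as nn

class SoundEmbeddingEncoder(nn.Module):
    def __init__(self, feat_dim=40, embed_dim=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, key_phrase_feats):          # (batch, frames, feat_dim)
        _, h = self.rnn(key_phrase_feats)
        return h[-1]                              # (batch, embed_dim) sound embedding

class ConditionedAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, embed_dim=64, n_phonemes=44):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, n_phonemes),
        )

    def forward(self, utterance_feats, sound_embedding):
        # Broadcast the embedding across the utterance frames and condition on it.
        cond = sound_embedding.unsqueeze(1).expand(-1, utterance_feats.size(1), -1)
        logits = self.net(torch.cat([utterance_feats, cond], dim=-1))
        return logits.log_softmax(dim=-1)         # per-frame phoneme log-probabilities

# Joint training step on random stand-in data: both networks share one loss.
encoder, acoustic = SoundEmbeddingEncoder(), ConditionedAcousticModel()
opt = torch.optim.Adam(list(encoder.parameters()) + list(acoustic.parameters()), lr=1e-3)
key_phrase, utterance = torch.randn(8, 30, 40), torch.randn(8, 200, 40)
targets = torch.randint(0, 44, (8, 200))
log_probs = acoustic(utterance, encoder(key_phrase))
loss = nn.NLLLoss()(log_probs.transpose(1, 2), targets)
opt.zero_grad(); loss.backward(); opt.step()
```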