Abstract:
In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent and representative acoustics are chosen for each cluster. A human then needs to listen to only a small number of representative acoustics for each cluster (and possibly only one per cluster) in order to identify the unforeseen semantic intents.
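The cluster-then-pick-representatives idea can be illustrated with a minimal sketch. It assumes each utterance has already been mapped to a numeric vector that reflects its semantic intent; the abstract does not say how that is done, nor which clustering algorithm is used. The greedy threshold clustering, the medoid rule, and the names `cluster_by_threshold` and `representative` are hypothetical choices made only for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_by_threshold(vectors, threshold):
    """Greedy clustering (an assumed stand-in for the unspecified method).

    Each vector joins the first cluster whose seed (first member) lies
    within `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = []
    for i, v in enumerate(vectors):
        for members in clusters:
            if dist(v, vectors[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representative(vectors, members):
    """Return the medoid: the member with the smallest total distance
    to the other members of its cluster."""
    return min(
        members,
        key=lambda i: sum(dist(vectors[i], vectors[j]) for j in members),
    )
```

A reviewer would then listen only to the utterance at index `representative(vectors, members)` for each cluster, rather than to every utterance.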
Abstract:
A method and apparatus are provided for training and using a hidden conditional random field model for speech recognition and phonetic classification. The hidden conditional random field model uses features, at least one of which is based on a hidden state in a phonetic unit. Values for the features are determined from a segment of speech, and these values are used to identify a phonetic unit for the segment of speech.
Abstract:
A method of modeling a speech recognition system includes decoding a speech signal produced from a training text to produce a sequence of predicted speech units. The training text comprises a sequence of actual speech units that is used with the sequence of predicted speech units to form a confusion model. In further embodiments, the confusion model is used to decode a text to identify an error rate that would be expected if the speech recognition system decoded speech based on the text.
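One way to picture the confusion model is as a table of conditional frequencies built by aligning actual and predicted speech units. The sketch below is an assumption-laden illustration, not the patented method: it uses a standard edit-distance alignment, represents insertions and deletions with a hypothetical `<eps>` symbol, and estimates an expected error rate from the model with a deliberately simple per-unit rule that ignores insertions. All function names are hypothetical.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical marker for "no unit" (insertion or deletion)


def align(actual, predicted):
    """Align two unit sequences by minimum edit distance.

    Returns a list of (actual_unit, predicted_unit) pairs.
    """
    n, m = len(actual), len(predicted)
    # cost[i][j] = edit distance between actual[:i] and predicted[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (actual[i - 1] != predicted[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the end to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (actual[i - 1] != predicted[j - 1])):
            pairs.append((actual[i - 1], predicted[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((actual[i - 1], EPS))  # unit was deleted
            i -= 1
        else:
            pairs.append((EPS, predicted[j - 1]))  # unit was inserted
            j -= 1
    pairs.reverse()
    return pairs


def confusion_model(pairs):
    """Estimate P(predicted | actual) from aligned pairs."""
    counts = Counter(pairs)
    totals = Counter(a for a, _ in pairs)
    return {(a, p): c / totals[a] for (a, p), c in counts.items()}


def expected_error_rate(model, units):
    """Simplified estimate: average probability that each unit of a new
    text is not recognized as itself. Unseen units count as errors."""
    return sum(1.0 - model.get((u, u), 0.0) for u in units) / len(units)
```

Pairs from many training utterances can be concatenated before calling `confusion_model`, so that the estimated probabilities reflect the whole training text.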
Abstract:
A method and apparatus are provided for training parameters in a hidden conditional random field model for use in speech recognition and phonetic classification. The hidden conditional random field model uses parameterized features whose values are determined from a segment of speech, and those values are used to identify a phonetic unit for the segment of speech. The parameters are updated after each individual training sample is processed.
Abstract:
A computer-implemented method of indexing a speech lattice for search of audio corresponding to the speech lattice is provided. The method includes identifying at least two speech recognition hypotheses for a word which have time ranges satisfying a criterion. The method further includes merging the at least two speech recognition hypotheses to generate a merged speech recognition hypothesis for the word.
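The merging step can be sketched as follows. The abstract does not state the time-range criterion or how the merged hypothesis is scored, so this example assumes that hypotheses for the same word are merged when their time ranges overlap, that the merged range is the union of the two, and that posteriors are summed and capped at 1.0. The `Hypothesis` class and `merge_hypotheses` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    word: str
    start: float      # seconds
    end: float        # seconds
    posterior: float  # assumed confidence score in [0, 1]


def merge_hypotheses(hyps):
    """Merge same-word hypotheses whose time ranges overlap (assumed criterion)."""
    merged = []
    for h in sorted(hyps, key=lambda h: (h.word, h.start)):
        last = merged[-1] if merged else None
        if last is not None and last.word == h.word and h.start <= last.end:
            # Overlap: extend the range and pool the scores.
            last.end = max(last.end, h.end)
            last.posterior = min(1.0, last.posterior + h.posterior)
        else:
            # Copy so the caller's objects are not modified.
            merged.append(Hypothesis(h.word, h.start, h.end, h.posterior))
    return merged
```

Merging in this way keeps the index smaller, because near-duplicate lattice entries for one spoken word collapse into a single searchable entry.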
Abstract:
A speech signal is decoded by determining a production-related value for a current state based on an optimal production-related value at the end of a preceding state, the optimal production-related value being selected from a set of continuous values. The production-related value is used to determine a likelihood of a phone being represented by a set of observation vectors that are aligned with a path between the preceding state and the current state. The likelihood of the phone is combined with a score from the preceding state to determine a score for the current state, the score from the preceding state being associated with a discrete class of production-related values wherein the class matches the class of the optimal production-related value.
Abstract:
In one embodiment, a physical world tracking mechanism may monitor the efficacy of an advertisement with an offline conversion component. A data storage device 306 may store a commercial location 110 described in the advertisement and associate a conversion action with the advertisement. A processor 304 may register the conversion action at the commercial location 110 executed by a handheld computing device 104 of a user.
Abstract:
A method and apparatus are provided for optimizing finite state machines with labeled nodes. Under the method, labels from the nodes are shifted onto the labels of the links connected to the nodes. The finite state machine is then optimized. After optimization, the labels on the links are examined to verify that the prefixes of the labels on each outgoing link match the suffixes of the labels on each incoming link to a particular node. After this verification, a portion of a label on a link is removed from the link and inserted onto the node.
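The label shifting and the later verification can be illustrated with a small sketch. It assumes one particular encoding that the abstract does not spell out: each link label becomes a tuple whose first element is the source node's label (the prefix) and whose last element is the destination node's label (the suffix). The optimization step itself is omitted, and the function names are hypothetical.

```python
def push_labels_to_links(node_labels, links):
    """Shift node labels onto link labels.

    `links` is a list of (src, dst, label). Each returned link carries a
    tuple label: (src node label, original link label, dst node label).
    """
    return [(s, d, (node_labels[s], lab, node_labels[d])) for s, d, lab in links]


def pull_labels_to_nodes(links):
    """Verify consistency, then move the node portions back onto the nodes.

    For every node, the prefix of each outgoing link and the suffix of
    each incoming link must agree; otherwise a ValueError is raised.
    Returns (node_labels, links_with_original_labels).
    """
    node_labels = {}
    for s, d, (prefix, lab, suffix) in links:
        for node, part in ((s, prefix), (d, suffix)):
            # setdefault stores the first label seen for a node and
            # returns the stored one on later visits.
            if node_labels.setdefault(node, part) != part:
                raise ValueError(f"inconsistent labels at node {node}")
    return node_labels, [(s, d, lab) for s, d, (_, lab, _) in links]
```

With this encoding, a general-purpose optimizer that only understands link labels can run between the two calls, and the check in `pull_labels_to_nodes` detects any result in which a node would need two different labels.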