摘要:
A biometrics security method and apparatus are disclosed that restrict the ability of a user to access a device or facility using a portion of biometric data to validate the user's identity. Upon a user request to access a secure device or facility, the central biometric security system initially sends a first request for a specific sample of a portion of the user's biometric information. The specific sample may be identified, for example, using a set of image coordinates. A second request is also sent to retrieve the biometric prototype from a database of registered users. The central biometric security system then compares the user biometrics portion with the corresponding biometrics prototype portions. The user receives access to the requested device if the user biometrics portion(s) matches the corresponding biometrics prototype portions. In one variation, the biometric security system transmits a security agent to the user's computing device upon a user request to access a remote device. The security agent serves to extract user biometric portions in accordance with the sampling request from the central biometric security system. In another variation, a local recognition is performed before a remote recognition to reduce the risk of a failed server side recognition due to a poor biometrics feature.
摘要:
A speech coding apparatus and method measures the values of at least first and second different features of an utterance during each of a series of successive time intervals. For each time interval, a feature vector signal has a first component value equal to a first weighted combination of the values of only one feature of the utterance for at least two time intervals. The feature vector signal has a second component value equal to a second weighted combination, different from the first weighted combination, of the values of only one feature of the utterance for at least two time intervals. The resulting feature vector signals for a series of successive time intervals form a coded representation of the utterance. In one embodiment, a first weighted mixture signal has a value equal to a first weighted mixture of the values of the features of the utterance during a single time interval. A second weighted mixture signal has a value equal to a second weighted mixture, different from the first weighted mixture, of the values of the features of the utterance during a single time interval. The first component value of each feature vector signal is equal to a first weighted combination of the values of only the first weighted mixture signals for at least two time intervals, and the second component value of each feature vector signal is equal to a second weighted combination, different from the first weighted combination, of the values of only the second weighted mixture for at least two time intervals.
摘要:
Disclosed is a method and apparatus for building a domain-specific language model for use in language processing applications, e.g., speech recognition. A reference language model is generated based on a relatively small seed corpus containing linguistic units relevant to the domain. An external corpus containing a large number of linguistic units is accessed. Using the reference language model, linguistic units which have a sufficient degree of relevance to the domain are extracted from the external corpus. The reference language model is then updated based on the seed corpus and the extracted linguistic units. The process may be repeated iteratively until the language model is of satisfactory quality. The language building technique may be further enhanced by combining it with mixture modeling or class-based modeling.
摘要:
A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
摘要:
A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals. The synthesized training vector signals are transformed reference feature vector signals representing the values of features of one or more utterances of one or more speakers in a reference set of speakers. The measured training feature vector signals represent the values of features of one or more utterances of a new speaker/user not in the reference set.
摘要:
A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
摘要:
A method and apparatus for estimating the probability of phones, a-posteriori, in the context of not only the acoustic feature at that time, but also the acoustic features in the vicinity of the current time, and its use in cutting down the search-space in a speech recognition system. The method constructs and uses a decision tree, with the predictors of the decision tree being the vector-quantized acoustic feature vectors at the current time, and in the vicinity of the current time. The process starts with an enumeration of all (predictor, class) events in the training data at the root node, and successively partitions the data at a node according to the most informative split at that node. An iterative algorithm is used to design the binary partitioning. After the construction of the tree is completed, the probability distribution of the predicted class is stored at all of its terminal leaves. The decision tree is used during the decoding process by tracing a path down to one of its leaves, based on the answers to binary questions about the vector-quantized acoustic feature vector at the current time and its vicinity.
摘要:
Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set C.sub.n, and the other set has contextual features with values in a set C.sub.n, were the sets in C.sub.n and C.sub.n are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subsets are calculated. For each pair of complementary sets of observed events, a "goodness of fit" is the sum of the symbol feature value similarity of the subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.
摘要:
In a connection arrangement including two or more electronic devices, wherein information can be exchanged among the electronic devices through a plurality of communication links between the electronic devices, at least one of the electronic devices being configurable for communicating with a data source, a method for presenting a multi-channel message originating from the data source, the multi-channel message including a two or more components, includes the steps of allocating each of at least a portion of the components in the multi-channel message to at least one electronic device and, for each allocated component, determining possible communication paths between the data source and the at least one electronic device allocated to the corresponding component. The method further includes the steps of selecting, based at least in part on one or more selection criteria, at least one of the possible communication paths for the allocated components, each of the selected communication paths representing an optimal route between the data source and the at least one electronic device allocated to the corresponding component, and routing each of the allocated components in the multi-channel message according to the selected communication paths for presentation of the allocated components by the corresponding electronic device(s).
摘要:
A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis. A revised hypothesis score for each speech hypothesis in the initial subset comprises an estimate of the closeness of a match between the revised model of the speech hypothesis and the sequence of coded representations of the utterance. The speech hypotheses from the initial subset which have the best revised match scores are stored as a reduced subset. At least one word of one or more of the speech hypotheses in the reduced subset is output as a speech recognition result.