Abstract:
Methods and apparatuses are provided for user interest modeling. A method may include receiving an input from a user for specifying one or more topics from among a predetermined hierarchy of topics and subtopics. The method may additionally include retrieving one or more documents associated with the user and extracting language tokens from the documents based, at least in part, on the specified topics. Corresponding apparatuses are also provided.
Abstract:
Various embodiments include a method including receiving media data on an apparatus, and receiving one or more context attributes related to the apparatus or accessed by the apparatus. The method further includes determining whether the one or more context attributes relate to the media data, and causing, at least in part, display of the media data with the one or more context attributes that are determined to relate to the media data. Also, a method is provided that includes receiving media data on an apparatus, parsing the media data into one or more structured elements, determining one or more informational links that relate to the one or more structured elements of the media data, and causing, at least in part, display of the media data with the one or more informational links that are determined to relate to the one or more structured elements.
Abstract:
A method of multi-lingual speech recognition can include determining whether characters in a word are in a source list of a language-specific alphabet mapping table for a language, converting each character not in the source list according to a general alphabet mapping table, converting each converted character according to the language-specific alphabet mapping table, verifying that each character in the word is in a character set of the language, removing characters not in the character set of the language, and identifying a pronunciation of the word.
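The character-handling steps above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the mapping tables, the character set, and the function name `normalize_word` are all hypothetical, and the final pronunciation step is left out because the abstract does not say how it is performed. The sketch also assumes that characters already in the language-specific source list are converted by that table directly.

```python
from typing import Dict, Set

# Hypothetical tables for illustration only; none of these come from the abstract.
GENERAL_MAP: Dict[str, str] = {
    "\u00e9": "e",  # e with acute accent
    "\u00f1": "n",  # n with tilde
    "\u00f8": "o",  # o with stroke
}
LANGUAGE_MAP: Dict[str, str] = {
    "\u00fc": "ue",  # u with diaeresis
    "\u00df": "ss",  # sharp s
}
LANGUAGE_CHARSET: Set[str] = set("abcdefghijklmnopqrstuvwxyz")


def normalize_word(
    word: str,
    general_map: Dict[str, str],
    language_map: Dict[str, str],
    charset: Set[str],
) -> str:
    """Map a word into the character set of one language."""
    out = []
    for ch in word.lower():
        # Characters outside the language-specific source list go
        # through the general table first.
        if ch not in language_map:
            ch = general_map.get(ch, ch)
        # Then the language-specific table is applied.
        ch = language_map.get(ch, ch)
        # Keep only characters that belong to the language's character set.
        out.extend(c for c in ch if c in charset)
    return "".join(out)


print(normalize_word("M\u00fc\u00df\u00e9!", GENERAL_MAP, LANGUAGE_MAP, LANGUAGE_CHARSET))
```

The normalized string would then be handed to whatever pronunciation model the recognizer uses for that language.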
Abstract:
An apparatus for providing data clustering and mode selection includes a training element and a transformation element. The training element is configured to receive a first training data set, a second training data set and auxiliary data extracted from the same material as the first training data set. The training element is also configured to train a classifier to group the first training data set into M clusters based on the auxiliary data and the first training data set and train M processing schemes corresponding to the M clusters for transforming the first training data set into the second training data set. The transformation element is in communication with the training element and is configured to cluster the second training data set into M clusters based on features associated with the second training data set.
Abstract:
An apparatus may include a processor configured to receive vocabulary entry data. The processor may be further configured to determine a class for the received vocabulary entry data. The processor may be additionally configured to identify one or more languages for the vocabulary entry data based upon the determined class. The processor may also be configured to generate a phoneme sequence for the vocabulary entry data for each identified language. Corresponding methods and computer program products are also provided.
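The class, language, and phoneme steps above might be wired together as in the sketch below. Everything concrete here is an assumption made for illustration: the two classes, the rule in `classify_entry`, the class-to-language table, and the toy letter-to-phoneme tables are hypothetical and stand in for whatever classifier and pronunciation models a real system would use.

```python
from typing import Dict, List

# Hypothetical class-to-language table; not taken from the abstract.
LANGUAGES_BY_CLASS: Dict[str, List[str]] = {
    "number": ["en"],
    "name": ["en", "fi"],
}

# Toy per-language letter-to-phoneme tables (assumed, greatly simplified).
TOY_G2P: Dict[str, Dict[str, str]] = {
    "en": {"a": "ae", "i": "ay"},
    "fi": {"a": "a", "i": "i"},
}


def classify_entry(entry: str) -> str:
    """Toy classifier: digit strings are 'number', everything else is 'name'."""
    return "number" if entry.isdigit() else "name"


def phonemes(entry: str, lang: str) -> List[str]:
    """Map each alphanumeric character to a phoneme symbol for one language."""
    table = TOY_G2P[lang]
    return [table.get(ch, ch) for ch in entry.lower() if ch.isalnum()]


def generate_pronunciations(entry: str) -> Dict[str, List[str]]:
    """Determine the class, pick languages for it, and build one sequence per language."""
    entry_class = classify_entry(entry)
    return {lang: phonemes(entry, lang) for lang in LANGUAGES_BY_CLASS[entry_class]}


print(generate_pronunciations("Mia"))
```

Keeping the class-to-language table separate from the phoneme generators means a new class or language can be added without touching the other step.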
Abstract:
A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
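The weighting of mixture-specific warping functions can be illustrated with a small one-dimensional sketch. The abstract does not give the form of the warping functions or of the weights, so two assumptions are made here and labeled in the code: each mixture warps frequency by a simple linear scaling that maps its source mean onto its target mean, and the weights are the posterior probabilities of the mixtures given a scalar source feature. All function names are hypothetical.

```python
import math
from typing import List


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x: float, weights: List[float], means: List[float],
               variances: List[float]) -> List[float]:
    """Posterior probability of each mixture given the source feature x."""
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(likes)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        return [1.0 / len(likes)] * len(likes)
    return [like / total for like in likes]


def mixture_warp(freq: float, src_mean: float, tgt_mean: float) -> float:
    """Assumed mixture-specific warp: linear scaling from source mean to target mean."""
    return freq * tgt_mean / src_mean


def warp(freq: float, x: float, weights: List[float], src_means: List[float],
         tgt_means: List[float], variances: List[float]) -> float:
    """Combine the mixture-specific warps, weighted by the mixture posteriors."""
    post = posteriors(x, weights, src_means, variances)
    return sum(p * mixture_warp(freq, s, t)
               for p, s, t in zip(post, src_means, tgt_means))


# Two mixtures with hypothetical mean pairs (400 -> 440) and (600 -> 540).
print(warp(1000.0, 500.0, [0.5, 0.5], [400.0, 600.0], [440.0, 540.0],
           [10000.0, 10000.0]))
```

Because the weights depend on the source feature, the combined warp varies smoothly between the mixture-specific warps rather than switching abruptly from one to another.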
Abstract:
An apparatus for providing efficient evaluation of feature transformation includes a training module and a transformation module. The training module is configured to train a Gaussian mixture model (GMM) using training source data and training target data. The transformation module is in communication with the training module. The transformation module is configured to produce a conversion function in response to the training of the GMM. The training module is further configured to determine a quality of the conversion function prior to use of the conversion function by calculating a trace measurement of the GMM.
Abstract:
An apparatus for providing voice conversion using temporal dynamic features includes a feature extractor and a transformation element. The feature extractor may be configured to extract dynamic feature vectors from source speech. The transformation element may be in communication with the feature extractor and configured to apply a first conversion function to a signal including the extracted dynamic feature vectors to produce converted dynamic feature vectors. The first conversion function may have been trained using at least dynamic feature data associated with training source speech and training target speech. The transformation element may be further configured to produce converted speech based on an output of applying the first conversion function.
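The abstract does not say how the dynamic feature vectors are computed. One common choice in speech processing is the regression-based delta feature, and the sketch below uses it purely as an assumed stand-in for the feature extractor. The function name `delta_features` and the `window` parameter are hypothetical; edge frames are handled by repeating the first and last frame.

```python
from typing import List


def delta_features(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta (first-order dynamic) features.

    For frame t and each dimension:
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    where indices past either end are clamped to the nearest valid frame.
    """
    num = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num):
        row = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


# A linear ramp has a constant slope of 1.0 away from the edges.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta_features(static, window=1))
```

In a pipeline like the one described, such dynamic vectors would be extracted from both training source speech and training target speech before the conversion function is trained.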
Abstract:
It may be desirable to provide a way to collect high quality speech training data without undue burden to the user. Speech training data may be collected during normal usage of a device. In this way, the collection of speech training data may be effectively transparent to the user, without the need for a distinct collection mode from the user's point of view. For example, where the device is or includes a phone (such as a cellular phone), when the user makes or receives a phone call to/from another party, speech training data may be automatically collected from one or both of the parties during the phone call.
Abstract:
A method of providing content-dependent media content mixing includes automatically determining an emotional property of a first media content input, determining a specification for a second media content in response to the determined emotional property, and producing the second media content in accordance with the specification.