Abstract:
A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture specific warping function is generated each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
Abstract:
A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture specific warping function is generated each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
Abstract:
An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.
Abstract:
An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.
Abstract:
A voice conversion processing framework is operatively associated with the central processing unit (CPU) and audio processor of a communication device to convert default voice presentations generated by, for example text readers, ring tone applications and the like, to target voice presentations based on selected target voice files stored in memory.
Abstract:
Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
Abstract:
Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
Abstract:
A system and method that includes detecting an activation of an input application control key of a peripheral device, the peripheral device having a single application control input key, to open a initial application, the initial application becoming an active application, providing an identifier of the open application, and detecting a speech prompt to activate at least one function related to the active application.
Abstract:
In accordance with an example embodiment of the present invention, there is provided a method comprising receiving a first text input at a first point in time, providing a first completion candidate for the first text input, receiving a second text input at a second point in time, determining a time difference between the second point in time and the first point in time and providing a second completion candidate for the second text input based on at least the first completion candidate and the time difference.
Abstract:
A user interface provides for contextual navigation in locating desired content based on similarity/dissimilarity criteria. Each item that is identified in a device is provided with at least one multi-dimensional descriptor. A content of each item can be stored remotely from the device. A search criteria is selected that relates to the descriptor and a selected active item. A search is conducted to identify all other items identified in the device that have a relationship with the search criteria. The results are presented to the user and can be ranked according to a selected relationship order.