摘要:
A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
摘要:
A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker. The synthesis from text may include providing a synthesized image personalized to look like the image of the input speaker with expressions added from the visual input (455).
摘要:
A computer implemented method in a data processing system and a computer program product enable visual selection of a media signal. A set of media signals is received from a set of media providers. A subject matter and a performer of the subject matter are then identified for at least one of the set of media signals. A set of icons is then identified. Each of the set of icons corresponds to at least one of media signals. The set of icons and the set of media providers are then forwarded to a client media player.
摘要:
A computer implemented method in a data processing system and a computer program product enable visual selection of a media signal. A set of media signals is received from a set of media providers. A subject matter and a performer of the subject matter are then identified for at least one of the set of media signals. A set of icons is then identified. Each of the set of icons corresponds to at least one of media signals. The set of icons and the set of media providers are then forwarded to a client media player.
摘要:
Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
摘要:
Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
摘要:
Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
摘要:
A method for processing a speech signal includes dividing the speech signal into a succession of frames, identifying one or more of the frames as click frames, and extracting phase information from the click frames. The speech signal is encoded using the phase information. Methods are also provided for modeling phase spectra of voiced frames and click frames.
摘要:
A method for processing a speech signal includes dividing the speech signal into a succession of frames, identifying one or more of the frames as click frames, and extracting phase information from the click frames. The speech signal is encoded using the phase information. Methods are also provided for modeling phase spectra of voiced frames and click frames.
摘要:
Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.