Abstract:
A telephone includes: a voice changer for converting the original voice of the user; an acquisition unit for acquiring information of the call-origin telephone at the time of an incoming call; a memory in which conditions for using the voice changer have been registered in advance; a determination unit for determining whether to use the voice changer based on information of the call-origin telephone that has been acquired by the acquisition unit and conditions that have been registered in the memory, and based on the result of determination, controlling the voice changer; and a switch unit for switching the state of use of the voice changer during a conversation.
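The determination and switching logic described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class and function names, the use of caller numbers as the registered condition, and the rule for withheld numbers are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class VoiceChangerPolicy:
    # Hypothetical registered conditions: caller numbers for which the
    # voice changer is enabled, plus a rule for withheld caller IDs.
    use_for_numbers: frozenset
    use_when_number_withheld: bool = True


def should_use_voice_changer(caller_number, policy):
    """Determination unit: decide at incoming-call time whether to enable the changer."""
    if caller_number is None:  # no call-origin information was acquired
        return policy.use_when_number_withheld
    return caller_number in policy.use_for_numbers


class Call:
    def __init__(self, caller_number, policy):
        # Initial state comes from the registered conditions.
        self.voice_changer_on = should_use_voice_changer(caller_number, policy)

    def toggle_voice_changer(self):
        """Switch unit: flip the state of use during the conversation."""
        self.voice_changer_on = not self.voice_changer_on
        return self.voice_changer_on
```

A call from an unregistered number would start with the voice changer off, and the user could still turn it on mid-call with `toggle_voice_changer()`.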
Abstract:
In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
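One way to picture the envelope adjustment is as a per-bin gain applied to the collective magnitude spectrum. The sketch below assumes magnitude spectra given as plain lists and uses a moving average as a crude stand-in for the spectral envelope; the abstract does not say how the envelope is estimated, so the function names and the `half_width` parameter are illustrative only.

```python
def moving_average_envelope(magnitudes, half_width=2):
    """Crude spectral envelope: moving average over neighbouring bins (an assumption)."""
    n = len(magnitudes)
    envelope = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        envelope.append(sum(magnitudes[lo:hi]) / (hi - lo))
    return envelope


def match_envelope(collective, reference, half_width=2, eps=1e-12):
    """Scale each bin of the collective spectrum by reference_env / collective_env."""
    ref_env = moving_average_envelope(reference, half_width)
    col_env = moving_average_envelope(collective, half_width)
    return [
        c * r / max(e, eps)  # eps guards against division by zero
        for c, r, e in zip(collective, ref_env, col_env)
    ]
```

The fine structure of the collective spectrum is kept, while its broad shape is pulled toward that of the reference voice.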
Abstract:
A method for synthesizing speech from text includes receiving one or more waveforms characteristic of a voice of a person selected by a user, generating a personalized voice font based on the one or more waveforms, and delivering the personalized voice font to the user's computer, whereby speech can be synthesized from text, the speech being in the voice of the selected person, the speech being synthesized using the personalized voice font. A system includes a text-to-speech (TTS) application operable to generate a voice font based on speech waveforms transmitted from a client computer remotely accessing the TTS application.
Abstract:
A method and apparatus are provided for adjusting a content of an oral presentation provided by an agent of an organization and perceived by a human target of the organization based upon an objective of the organization. The method includes the steps of detecting a content of the oral presentation provided by the agent and modifying the oral presentation provided by the agent to produce the oral presentation perceived by the human target based upon the detected content and the organizational objective.
Abstract:
The intonation of speech is modified by an appropriate combination of resampling and time-domain harmonic scaling. Resampling increases (upsampling) or decreases (downsampling) the number of data points in a signal. Harmonic scaling adds or removes pitch cycles to or from a signal. The pitch of a speech signal can be increased by combining downsampling with harmonic scaling that adds an appropriate number of pitch cycles. Alternatively, pitch can be decreased by combining upsampling with harmonic scaling that removes an appropriate number of pitch cycles. The present invention can be implemented in an automated speech-therapy tool that is able to modify the intonation of prerecorded reference speech signals for playback to a user to emphasize the correct pronunciation by increasing the pitch of selected portions of words or phrases that the user had previously mispronounced.
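The interplay of the two operations can be illustrated with a toy example. The sketch below assumes a signal with a known, constant pitch period in samples, uses linear interpolation for resampling, and substitutes naive repetition or dropping of whole pitch cycles for time-domain harmonic scaling (a practical system would blend adjacent cycles rather than copy them). All function names are hypothetical.

```python
import math


def resample_linear(signal, ratio):
    """Compress (ratio > 1, downsampling) or stretch (ratio < 1, upsampling) in time."""
    out_len = int(round(len(signal) / ratio))
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        nxt = signal[min(j + 1, len(signal) - 1)]
        out.append(signal[j] * (1 - frac) + nxt * frac)
    return out


def scale_cycles(signal, period, target_len):
    """Naive stand-in for harmonic scaling: repeat or drop whole pitch cycles."""
    cycles = [signal[i:i + period] for i in range(0, len(signal) - period + 1, period)]
    if not cycles:
        raise ValueError("signal is shorter than one pitch period")
    n_target = max(1, round(target_len / period))
    out = []
    for k in range(n_target):
        # Take the source cycle at the same relative position in the signal.
        src = min(len(cycles) - 1, k * len(cycles) // n_target)
        out.extend(cycles[src])
    return out


def change_pitch(signal, period, ratio):
    """Multiply pitch by `ratio` while keeping roughly the original duration."""
    resampled = resample_linear(signal, ratio)      # shifts pitch, changes duration
    new_period = max(1, round(period / ratio))      # pitch period after resampling
    return scale_cycles(resampled, new_period, len(signal))  # restores duration
```

With `ratio = 2.0`, a 400-sample signal with a 40-sample period is downsampled to 200 samples with a 20-sample period, and ten pitch cycles are then added to bring it back to 400 samples at twice the pitch.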
Abstract:
An apparatus for providing a custom profile in a wireless device, and a method of modifying an audio profile in a wireless device, are disclosed. The apparatus includes a memory into which at least one criterion is entered by the user, a receiver that receives an audio signal, a comparator that receives the audio signal from the receiver, receives at least a first of the at least one criterion from the memory, and compares the audio signal to the first criterion, and an adjustor that adjusts the audio signal based on the result from the comparator. The method includes the steps of entering, by a user of the wireless device, a first criterion, comparing an audio signal received by the wireless device to the first criterion, adjusting the audio signal based on the output of the comparing step, and playing the adjusted audio signal to the user, or broadcasting the adjusted audio signal to a remote caller.
Abstract:
The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker-dependent parameters, such as context-independent parameters, and speaker-independent parameters, such as context-dependent parameters. The speaker-dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker-dependent parameters are combined with the speaker-independent parameters to provide a set of personalized synthesis parameters. To adapt the parameters with a small amount of enrollment data, an eigenspace is constructed and used to constrain the position of the new speaker so that context-independent parameters not provided by the new speaker may be estimated.
Abstract:
A voice processing technology, and a device utilizing it, are provided that make the voice on a telephone receiver easier to hear by converting it according to its frequency. The invention uses a frequency conversion means capable of converting the input voice frequency to a desired frequency. The frequency conversion level, or frequency shift, can be set as desired for each frequency. The frequency is not changed if the input voice frequency is low. The higher the voice frequency, the greater the shift toward the low frequency range, yielding an easier-to-hear voice.
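A frequency mapping with these properties can be sketched as a piecewise-linear function. The knee frequency and compression factor below are hypothetical values chosen for illustration; the abstract does not specify the shape of the mapping, only that low frequencies are unchanged and higher frequencies are shifted down by larger amounts.

```python
def shifted_frequency(f_hz, knee_hz=1000.0, compression=0.5):
    """Map an input frequency to an output frequency.

    Frequencies at or below knee_hz pass through unchanged. Above the knee,
    the distance from the knee is scaled by `compression` (< 1), so the
    downward shift grows with the input frequency.
    """
    if f_hz <= knee_hz:
        return f_hz
    return knee_hz + (f_hz - knee_hz) * compression
```

With these example values, 500 Hz stays at 500 Hz, while 3000 Hz maps to 2000 Hz (a 1000 Hz downward shift) and 4000 Hz maps to 2500 Hz (a 1500 Hz downward shift).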
Abstract:
A technique is disclosed for separating an acoustic signal into a voiced (V) component corresponding to an electrolaryngeal source and an unvoiced (U) component corresponding to a turbulence source. The technique can be used to improve the quality of electrolaryngeal speech, and may be adapted for use in a special-purpose telephone. A method according to the invention extracts a segment of consecutive values from the original stream of numerical values, and performs a discrete Fourier transform on this first group of values. Next, a second group of values is extracted from components of the discrete Fourier transform result which correspond to an electrolaryngeal fixed repetition rate, F0, and harmonics thereof. An inverse Fourier transform is applied to the second group of values, to produce a representation of a segment of the V component. Multiple V component segments are then concatenated to form a V component sample stream. Finally, the U component is determined by subtracting the V component sample stream from the original stream of numerical values.
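The steps above can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions: the segment length is chosen so that F0 and its harmonics fall exactly on DFT bins spaced `harmonic_step` bins apart, segments do not overlap, and a direct O(N²) DFT is used for clarity. The function names and parameters are not taken from the source.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_voiced_unvoiced(samples, segment_len, harmonic_step):
    """Return (V, U) where V keeps only the bins at F0 and its harmonics."""
    voiced = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        spectrum = dft(samples[start:start + segment_len])
        kept = [
            # Keep harmonic bins and their mirror images so the inverse is real.
            value if k != 0 and (k % harmonic_step == 0
                                 or (segment_len - k) % harmonic_step == 0) else 0
            for k, value in enumerate(spectrum)
        ]
        voiced.extend(idft(kept))  # concatenate V segments
    # Any trailing partial segment is left entirely in the U component.
    voiced.extend([0.0] * (len(samples) - len(voiced)))
    unvoiced = [s - v for s, v in zip(samples, voiced)]
    return voiced, unvoiced
```

For example, at an 8000 Hz sample rate with F0 = 100 Hz, a 160-sample segment places the harmonics on every second bin, so `harmonic_step` would be 2.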
Abstract:
At a smoothing spectrogram calculation portion, a triangular interpolation function having a frequency width twice that of the fundamental frequency of a signal is obtained based on information on the fundamental frequency of the signal. The interpolation function and a spectrum obtained at an adaptive frequency analysis portion are convolved in the direction of frequency. Then, using a triangular interpolation function having a time length twice that of a fundamental period, the spectrum interpolated in the frequency direction described above is further interpolated in the temporal direction, in order to produce a smoothed spectrogram having the space between grid points on the time-frequency plane filled with the surface of a bilinear function. Using the smoothed spectrogram, a speech sound is transformed. Therefore, the influence of periodicity in the frequency direction and the temporal direction can be reduced.
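The frequency-direction step can be illustrated on a single spectrum. The sketch below assumes the fundamental frequency corresponds to a whole number of bins (`f0_bins`) and uses an unnormalized triangle of total width `2 * f0_bins`; under those assumptions, convolving a spectrum that is nonzero only at the harmonics yields straight-line interpolation between the harmonic values. The function name is hypothetical, and the adaptive frequency analysis that produces the input spectrum is not modelled.

```python
def triangular_smooth(spectrum, f0_bins):
    """Convolve with a triangle whose base is twice the fundamental frequency."""
    n = len(spectrum)
    out = []
    for k in range(n):
        acc = 0.0
        for d in range(-f0_bins + 1, f0_bins):
            j = k + d
            if 0 <= j < n:
                # Triangle weight: 1 at the centre, falling to 0 at +/- f0_bins.
                acc += spectrum[j] * (1.0 - abs(d) / f0_bins)
        out.append(acc)
    return out
```

Applying the same function along the time axis, with the half-width set to the fundamental period in frames, would give the temporal interpolation; doing both produces the piecewise bilinear surface between grid points.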