Abstract:
A method, apparatus, and storage medium for voice imitation are provided. The voice imitation method includes: separately obtaining a training voice of a source user and training voices of a plurality of imitation users including a target user; determining, according to the training voice of the source user and a training voice of the target user, a conversion rule for converting the training voice of the source user into the training voice of the target user; collecting voice information of the source user; and converting the voice information of the source user into an imitation voice of the target user according to the conversion rule.
Abstract:
A voice signal may be adjusted to mask traits such as the gender of a speaker by separating source and filter components of a voice signal using cepstral analysis, adjusting the components based on pitch and formant parameters, and synthesizing a modified signal. Features are disclosed to support real-time voice masking in a computer network by limiting computational complexity and reducing delays in processing and transmission while maintaining signal quality.
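The source/filter separation mentioned above can be illustrated with a real cepstrum and a low-quefrency lifter. The following is a minimal sketch, not the patented method: the function names, the naive DFT, the `cutoff` value, and the synthetic test frame are all assumptions made for illustration, and the pitch and formant adjustment and resynthesis steps are not shown.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [c.real for c in idft(log_mag)]


def split_cepstrum(cep, cutoff):
    """Split a cepstrum into low-quefrency (filter) and high-quefrency (source) parts.

    The lifter is symmetric so that both parts map back to real log spectra.
    """
    n = len(cep)
    low = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cep)]
    high = [c - l for c, l in zip(cep, low)]
    return low, high


def log_spectrum(cep):
    """Map a (symmetric, real) cepstrum back to a log magnitude spectrum."""
    return [v.real for v in dft(cep)]


# Hypothetical test frame: a damped pulse train with a period of 16 samples.
frame = [0.0] * 64
for start in range(0, 64, 16):
    for t in range(start, 64):
        frame[t] += 0.8 ** (t - start)

cepstrum = real_cepstrum(frame)
filter_part, source_part = split_cepstrum(cepstrum, cutoff=8)
envelope_log_spectrum = log_spectrum(filter_part)    # smooth spectral envelope
excitation_log_spectrum = log_spectrum(source_part)  # fine harmonic structure
```

Because the split is additive in the cepstral domain, the two log spectra sum back to the original log magnitude spectrum, which is what lets each component be adjusted separately before resynthesis.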
Abstract:
A method for voice modification during a telephone call, comprising: receiving a source audio signal associated with at least one participant, wherein the source audio signal comprises a voice of the at least one participant; detecting a source dialect of the at least one participant; selecting a target dialect based on at least a characteristic of a target participant; creating a modulated audio signal based on the source audio signal, the source dialect, and the target dialect; and transmitting the modulated audio signal to the target participant.
Abstract:
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for: receiving an input from an agent during a call with a caller, where the input directs one or more processors to inject a recorded statement in the agent's voice into the call, and where the recorded statement in the agent's voice is stored in a computer-readable file; obtaining, in response to receiving the input, the recorded statement in the agent's voice based on data associated with the input; and causing the recorded statement in the agent's voice to be inserted into a media stream of the call.
Abstract:
A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
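The frame-wise steps above can be sketched as follows. The abstract does not state how the intensity modification factor is derived from the two pitch contours, so the rule used here (a fixed gain in dB per octave of pitch change, `db_per_octave`) is purely an assumption for illustration, as are the function names, the use of RMS amplitude as the intensity measure, and the constant per-frame scaling. Unvoiced frames are marked with a pitch of 0 and left unchanged.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame, used here as its intensity."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def intensity_factors(orig_pitch, target_pitch, db_per_octave=3.0):
    """One factor per frame, based on the original and target pitch contours.

    ASSUMPTION: the gain grows by `db_per_octave` dB per octave of pitch
    increase. Unvoiced frames (pitch <= 0) get a factor of 1.0.
    """
    factors = []
    for f_orig, f_target in zip(orig_pitch, target_pitch):
        if f_orig > 0 and f_target > 0:
            gain_db = db_per_octave * math.log2(f_target / f_orig)
            factors.append(10 ** (gain_db / 20))
        else:
            factors.append(1.0)
    return factors


def final_intensity_contour(original_intensity, factors):
    """Apply the modification factors to the original intensity contour."""
    return [i * g for i, g in zip(original_intensity, factors)]


def scale_to_contour(pitch_modified_frames, final_intensity):
    """Scale each pitch-modified frame so its RMS matches the final contour."""
    scaled = []
    for frame, target in zip(pitch_modified_frames, final_intensity):
        current = frame_rms(frame)
        gain = target / current if current > 0 else 0.0
        scaled.append([s * gain for s in frame])
    return scaled
```

A practical implementation would likely interpolate the gain between frame centers to avoid discontinuities at frame boundaries; the constant per-frame gain here only keeps the sketch short.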
Abstract:
A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.
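The weighted-sum parameterization can be illustrated with a small sketch. It assumes, for illustration only, that the parameter in question is a Gaussian mean vector, that each sub-cluster contributes one mean vector, and that weights are stored per voice characteristic in a dictionary; the names `combine_means` and `VOICE_WEIGHTS` and all numeric values are hypothetical.

```python
def combine_means(subcluster_means, weights):
    """Return the weighted sum of sub-cluster mean vectors.

    There is exactly one weight per sub-cluster, as in the abstract.
    """
    if len(subcluster_means) != len(weights):
        raise ValueError("expected one weight per sub-cluster")
    dim = len(subcluster_means[0])
    return [sum(w * mean[d] for w, mean in zip(weights, subcluster_means))
            for d in range(dim)]


# Hypothetical voice-characteristic-dependent weights (one per sub-cluster).
VOICE_WEIGHTS = {
    "neutral": [1.0, 0.0],
    "blended": [0.5, 0.5],
}

# Hypothetical sub-cluster means for one acoustic unit (2-dimensional vectors).
means = [[1.0, 2.0], [3.0, 4.0]]

neutral_mean = combine_means(means, VOICE_WEIGHTS["neutral"])
blended_mean = combine_means(means, VOICE_WEIGHTS["blended"])
```

Selecting a different voice characteristic changes only the weight vector that is retrieved, while the sub-cluster parameters themselves stay shared across voices.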
Abstract:
The disclosure provides a customizable system for modifying voice dimensions. The system comprises a program interface located on an electronic device. The program interface is used to manipulate user input from one or more individuals relating to voice parameters. The program interface then creates instructions that the one or more individuals can follow to modify their voice dimensions. The disclosure further provides a method for modifying an individual's voice dimensions. The method comprises identifying one or more of an individual's voice dimensions that are to be modified. On an electronic device, a voice exercise is created by selecting at least one parameter that modifies the one or more dimensions of the individual's voice. The individual then follows instructions that the electronic device creates based on the selection of the at least one parameter.
Abstract:
Modulating a voice signal is provided. The voice signal corresponding to a voice communication is received from a sending voice communication device via a network. Voice signal features corresponding to the voice communication are extracted. A set of voice signal filters is selected to modulate the extracted voice signal features toward an average voice signal associated with the geographic area for which the voice communication is destined. The voice signal features are then modulated by applying the selected set of voice signal filters to generate the average voice signal associated with that geographic area.
Abstract:
A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.