Abstract:
There is provided a waveform processing device that changes the power of each pitch waveform of a segment in order to obtain natural-sounding synthesized speech. A power calculation means 71 selects pitch waveforms one by one from a group of pitch waveforms corresponding to a segment, and calculates a scalar indicating the power of the selected pitch waveform. A normalization degree calculation means 72 calculates a degree of normalization, an index indicating the degree to which the pitch waveform selected by the power calculation means 71 is normalized, as the value of an increasing function of that scalar. A change coefficient calculation means 73 calculates, based on the scalar and the degree of normalization, a change coefficient for changing the amplitude values of the pitch waveform selected by the power calculation means 71. An amplitude change means 74 multiplies the amplitude value at each sampling point of the selected pitch waveform by the change coefficient.
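The four means above form a simple per-pitch-waveform loop. The sketch below is illustrative only: the abstract does not specify the power measure, the increasing function, or how the coefficient combines the scalar with the degree, so RMS power, the function `p / (p + k)`, the target power `target`, and the geometric blend `(target / p) ** d` are all assumptions.

```python
import math
from typing import List, Sequence


def power_scalar(wave: Sequence[float]) -> float:
    """Power scalar of one pitch waveform; RMS is an assumed choice."""
    return math.sqrt(sum(x * x for x in wave) / len(wave))


def normalization_degree(power: float, k: float = 0.1) -> float:
    """Hypothetical increasing function of the power scalar, mapping into [0, 1)."""
    return power / (power + k)


def change_coefficient(power: float, degree: float, target: float) -> float:
    """Blend geometrically between no change (degree 0) and full normalization to target (degree 1)."""
    if power == 0.0:
        return 1.0  # silent pitch waveform: leave it unchanged
    return (target / power) ** degree


def process_segment(pitch_waves: List[List[float]], target: float = 0.5,
                    k: float = 0.1) -> List[List[float]]:
    out = []
    for wave in pitch_waves:  # select pitch waveforms one by one
        p = power_scalar(wave)
        d = normalization_degree(p, k)
        c = change_coefficient(p, d, target)
        out.append([c * x for x in wave])  # scale every sampling point by c
    return out
```

With these assumptions, a loud pitch waveform is pulled toward the target power but not all the way, and louder waveforms are normalized more strongly because their degree is higher.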
Abstract:
Disclosed is a speech synthesizing apparatus including a segment selection unit that selects, from candidate segments, a segment suited to a target segment environment. The segment selection unit includes a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on the prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amounts, a candidate selection unit that narrows down the selection candidates based on the prosody change amounts and the selection criterion, and an optimum segment search unit that searches for an optimum segment among the narrowed-down candidate segments.
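The narrowing-then-search flow can be sketched as follows. Everything concrete here is an assumption, not taken from the abstract: the change amount is the sum of absolute log-ratios of F0 and duration, the criterion is `min * ratio + margin` over each position's change amounts, and the optimum search is a Viterbi pass whose join cost is the F0 jump between consecutive segments.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    name: str
    f0: float        # mean F0 of the candidate segment, Hz
    duration: float  # seconds


@dataclass
class Target:
    f0: float
    duration: float


def prosody_change(c: Candidate, t: Target) -> float:
    """Hypothetical change amount: absolute log-ratios of F0 and duration."""
    return abs(math.log(t.f0 / c.f0)) + abs(math.log(t.duration / c.duration))


def selection_criterion(changes: Sequence[float], ratio: float = 1.5,
                        margin: float = 0.05) -> float:
    """Hypothetical criterion derived from the change amounts themselves."""
    return min(changes) * ratio + margin


def narrow(cands: List[Candidate], t: Target) -> List[Candidate]:
    changes = [prosody_change(c, t) for c in cands]
    crit = selection_criterion(changes)
    return [c for c, ch in zip(cands, changes) if ch <= crit]


def search(targets: List[Target], cand_lists: List[List[Candidate]]) -> List[Candidate]:
    """Viterbi search: target cost = prosody change, join cost = F0 jump (both assumed)."""
    narrowed = [narrow(c, t) for c, t in zip(cand_lists, targets)]
    table = [[(prosody_change(c, targets[0]), -1) for c in narrowed[0]]]
    for i in range(1, len(targets)):
        row = []
        for c in narrowed[i]:
            cost, back = min((table[-1][k][0] + abs(math.log(c.f0 / p.f0)), k)
                             for k, p in enumerate(narrowed[i - 1]))
            row.append((cost + prosody_change(c, targets[i]), back))
        table.append(row)
    j = min(range(len(table[-1])), key=lambda m: table[-1][m][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):  # backtrack through stored pointers
        path.append(narrowed[i][j])
        j = table[i][j][1]
    return path[::-1]
```

Because the criterion is computed from each position's own change amounts, the narrowing adapts to how well the database covers each target, and the more expensive search only runs over the survivors.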
Abstract:
There is provided a prosody generator that generates prosody information for highly natural speech synthesis without requiring the collection of unnecessarily large quantities of learning data. A data dividing means 81 divides into subspaces the data space of a learning database, which is a collection of learning data representing feature quantities of speech waveforms. A density information extracting means 82 extracts density information indicating how dense, in terms of information quantity, the learning data are in each of the subspaces produced by the data dividing means 81. Based on the density information, a prosody information generating method selecting means 83 selects either a first method, which generates the prosody information using a statistical technique, or a second method, which generates the prosody information using heuristic rules.
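A minimal sketch of the density-based switch, under assumptions not stated in the abstract: subspaces are cells of a fixed-width grid over the feature vector, "information quantity" is approximated by the number of samples per cell, the statistical method is the mean F0 of the cell, and the heuristic rule (`rule_based_f0`) is a hypothetical declining-F0 formula.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def cell_of(features: Sequence[float], width: float) -> Cell:
    """Map a feature vector to a grid cell; the grid is an assumed way to form subspaces."""
    return tuple(int(x // width) for x in features)


def divide(data, width: float) -> Dict[Cell, List[float]]:
    cells: Dict[Cell, List[float]] = defaultdict(list)
    for features, f0 in data:
        cells[cell_of(features, width)].append(f0)
    return cells


def density(cells: Dict[Cell, List[float]]) -> Dict[Cell, int]:
    """Sample count per subspace, standing in for 'information quantity'."""
    return {k: len(v) for k, v in cells.items()}


def rule_based_f0(features: Sequence[float]) -> float:
    # Hypothetical heuristic rule: F0 declines with relative position in the phrase.
    return 150.0 - 20.0 * features[0]


def generate_f0(features, cells, dens, width: float, min_count: int = 5):
    key = cell_of(features, width)
    if dens.get(key, 0) >= min_count:  # dense subspace: statistical method
        samples = cells[key]
        return "statistical", sum(samples) / len(samples)
    return "rule", rule_based_f0(features)  # sparse subspace: heuristic rules
```

The point of the switch is that sparse regions of the data space fall back to rules instead of demanding more recordings, which matches the stated goal of avoiding unnecessarily large learning databases.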
Abstract:
A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting, based on the central segment and the prosody information, a non-central segment, which is a segment outside the central segment section; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that serves as the basis for prosody generation and then generates prosody information from that central segment. This makes it possible to sufficiently reduce, within the central segment section, both concatenation distortion and the sound-quality degradation that accompanies prosody control.
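The order of operations (central segment first, prosody from it, then the remaining segments) can be sketched as below. The abstract does not say how the central segment is chosen or how prosody is extended beyond it, so the following are assumptions: a segment is a `(phoneme, F0)` pair, the central segment is the longest run of target phonemes found contiguously in one database utterance, F0 outside that section is held flat at the boundary value, and waveform generation is reduced to returning the ordered unit sequence.

```python
from typing import Dict, List, Tuple

Unit = Tuple[str, float]  # (phoneme label, F0 in Hz) stands in for a stored speech segment


def select_central(target: List[str], db: List[List[Unit]]):
    """Assumed criterion: the longest run of target phonemes found contiguously in one DB utterance."""
    best: Tuple[int, int, List[Unit]] = (0, 0, [])  # (length, start in target, units)
    for utt in db:
        for i in range(len(target)):
            for j in range(len(utt)):
                n = 0
                while (i + n < len(target) and j + n < len(utt)
                       and utt[j + n][0] == target[i + n]):
                    n += 1
                if n > best[0]:
                    best = (n, i, utt[j:j + n])
    return best


def generate_prosody(n_units: int, start: int, central: List[Unit]) -> List[float]:
    """Copy F0 inside the central section; hold the boundary F0 flat outside it (assumed)."""
    f0 = [central[0][1]] * start
    f0 += [v for _, v in central]
    f0 += [central[-1][1]] * (n_units - start - len(central))
    return f0


def select_non_central(target, start, length, f0, db) -> Dict[int, Unit]:
    """For each position outside the central section, pick the unit whose F0 best fits the prosody."""
    pool = [u for utt in db for u in utt]
    chosen = {}
    for i, ph in enumerate(target):
        if start <= i < start + length:
            continue
        cands = [u for u in pool if u[0] == ph]
        chosen[i] = min(cands, key=lambda u: abs(u[1] - f0[i]))
    return chosen


def synthesize(target: List[str], db: List[List[Unit]]):
    length, start, central = select_central(target, db)
    if length == 0:
        raise ValueError("no central segment found")
    f0 = generate_prosody(len(target), start, central)
    units = select_non_central(target, start, length, f0, db)
    units.update({start + k: u for k, u in enumerate(central)})
    # Waveform generation is reduced here to returning the ordered unit sequence.
    return [units[i] for i in range(len(target))], f0
```

Since the prosody inside the central section is copied from the central segment itself, that section needs neither joins nor prosody modification, which is the effect the abstract describes.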
Abstract:
This speech synthesis system includes a server device and a client device. The client device accepts text information representing a text and transmits a speech element request to the server device. The server device stores speech element information. On receiving the speech element request from the client device, the server device transmits speech element information to the client device in such a way that the client device receives it in an order different from the order in which the speech elements are arranged in the speech corresponding to the text. The client device executes a speech synthesis process after rearranging the received speech element information so that the speech elements it represents are in the same order as the speech elements in the speech corresponding to the text.
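An in-process sketch of the out-of-order transfer and client-side rearrangement. The abstract does not say how the client recovers the correct order, so this assumes each element carries its position index; the seeded pseudo-random permutation on the server, the whitespace-token front end, and the class and method names are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SpeechElement:
    position: int  # index in the text's element order (assumed to travel with each element)
    label: str
    samples: Tuple[float, ...]


class Server:
    def __init__(self, store: Dict[str, Tuple[float, ...]]):
        self.store = store  # stored speech element information, keyed by label

    def handle_request(self, labels: List[str], seed: int = 0) -> List[SpeechElement]:
        elements = [SpeechElement(i, lab, self.store[lab]) for i, lab in enumerate(labels)]
        order = list(range(len(elements)))
        rng = random.Random(seed)
        # Hypothetical policy: a pseudo-random permutation that differs from text order.
        while len(order) > 1 and order == sorted(order):
            rng.shuffle(order)
        return [elements[i] for i in order]


class Client:
    def __init__(self, server: Server):
        self.server = server

    def synthesize(self, text: str):
        labels = text.split()  # hypothetical front end: one element per token
        received = self.server.handle_request(labels)
        # Rearrange into text order before concatenating samples.
        ordered = sorted(received, key=lambda e: e.position)
        samples: List[float] = [s for e in ordered for s in e.samples]
        return received, samples
```

For example, `Client(Server({"ko": (1.0, 2.0), "n": (3.0,), "ni": (4.0, 5.0)})).synthesize("ko n ni")` receives the three elements out of text order but still concatenates the samples in text order.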