摘要:
A method of text script generation for a corpus-based text-to-speech system includes searching in a source corpus having L sentences, selecting N sentences with a best integrated efficiency as N best cases, and setting iteration k to be 1; for each case n of the N best cases, selecting Mk+1 best sentences with the best integrated efficiency from the unselected sentences in the source corpus; keeping N best cases out of the total unselected sentences for next iteration, and increasing iteration k by 1; and if a termination criterion being reached, setting the best case in the N traced cases as the text script, otherwise, returning to the (k+1)th iteration of searching in the unselected sentences for (k+1)th sentence; wherein the best integrated efficiency depends on a function of combining the covering rate of the synthesis unit type, the hit rate of the synthesis unit type, and the text script size.
摘要:
An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models for verifying whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether the test speech unit segment is accepted or not.
摘要:
A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method above estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, which comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then maps the source pitchmark(s) to at least one target pitchmark(s). Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark(s) mapping mentioned above.
摘要:
A system and method for providing mobile information, a server and a portable device therein are provided. The server comprises an intelligent download manager and the portable device comprises a file browse manager. The intelligent download manager determines a downloaded file update rule and a file browse rule according to any combination of a document attribute, a browse record, and a document preference. The files to be downloaded to the portable device can be determined automatically according to the downloaded file update rule. The file browse manager provides an intelligent browse mode related to the browse sequence of the downloaded files according to the file browse rule. Therefore, information really interesting to the user can be stored in the limited space by the present invention and the user can access the information quickly and efficiently.
摘要:
A method and system for pronunciation assessment based on distinctive feature analysis is provided. It evaluates a user's pronunciation by one or more distinctive feature (DF) assessor. It may further construct a phone assessor with DF assessors to evaluate a user's phone pronunciation, and even construct a continuous speech pronunciation assessor with phone assessor to get the final pronunciation score for a word or a sentence. Each DF assessor further includes a feature extractor and a distinctive feature classifier, and can be realized differently. This is based on the different characteristic of the distinctive feature. A score mapper may be included to standardize the output for each DF assessor. Each speech phone can be described as a “bundle” of DFs. The invention is a novel and qualitative solution based on the DF of speech sounds for pronunciation assessment.
摘要:
An improved method for shifting the pitches of a tone is disclosed. It comprises: (a) subjecting a digitized original waveform to a whitening process using an all-zero filter (AZF) to obtain a whitened waveform; (b) resampling the whitened waveform at a desired scaling ratio to obtain a scaled and whitened waveform; (c) subjecting the scaled and whitened waveform to a coloring process using an all-pole filter (APF) to obtain a synthesized waveform. In a preferred embodiment, the all-zero filter performs the transformation function of: ##EQU1## and the all-pole filter performs the transformation function of: ##EQU2## wherein the a.sub.i 's and b.sub.i 's are linear predictive coefficients. The whitened waveforms can be compressed and stored as wavetables, which can be subsequently retrieved and decompressed before resampling.
摘要:
A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least a set controllable accent weighting parameter to select a transformation combination and find a second and a first acoustic-prosodic models. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model, according to the at least a controllable accent weighting parameter, processes all transformations in the transformation combination and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.
摘要:
A speech synthesizer generating system and a method thereof are provided. A speech synthesizer generator in the speech synthesizer generating system automatically generates a speech synthesizer conforming to a speech output specification input by a user. In addition, a recording script is automatically generated by a recording script generator in the speech synthesizer generating system according to the speech output specification, and a customized or expanded speech material is recorded according to the recording script. After the speech material is uploaded to the speech synthesizer generating system, the speech synthesizer generator automatically generates a speech synthesizer conforming to the speech output specification. The speech synthesizer then synthesizes and outputs a speech output at a user end.
摘要:
A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method above estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, which comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then maps the source pitchmark(s) to at least one target pitchmark(s). Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark(s) mapping mentioned above.