Abstract:
A system for generating high-quality synthesized text-to-speech includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes input speech and then generates first learning data in which the wordings of phrases are associated with their readings. Based on the first learning data, the frequency data generating unit generates frequency data indicating the appearance frequencies of both the wordings and the readings of phrases. The setting unit sets the generated frequency data in a language processing unit so that the speech output by text-to-speech approximates the input speech. The language processing unit then generates, from the wording of a text, the reading corresponding to that wording on the basis of the appearance frequencies.
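The frequency-data step can be illustrated with a minimal Python sketch. It assumes the first learning data is a list of (wording, reading) pairs obtained from recognized speech, stores the frequency data as per-wording reading counts, and picks the most frequent reading. The names `build_frequency_data` and `choose_reading` are hypothetical, and "pick the most frequent reading" is an assumed policy: the abstract says only that readings are generated on the basis of the appearance frequencies.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def build_frequency_data(
    learning_data: Iterable[Tuple[str, str]],
) -> Dict[str, Counter]:
    """Count how often each reading was observed for each wording.

    `learning_data` is assumed to be (wording, reading) pairs taken from
    recognized speech, a hypothetical form of the "first learning data".
    Wording frequencies can be recovered by summing each inner Counter.
    """
    freq: Dict[str, Counter] = defaultdict(Counter)
    for wording, reading in learning_data:
        freq[wording][reading] += 1
    return dict(freq)


def choose_reading(
    freq: Dict[str, Counter], wording: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the most frequently observed reading of `wording`, else `default`."""
    counts = freq.get(wording)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    data = [("日本", "nihon"), ("日本", "nippon"), ("日本", "nihon")]
    print(choose_reading(build_frequency_data(data), "日本"))  # nihon
```

A real language processing unit would more likely feed these counts into a statistical model rather than a plain argmax; the argmax keeps the sketch small.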
Abstract:
Training wording data indicating the wording of each word in a training text, training speech data indicating the characteristics of the speech of each word, and training boundary data indicating whether each word in the training speech is a boundary of a prosodic phrase are stored. After candidates for boundary data are input, a first likelihood is calculated that the prosodic-phrase boundaries of the words in the input text agree with each of the input boundary data candidates, and a second likelihood is also calculated. The boundary data candidate that maximizes the product of the first and second likelihoods is then searched for among the input boundary data candidates, and the result of the search is output.
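The final search step, choosing the candidate that maximizes the product of the two likelihoods, can be sketched as follows. The abstract does not say how either likelihood is computed from the stored training data, so both are passed in as caller-supplied functions; encoding a candidate as a tuple of booleans (True meaning the word ends a prosodic phrase) and the name `best_boundary_candidate` are assumptions. Scores are compared as sums of logarithms, which selects the same candidate as the raw product but avoids floating-point underflow on long texts.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical encoding: one flag per word, True = prosodic-phrase boundary.
Boundary = Tuple[bool, ...]


def best_boundary_candidate(
    candidates: Sequence[Boundary],
    first_likelihood: Callable[[Boundary], float],
    second_likelihood: Callable[[Boundary], float],
) -> Boundary:
    """Return the candidate maximizing first_likelihood * second_likelihood.

    log(p1) + log(p2) is maximized instead of p1 * p2; because log is
    monotonic the argmax is identical. Zero likelihoods score -inf.
    Raises ValueError if `candidates` is empty.
    """

    def log_score(c: Boundary) -> float:
        p1, p2 = first_likelihood(c), second_likelihood(c)
        if p1 <= 0.0 or p2 <= 0.0:
            return -math.inf
        return math.log(p1) + math.log(p2)

    return max(candidates, key=log_score)


if __name__ == "__main__":
    cands = [(False, True, False), (True, False, False), (False, False, True)]
    # Toy likelihood tables standing in for the unspecified models.
    l1 = {cands[0]: 0.5, cands[1]: 0.3, cands[2]: 0.2}
    l2 = {cands[0]: 0.2, cands[1]: 0.6, cands[2]: 0.2}
    # Products are 0.10, 0.18 and 0.04, so the second candidate wins.
    print(best_boundary_candidate(cands, lambda c: l1[c], lambda c: l2[c]))
```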
Abstract:
A system, method, and computer readable article of manufacture for extracting a specific situation in a conversation. The system includes: an acquisition unit for acquiring speech voice data of speakers in the conversation; a specific expression detection unit for detecting the speech voice data of a specific expression from speech voice data of a specific speaker in the conversation; and a specific situation extraction unit for extracting, from the speech voice data of the speakers in the conversation, a portion of the speech voice data that forms a speech pattern that includes the speech voice data of the specific expression detected by the specific expression detection unit.
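As a rough illustration of the extraction step, the sketch below works on already-transcribed utterances rather than raw speech voice data, detects a specific expression spoken by a specific speaker, and returns a window of surrounding utterances as the "specific situation". The `Utterance` type, the speaker labels, the substring match, and the fixed before/after window are all assumptions; the abstract does not define what constitutes the speech pattern around the detected expression.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Utterance:
    speaker: str  # hypothetical labels, e.g. "agent" or "customer"
    start: float  # seconds from the start of the conversation
    end: float    # seconds from the start of the conversation
    text: str     # transcript; assumes speech was already recognized


def extract_situations(
    utterances: Sequence[Utterance],
    speaker: str,
    expression: str,
    before: int = 1,
    after: int = 1,
) -> List[List[Utterance]]:
    """Return, for each utterance by `speaker` containing `expression`
    (case-insensitive), that utterance plus `before` preceding and
    `after` following utterances. Window sizes are illustrative only."""
    needle = expression.lower()
    situations: List[List[Utterance]] = []
    for i, u in enumerate(utterances):
        if u.speaker == speaker and needle in u.text.lower():
            lo = max(0, i - before)
            hi = min(len(utterances), i + after + 1)
            situations.append(list(utterances[lo:hi]))
    return situations
```

A production system would detect the expression acoustically (e.g. keyword spotting) and define the pattern from the speakers' turn-taking, but the windowing idea is the same.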
Abstract:
For the design of a speech interface that accepts speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates an indication of the similarity between first and second sets of the speech samples, the first set being associated with a first speech control option and the second set with a second speech control option. A display unit displays the similarity indication. In another aspect, a word vector is generated for each speech sample set, indicating the frequency of occurrence of each word in that set, and the similarity calculating unit calculates the similarity indication from the word vectors of the respective speech sample sets. In another aspect, a perplexity indication is calculated for each speech sample set from a language model for that set.
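Both indications can be sketched under simple assumptions: samples are tokenized on whitespace, the word vector is a bag of word counts, similarity is cosine similarity, and each set's language model is a maximum-likelihood unigram model evaluated on the same set. None of these choices is stated in the abstract, which refers only to "an indication of similarity" and "language models"; the function names are hypothetical. One plausible reading is that a high similarity flags two control options whose sample phrasings overlap and may be confused.

```python
import math
from collections import Counter
from typing import List


def word_vector(samples: List[str]) -> Counter:
    """Word-occurrence counts over all samples in one set (whitespace tokens)."""
    return Counter(w for s in samples for w in s.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unigram_perplexity(samples: List[str]) -> float:
    """Perplexity of a set under a maximum-likelihood unigram model of itself.

    Equals exp(entropy of the set's word distribution); a set spread
    uniformly over V distinct words has perplexity V.
    """
    counts = word_vector(samples)
    total = sum(counts.values())
    if total == 0:
        return math.inf
    log_prob = sum(c * math.log(c / total) for c in counts.values())
    return math.exp(-log_prob / total)


if __name__ == "__main__":
    lights_on = word_vector(["Turn on the light"])
    lights_off = word_vector(["turn off the light"])
    print(cosine_similarity(lights_on, lights_off))  # 0.75
```

A held-out evaluation with a smoothed or n-gram model would give a more meaningful perplexity; the self-evaluated unigram version is used here only because it needs no extra data.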
Abstract:
A system, method, and program product for processing voice data of a conversation between two persons to determine characteristic conversation patterns. The system includes: a variation calculator for calculating the variation of a speech ratio of a first speaker and a variation calculator for calculating the variation of a speech ratio of a second speaker; a difference calculator for calculating a difference data string; a smoother for generating a smoothed difference data string; and a presenter for presenting the difference between the variation of the speech ratio of the first speaker and the variation of the speech ratio of the second speaker. The method includes: calculating the variations of the speech ratios of the first and second speakers; calculating a difference data string; generating a smoothed difference data string; and grouping the smoothed difference data strings according to their patterns.
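The variation, difference, and smoothing steps can be sketched as below. The abstract does not define "speech ratio" or "variation", so this sketch assumes the conversation is cut into fixed-length windows, takes each speaker's speech ratio per window as speaking time divided by window length, treats the per-window series as the variation over time, and smooths the difference with a trailing moving average. All function names are hypothetical, and the final grouping step (e.g. clustering the smoothed strings) is not shown.

```python
from typing import List, Sequence


def speech_ratio_series(speech_seconds: Sequence[float], window: float) -> List[float]:
    """Per-window speech ratio: seconds spoken / window length (assumed definition)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [s / window for s in speech_seconds]


def difference_string(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise a - b; positive values mean the first speaker talks more."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return [x - y for x, y in zip(a, b)]


def moving_average(xs: Sequence[float], k: int = 3) -> List[float]:
    """Trailing moving average over k points; early outputs use fewer points."""
    if k < 1:
        raise ValueError("k must be >= 1")
    out: List[float] = []
    acc = 0.0
    for i, x in enumerate(xs):
        acc += x
        if i >= k:
            acc -= xs[i - k]  # drop the value that left the window
        out.append(acc / min(i + 1, k))
    return out


if __name__ == "__main__":
    first = speech_ratio_series([8, 6, 2, 1], window=10.0)
    second = speech_ratio_series([1, 3, 7, 8], window=10.0)
    smoothed = moving_average(difference_string(first, second), k=2)
    print([round(v, 3) for v in smoothed])  # [0.7, 0.5, -0.1, -0.6]
```

In this toy conversation the smoothed string drifts from positive to negative, i.e. the first speaker dominates early and the second speaker later.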
Abstract:
Techniques for acquiring, from an input text and an input speech, a pair consisting of a character string and its pronunciation that should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings that are candidates to be recognized as a word; generates plural pronunciation candidates for the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are associated with their respective character strings; generates recognition data in which character strings indicating the plural words contained in the input speech are associated with their pronunciations; and, from among the combinations each consisting of one candidate character string and one pronunciation candidate, selects and outputs a combination that is contained in the recognition data.
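The first and last steps, selecting candidate strings and keeping only the string/pronunciation combinations that also appear in the recognition data, can be sketched as follows. Enumerating all substrings in a length range is an assumed candidate-selection strategy; pronunciation candidates are supplied as a hypothetical dictionary, and the recognition data is assumed to arrive as (string, pronunciation) pairs from a speech recognizer. The intermediate frequency-data and recognition steps are not reproduced.

```python
from typing import Dict, Iterable, List, Set, Tuple


def candidate_strings(text: str, min_len: int = 2, max_len: int = 4) -> Set[str]:
    """All substrings of `text` whose length is in [min_len, max_len]."""
    return {
        text[i : i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(text) - n + 1)
    }


def select_word_entries(
    candidate_pronunciations: Dict[str, List[str]],
    recognition_data: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep each (candidate string, pronunciation candidate) combination
    that also occurs in the recognition data."""
    recognized = set(recognition_data)
    selected: List[Tuple[str, str]] = []
    for string, prons in candidate_pronunciations.items():
        for pron in prons:
            if (string, pron) in recognized:
                selected.append((string, pron))
    return selected


if __name__ == "__main__":
    cands = {
        "東北大": ["tohokudai", "higashikitadai"],
        "北大": ["hokudai", "kitadai"],
    }
    recog = [("東北大", "tohokudai"), ("です", "desu")]
    print(select_word_entries(cands, recog))  # [('東北大', 'tohokudai')]
```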