摘要:
According to one embodiment, an apparatus for synthesizing a speech, comprises an inputting unit configured to input a text sentence, a text analysis unit configured to analyze the text sentence so as to extract linguistic information, a parameter generation unit configured to generate a speech parameter by using the linguistic information and a pre-trained statistical parameter model, an embedding unit configured to embed information into the speech parameter, and a speech synthesis unit configured to synthesize the speech parameter with the information embedded by the embedding unit into a speech with the information.
摘要:
According to one embodiment, an apparatus for fusing voiced phoneme units in Text-To-Speech, includes a reference unit selection module configured to select a reference unit from the plurality of units based on pitch cycle information of the each unit and the number of pitch cycles of the target segment. The apparatus includes a template creation module configured to create a template based on the reference unit selected by the reference unit selection module and the number of pitch cycles of the target segment, wherein the number of pitch cycles of the template is same with that of pitch cycles of the target segment. The apparatus includes a pitch cycle alignment module configured to align pitch cycles of each unit of the plurality of units except the reference unit with pitch cycles of the template by using a dynamic programming algorithm.
摘要:
The present invention provides a method and apparatus for verification of speaker authentication. A method for verification of speaker authentication, comprising: inputting an utterance containing a password that is spoken by a speaker; extracting an acoustic feature vector sequence from said inputted utterance; DTW-matching said extracted acoustic feature vector sequence and a speaker template enrolled by an enrolled speaker; calculating each of a plurality of local distances between said DTW-matched acoustic feature vector sequence and said speaker template; nonlinear-transforming said each local distance calculated to give more weights on small local distances; calculating a DTW-matching score based on said plurality of local distances nonlinear-transformed; and comparing said matching score with a predefined discriminating threshold to determine whether said inputted utterance is an utterance containing a password spoken by the enrolled speaker.
摘要:
The present invention provides a method and apparatus for compressing a speaker template, a method and apparatus for merging a plurality of speaker templates, a method and apparatus for enrollment and verification of speaker authentication, a system for speaker authentication. Said method for compressing a speaker template that includes a plurality of feature vectors, comprising: designating a code to each of said plurality of feature vectors in said speaker template according to a codebook that includes a plurality of codes and their corresponding feature codes; and replacing a plurality of adjacent feature vectors designated with the same code in the speaker template with a feature vector.
摘要:
Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
摘要:
The present invention provides a method and apparatus for enrollment and evaluation of speaker authentication. The method for enrollment of speaker authentication, comprising: generating a plurality of acoustic feature vector sequences respectively based on a plurality of utterances of the same content spoken by a speaker; generating a reference template from said plurality of acoustic feature vector sequences; generating a corresponding pseudo-impostor feature vector sequence for each of said plurality of acoustic feature vector sequences based on a code book that includes a plurality of codes and their corresponding feature vectors; and selecting an optimal acoustic feature subset based on said plurality of acoustic feature vector sequences, said reference template and said plurality of pseudo-impostor feature vector sequences.
摘要:
Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
摘要:
The present invention provides a method and apparatus for enrollment and verification of speaker authentication. The method for enrollment of speaker authentication, comprising: extracting an acoustic feature vector sequence from an enrollment utterance of a speaker; and generating a speaker template using the acoustic feature vector sequence; wherein said step of extracting an acoustic feature vector sequence comprises: generating a filter-bank for the enrollment utterance of the speaker for filtering locations and energies of formants in the spectrum of the enrollment utterance based on the enrollment utterance; filtering the spectrum of the enrollment utterance by the generated filter-bank; and generating the acoustic feature vector sequence from the filtered enrollment utterance.
摘要:
The present invention provides a method and apparatus for enrollment and evaluation of speaker authentication, a method for estimating discriminating ability of a speech, and a system for speaker authentication. A method for enrollment of speaker authentication, comprising: inputting a speech containing a password that is spoken by a speaker; obtaining a phoneme sequence from said inputted speech; estimating discriminating ability of the phoneme sequence based on a discriminating ability table that includes a discriminating ability for each phoneme; setting a discriminating threshold for said speech; and generating a speech template for said speech.
摘要:
The present invention provides a method and apparatus for enrollment and verification of speaker authentication. The method for enrollment of speaker authentication, comprising: extracting an acoustic feature vector sequence from an enrollment utterance of a speaker; and generating a speaker template using the acoustic feature vector sequence; wherein said step of extracting an acoustic feature vector sequence comprises: generating a filter-bank for the enrollment utterance of the speaker for filtering locations and energies of formants in the spectrum of the enrollment utterance based on the enrollment utterance; filtering the spectrum of the enrollment utterance by the generated filter-bank; and generating the acoustic feature vector sequence from the filtered enrollment utterance.