Abstract:
A method and apparatus for discriminative estimation of parameters in a maximum a posteriori (MAP) speaker adaptation condition, and a voice recognition apparatus having the apparatus and a voice recognition method using the method are provided. The method for discriminative estimation of parameters in a maximum a posteriori (MAP) speaker adaptation condition, in which at least speaker-independent model parameters and prior density parameters, which are standards in recognizing a speaker's voice, are obtained as the result of model training after fetching training sets on a plurality of speakers from a training database, has the steps of (a) classifying adaptation data among training sets for respective speakers; (b) obtaining model parameters adapted from adaptation data on each speaker by using the initial values of the parameters; (c) searching a plurality of candidate hypotheses on each uttered sentence of training sets by using the adapted model parameters, and calculating gradients of speaker-independent model parameters by measuring the degree of errors on each training sentence; and (d) when training sets of all speakers are adapted, updating parameters, which were set at the initial stage, based on the calculated gradients.
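Step (b) can be illustrated with a minimal sketch. It assumes the simplest textbook case, MAP adaptation of a single Gaussian mean under a conjugate prior, where the adapted mean is a weighted blend of the speaker-independent mean and the speaker's adaptation data. The function name and the prior weight `tau` are illustrative assumptions and do not come from the abstract. The discriminative update of steps (c) and (d) is not shown.

```python
def map_adapt_mean(prior_mean, tau, frames):
    """MAP estimate of a Gaussian mean under a conjugate prior.

    prior_mean: speaker-independent mean vector (list of floats)
    tau:        prior weight (assumed hyperparameter); larger values
                keep the result closer to the speaker-independent mean
    frames:     adaptation feature vectors for one speaker
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation data.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted blend of the prior mean and the data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


si_mean = [0.0, 1.0]                    # speaker-independent mean
adaptation = [[1.0, 3.0], [3.0, 5.0]]   # two adaptation frames
adapted = map_adapt_mean(si_mean, tau=2.0, frames=adaptation)
print(adapted)
```

With no adaptation frames the function returns the speaker-independent mean unchanged, and as the number of frames grows the result approaches the speaker's sample mean.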
Abstract:
An apparatus and method for transmitting sound are provided. The apparatus includes an external sound receiver for receiving external sounds and converting them into an external sound signal, a volume controller for outputting sound signals only if each of the volumes of the sound signal of a sound producing device and the sound signal of the external sound receiver exceeds a predetermined reference level, and a mixer for mixing the sound signal of the sound producing device with the sound signals output from the volume controller and outputting the result. The apparatus mixes ambient sounds having volume exceeding a certain volume with the sound of a sound producing device and transmits the mixed sounds to a pair of headphones which are a sound receiver for a user, thereby allowing the user to hear an ambient alarm sound while the user is listening to the sound of the sound producing device and making it possible for the user to audibly detect danger. Consequently, the apparatus provides user safety.
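The gating and mixing described above might look roughly like the following sketch. It takes one reading of the abstract: the external sound is passed on only when both the device signal and the external signal exceed their reference levels. The RMS volume measure, the block-based processing, the clipping range, and all names and threshold values are assumptions made for illustration.

```python
import math


def rms(block):
    """Root-mean-square level of one block of samples."""
    if not block:
        return 0.0
    return math.sqrt(sum(s * s for s in block) / len(block))


def gate_external(device_block, external_block, device_ref, external_ref):
    """Pass the external block only if both volumes exceed their reference levels."""
    if rms(device_block) > device_ref and rms(external_block) > external_ref:
        return list(external_block)
    return [0.0] * len(external_block)


def mix(device_block, gated_block):
    """Sum the two signals sample by sample, clipped to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, d + g)) for d, g in zip(device_block, gated_block)]


music = [0.2, -0.2, 0.2, -0.2]
quiet = [0.01, -0.01, 0.01, -0.01]   # ambient noise below the reference level
alarm = [0.5, -0.5, 0.5, -0.5]       # loud ambient sound

out_quiet = mix(music, gate_external(music, quiet, 0.05, 0.1))
out_alarm = mix(music, gate_external(music, alarm, 0.05, 0.1))
```

In this sketch the quiet ambient block leaves the music untouched, while the alarm block is added to it before the result reaches the headphones.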
Abstract:
A speaker verification system using the voice of a user uttering a continuous, random length digit string is provided. The speaker verification system includes a random digit generator for generating a continuous, random length digit string; a user interface for providing the continuous, random length digit string to the user; a feature extractor for extracting voice features from the user's voice uttering the continuous, random length digit string; a digit voice verification unit for comparing the voice features with items in a speaker-independent continuous digit voice model to derive the digit string whose items match the voice features, and for determining whether the derived digit string is identical to the digit string provided to the user via the user interface; and a speaker verification unit for comparing the voice features with a speaker-dependent model of the user to measure the similarity between them. The speaker-dependent model of the user includes previously determined features of the user's voice, and the speaker verification unit determines whether to approve or reject the user based on the similarity.
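The two-stage decision can be sketched as follows. The recognized digit string and the similarity score are stand-ins for the outputs of the digit voice verification unit and the speaker-dependent model, neither of which is implemented here. The length bounds, the threshold, and the function names are illustrative assumptions.

```python
import secrets


def generate_digit_string(min_len=4, max_len=8):
    """Random-length string of random digits, used as the spoken prompt."""
    length = min_len + secrets.randbelow(max_len - min_len + 1)
    return "".join(str(secrets.randbelow(10)) for _ in range(length))


def verify(prompted, recognized, similarity, threshold=0.8):
    """Approve only if the spoken digits match the prompt AND the voice
    is similar enough to the enrolled speaker model.

    recognized: digit string derived by a digit recognizer (stand-in)
    similarity: score from the speaker-dependent model (stand-in)
    threshold:  assumed acceptance threshold
    """
    if recognized != prompted:
        return False   # wrong digits, for example a replayed recording
    return similarity >= threshold


prompt = generate_digit_string()
print(prompt, verify(prompt, prompt, similarity=0.9))
```

Because the prompt changes on every attempt, a recording of an earlier session fails the digit check even when the voice itself would match.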
Abstract:
A human body communication system is provided. The system includes a controlled device that measures a capacitance corresponding to the distance to a human body and transmits information on the measured capacitance through a wireless medium; and a control device that receives the information, determines a transmission power based on it, and transmits a user's control command to the controlled device at the determined transmission power, using the human body as a medium.
Abstract:
A method, medium, and system for masking voice information of a communication device are provided. The method of masking a user's voice by outputting a masking signal similar to a formant of voice data may include dividing the received voice data into frames of a predetermined size, transforming the frames into the frequency domain, obtaining formant information of intensive signal regions in the transformed frames, generating a sound signal that disturbs the formant information with reference to the formant information, and outputting the sound signal in accordance with the time point at which the voice signal is output.
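The framing and frequency-analysis steps can be sketched as below. Simple peak picking on a DFT magnitude spectrum stands in for formant estimation, which in practice is usually done on a smoothed spectral envelope. The sample rate, frame size, test signal, and function names are assumptions for illustration. Generating and timing the masking signal is not shown.

```python
import cmath
import math


def split_frames(samples, frame_size):
    """Cut the signal into consecutive, non-overlapping frames of fixed size."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def magnitude_spectrum(frame):
    """Plain DFT magnitudes for bins 0..N/2 (O(N^2), fine for a small demo)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def strongest_peaks_hz(mags, sample_rate, frame_size, count=2):
    """Frequencies (Hz) of the strongest local maxima in the spectrum."""
    peaks = [
        k for k in range(1, len(mags) - 1)
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]
    ]
    top = sorted(peaks, key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * sample_rate / frame_size for k in top)


SAMPLE_RATE = 8000
FRAME_SIZE = 64
# Synthetic stand-in for voiced speech: two tones on DFT bins 5 and 12.
signal = [
    math.sin(2 * math.pi * 5 * t / FRAME_SIZE)
    + 0.5 * math.sin(2 * math.pi * 12 * t / FRAME_SIZE)
    for t in range(2 * FRAME_SIZE)
]
frames = split_frames(signal, FRAME_SIZE)
peaks_hz = strongest_peaks_hz(magnitude_spectrum(frames[0]), SAMPLE_RATE, FRAME_SIZE)
print(peaks_hz)
```

A masking generator would then place its disturbing energy around the returned frequencies for each frame.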
Abstract:
A speech enhancement method, including the steps of: (a) segmenting an input speech signal into a plurality of frames and transforming each frame signal into a signal of the frequency domain; (b) computing the signal-to-noise ratio of a current frame, and computing signal-to-noise ratio of a frame immediately preceding the current frame; (c) computing the predicted signal-to-noise ratio of the current frame which is predicted based on the preceding frame and computing the speech absence probability using the signal-to-noise ratio and predicted signal-to-noise ratio of the current frame; (d) correcting the two signal-to-noise ratios obtained in the step (b) based on the speech absence probability computed in the step (c); (e) computing the gain of the current frame with the two corrected signal-to-noise ratios obtained in the step (d), and multiplying the speech spectrum of the current frame by the computed gain; (f) estimating the noise and speech power for the next frame to calculate the predicted signal-to-noise ratio for the next frame, and providing the predicted signal-to-noise ratio for the next frame as the predicted signal-to-noise ratio of the current frame for the step (c); and (g) transforming the result spectrum of the step (e) into a signal of the time domain. The noise spectrum is estimated in speech presence intervals based on the speech absence probability, as well as in speech absence intervals, and the predicted SNR and gain are updated on a per-channel basis of each frame according to the noise spectrum estimate, which in turn improves the speech spectrum in various noise environments.
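The abstract does not give the formulas behind steps (b) and (e). The sketch below substitutes a well-known stand-in: a decision-directed a priori SNR estimate combined with a Wiener gain, applied per frequency bin. The smoothing factor `alpha` and all names are assumptions. The speech absence probability of step (c), the SNR correction of step (d), and the noise update of step (f) are omitted.

```python
def enhance_frame(power, noise, prev_clean, alpha=0.98):
    """One frame of decision-directed Wiener filtering, per frequency bin.

    power:      noisy power spectrum |Y_k|^2 of the current frame
    noise:      current noise power estimate per bin (must be > 0)
    prev_clean: enhanced speech power of the preceding frame per bin
    alpha:      smoothing factor (assumed value)
    Returns (gains, clean_power) for the current frame.
    """
    gains, clean_power = [], []
    for p, n, c in zip(power, noise, prev_clean):
        post_snr = p / n                               # SNR seen in this frame
        prior_snr = alpha * c / n + (1 - alpha) * max(post_snr - 1.0, 0.0)
        gain = prior_snr / (1.0 + prior_snr)           # Wiener gain in [0, 1)
        gains.append(gain)
        clean_power.append(gain * gain * p)            # carried to the next frame
    return gains, clean_power


# Bin 0 carries strong speech, bin 1 is noise only.
gains, clean = enhance_frame(
    power=[100.0, 1.0], noise=[1.0, 1.0], prev_clean=[90.0, 0.0]
)
print(gains)
```

Feeding `clean_power` back in as `prev_clean` for the following frame gives the frame-to-frame recursion that the method describes in general terms.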
Abstract:
A speech signal encoding/decoding method is provided. The method of encoding LPC coefficients includes dividing the nth-order line spectral frequencies into lower, middle and upper code vectors, quantizing the middle code vectors using a middle code book to generate a first index, selecting one of a plurality of lower code books according to the lowermost line spectral frequency of the middle code vector and the line spectral frequencies of the lower code vectors, and quantizing the lower code vectors using the selected lower code book to generate a second index, selecting one of a plurality of upper code books according to the uppermost line spectral frequency of the middle code vector and the line spectral frequencies of the upper code vectors, quantizing the upper code vectors using the selected upper code book to generate a third index, and transmitting the first, second and third indexes. In the above quantization, the line spectral frequencies are quantized using a linked split vector quantization (LSVQ), and the search of the code book is efficiently performed, so that the spectral distortion and outlier percentages are lower at 23 bits/frame than those of the split vector quantization (SVQ) at 24 bits/frame.
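The encoding order can be sketched as follows: the middle sub-vector is quantized first, and its quantized edge values then steer the choice of lower and upper code books. The abstract does not state the selection rule, so the threshold rule here (`low_edge`, `high_edge`, two code books per side) is purely hypothetical, as are the tiny code books and every name in the block.

```python
def nearest(codebook, vec):
    """Index of the code vector with the smallest squared Euclidean distance."""
    def dist(code):
        return sum((a - b) ** 2 for a, b in zip(code, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_lsvq(lsf, split, middle_cb, lower_cbs, upper_cbs, low_edge, high_edge):
    """Return (middle, lower, upper) indices for one LSF vector.

    split: (end of lower part, end of middle part) as list indices
    low_edge / high_edge: hypothetical thresholds on the quantized middle
    vector's lowermost / uppermost LSF that pick one of two code books.
    """
    lo_end, mid_end = split
    lower, middle, upper = lsf[:lo_end], lsf[lo_end:mid_end], lsf[mid_end:]

    i_mid = nearest(middle_cb, middle)          # first index
    q_mid = middle_cb[i_mid]

    # Link: the quantized middle vector selects the neighbouring code books.
    lower_cb = lower_cbs[0] if q_mid[0] < low_edge else lower_cbs[1]
    upper_cb = upper_cbs[0] if q_mid[-1] < high_edge else upper_cbs[1]

    return i_mid, nearest(lower_cb, lower), nearest(upper_cb, upper)


middle_cb = [[0.30, 0.45], [0.36, 0.52]]
lower_cbs = [[[0.05, 0.12], [0.08, 0.18]], [[0.10, 0.22], [0.15, 0.28]]]
upper_cbs = [[[0.65, 0.80], [0.72, 0.86]], [[0.75, 0.90], [0.80, 0.95]]]

indices = encode_lsvq(
    [0.10, 0.20, 0.35, 0.50, 0.70, 0.85], (2, 4),
    middle_cb, lower_cbs, upper_cbs, low_edge=0.33, high_edge=0.55,
)
print(indices)
```

Selecting from the quantized middle vector, rather than the unquantized one, means a decoder holding only the three indices could repeat the same code book choice in this sketch.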
Abstract:
An apparatus and method of composing a web document and an apparatus to set a web document arrangement are provided. The apparatus to compose a web document includes a generation module which generates a plurality of frames by analyzing a source of a web document, a composition module which arranges the generated frames using a predetermined frame arrangement mode, and an output module which displays the arranged frames on a screen.
Abstract:
When an error occurring in a terminal (device) is detected, one or more solutions for removing the detected error are displayed to a user, so that the detected error can be easily removed by the user even though the user is not familiar with operations or detailed functions of the camera.