Abstract:
An impulse event separating method, and an apparatus to perform the method, the method including dividing an input signal into frame units and dividing each frame into a plurality of frequency sub-bands; obtaining a power variation and phase variation of the signal of each of the frequency sub-bands, and detecting a plurality of local onsets using the power variation and the phase variation; obtaining a global onset from the local onsets and triggering a plurality of event components using the local onsets and the global onset; tracking and combining the event components in each of the frequency sub-bands to form events; and determining whether the events comprise an impulse event with reference to an impulse event property.
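The step from local onsets to a global onset can be illustrated with a minimal sketch. It assumes per-frame, per-sub-band power values are already available, uses only power variation (phase variation is omitted for brevity), and derives the global onset by a simple vote across sub-bands. The function names, the 6 dB rise threshold, and the voting rule are illustrative assumptions, not details taken from the abstract.

```python
import math


def local_onsets(band_power, rise_db=6.0, floor=1e-12):
    """Mark a local onset wherever a sub-band's power rises sharply.

    band_power[t][b] is the power of sub-band b in frame t.
    Returns onsets[t][b], True where the power rose by more than
    rise_db relative to frame t-1 (assumed rule).
    """
    onsets = [[False] * len(band_power[0])]  # frame 0 has no predecessor
    for prev, cur in zip(band_power, band_power[1:]):
        row = []
        for p0, p1 in zip(prev, cur):
            change_db = 10.0 * math.log10(max(p1, floor) / max(p0, floor))
            row.append(change_db > rise_db)
        onsets.append(row)
    return onsets


def global_onsets(onsets, min_fraction=0.5):
    """Return frame indices where enough sub-bands agree (assumed vote)."""
    return [t for t, row in enumerate(onsets)
            if sum(row) >= min_fraction * len(row)]


power = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [10, 10, 1, 10],   # broadband rise: three of four sub-bands
    [10, 10, 1, 10],
    [1, 100, 1, 1],    # narrowband rise: one sub-band only
]
print(global_onsets(local_onsets(power)))
```

In this sketch a broadband rise produces a global onset, while a rise confined to one sub-band remains a local onset only.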
Abstract:
A pitch estimating method and apparatus in which mixture Gaussian distributions based on candidate pitches having high period estimating values are generated, a mixture Gaussian distribution having a high likelihood is selected, and dynamic programming is executed so that the pitch of the speech signal can be accurately estimated. The pitch estimating method comprises computing a normalized autocorrelation function of a windowed signal obtained by multiplying a frame of a speech signal by a window signal and determining candidate pitches from a peak value of the normalized autocorrelation function of the windowed signal, interpolating a period of the determined candidate pitches and a period estimating value representing a length of the period, generating Gaussian distributions for the candidate pitches for each frame for which the interpolated period estimating value is greater than a first threshold value, mixing the Gaussian distributions which are located at a distance less than a second threshold value to generate mixture Gaussian distributions and selecting at least one of the mixture Gaussian distributions that has a likelihood exceeding a third threshold value, and executing dynamic programming for the frames to estimate the pitch of each frame, based on the candidate pitches of each of the frames and the selected mixture Gaussian distributions.
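The first step, picking candidate pitches from peaks of a normalized autocorrelation function, can be sketched as follows. The Hann window, the 60 to 400 Hz search range, and the 0.3 peak threshold are assumptions made for illustration; the interpolation, Gaussian mixture, and dynamic programming stages are not shown.

```python
import math


def normalized_autocorrelation(frame, max_lag):
    """Return r[lag] / r[0] for lag = 0..max_lag of the windowed frame."""
    n = len(frame)
    # Hann window: an assumed choice, the abstract only says "a window signal".
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    energy = sum(x * x for x in windowed)
    if energy == 0.0:
        return [0.0] * (max_lag + 1)
    return [sum(windowed[i] * windowed[i + lag] for i in range(n - lag)) / energy
            for lag in range(max_lag + 1)]


def candidate_pitches(frame, sample_rate, min_hz=60.0, max_hz=400.0, threshold=0.3):
    """Return (frequency_hz, peak_value) pairs at local autocorrelation peaks."""
    min_lag = max(int(sample_rate / max_hz), 1)
    max_lag = int(sample_rate / min_hz)
    nac = normalized_autocorrelation(frame, max_lag + 1)
    candidates = []
    for lag in range(min_lag, max_lag + 1):
        is_peak = nac[lag] >= nac[lag - 1] and nac[lag] > nac[lag + 1]
        if is_peak and nac[lag] > threshold:
            candidates.append((sample_rate / lag, nac[lag]))
    return candidates


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (400-sample frame).
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(400)]
best = max(candidate_pitches(tone, rate), key=lambda c: c[1])
print(round(best[0]))
```

Peaks also appear at multiples of the true period, which is one reason a later selection stage is needed to choose among the candidates.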
Abstract:
An apparatus, method, and medium for distinguishing a vocal sound. The apparatus includes: a framing unit dividing an input signal into frames, each frame having a predetermined length; a pitch extracting unit determining whether each frame is a voiced frame or an unvoiced frame and extracting a pitch contour from the voiced and unvoiced frames; a zero-cross rate calculator respectively calculating a zero-cross rate for each frame; a parameter calculator calculating parameters including a time length ratio of the voiced frame and the unvoiced frame determined by the pitch extracting unit, statistical information of the pitch contour, and spectral characteristics; and a classifier inputting the zero-cross rates and the parameters output from the parameter calculator and determining whether the input signal is a vocal sound.
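The framing and zero-cross rate steps are simple enough to sketch directly. This is a minimal illustration, assuming non-overlapping frames and counting a crossing whenever consecutive samples differ in sign; the function names are hypothetical.

```python
def split_frames(signal, frame_length):
    """Split a signal into non-overlapping frames of a fixed length."""
    last_start = len(signal) - frame_length
    return [signal[i:i + frame_length]
            for i in range(0, last_start + 1, frame_length)]


def zero_cross_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


frames = split_frames([1, -1, 1, -1, 2, 2, 2, 2, 5], 4)
print([zero_cross_rate(f) for f in frames])
```

A noisy or unvoiced frame tends to give a high rate, while a voiced frame with a strong low-frequency component tends to give a low one, which is why the rate is useful as a classifier input.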
Abstract:
A multi-layered speech recognition apparatus and method, the apparatus including a client checking whether the client recognizes the speech using a characteristic of speech to be recognized and recognizing the speech or transmitting the characteristic of the speech according to a checked result; and first through N-th servers, wherein the first server checks whether the first server recognizes the speech using the characteristic of the speech transmitted from the client, and recognizes the speech or transmits the characteristic according to a checked result, and wherein an n-th (2 ≤ n ≤ N) server checks whether the n-th server recognizes the speech using the characteristic of the speech transmitted from an (n−1)-th server, and recognizes the speech or transmits the characteristic according to a checked result.
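The check-then-recognize-or-forward behavior can be sketched as a chain of layers, with the client first and the servers after it. The `Layer` class, the dictionary used for the speech characteristic, and the example domains are hypothetical stand-ins; the abstract does not say how a layer decides whether it can recognize the speech.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Layer:
    name: str
    can_recognize: Callable[[dict], bool]  # the "check" step
    recognize: Callable[[dict], str]       # the "recognize" step


def recognize_multilayer(characteristic: dict,
                         layers: List[Layer]) -> Optional[Tuple[str, str]]:
    """Pass the characteristic down the chain until one layer accepts it."""
    for layer in layers:
        if layer.can_recognize(characteristic):
            return layer.name, layer.recognize(characteristic)
        # Otherwise the characteristic is forwarded to the next layer.
    return None


layers = [
    Layer("client", lambda c: c["domain"] == "digits", lambda c: "dial"),
    Layer("server 1", lambda c: c["domain"] == "navigation", lambda c: "route"),
    Layer("server 2", lambda c: True, lambda c: "dictation"),
]
print(recognize_multilayer({"domain": "navigation"}, layers))
```

In a real deployment each hop would be a network call rather than a function call; the sketch only shows the control flow.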
Abstract:
An apparatus, method, and medium for detecting an impact sound and an apparatus, method, and medium for discriminating the impact sound using the same. The impact sound detecting apparatus includes: an onset detector separating an input signal of a frame unit into a low frequency signal and a high frequency signal, measuring powers of the separated signals, and detecting onsets by detecting changes in the measured powers; an event buffer buffering the powers measured by the onset detector and spectral data of the input signal; and an impact sound verifier determining whether each of the detected onsets is an impulse onset, and if each of the detected onsets is the impulse onset, detecting events starting from the impulse onsets by checking the powers stored in the event buffer and determining each of the detected events to be an impulse event if each of the detected onsets satisfies a predetermined condition.
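The onset detector's behavior (split each frame into low and high frequency parts, measure their powers, and flag a sudden change) can be sketched as below. The one-pole low-pass filter, the per-frame filter reset, the 4x power ratio, and the noise floor are all assumptions chosen to keep the example short; the event buffer and the impulse verification stage are not modeled.

```python
def band_powers(frame, alpha=0.2):
    """Return (low_power, high_power) for one frame.

    A one-pole low-pass gives the low band and the residual gives the
    high band (assumed split). The filter state is reset per frame for
    simplicity.
    """
    low = 0.0
    low_power = 0.0
    high_power = 0.0
    for x in frame:
        low += alpha * (x - low)
        high = x - low
        low_power += low * low
        high_power += high * high
    n = max(len(frame), 1)
    return low_power / n, high_power / n


def detect_onsets(frames, ratio=4.0, floor=1e-6):
    """Return indices of frames where either band's power jumps by `ratio`."""
    powers = [band_powers(f) for f in frames]
    onsets = []
    for t in range(1, len(powers)):
        pairs = zip(powers[t - 1], powers[t])
        if any(cur > ratio * max(prev, floor) for prev, cur in pairs):
            onsets.append(t)
    return onsets


quiet = [0.0] * 8
click = [1.0, -1.0] * 4
print(detect_onsets([quiet, quiet, quiet, click, quiet]))
```

Only the frame where the click begins is flagged, since the following frame shows a fall in power rather than a rise.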
Abstract:
A formant tracking apparatus and a formant tracking method are provided. The formant tracking apparatus includes: a framing unit dividing an input voice signal into a plurality of frames; a linear prediction analyzing unit obtaining linear prediction coefficients for each frame; a segmentation unit segmenting each of the linear prediction coefficients into a plurality of segments; a formant candidate determining unit obtaining formant candidates by using the linear prediction coefficients, and summing the formant candidates for each segment to determine formant candidates for each segment; a formant number determining unit determining a number of tracking formants for each segment among the formant candidates satisfying a predetermined condition; and a tracking unit searching, among the formant candidates belonging to each segment, for as many tracking formants as the number determined by the formant number determining unit.
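One common way to obtain formant candidates from linear prediction coefficients is to locate peaks in the all-pole spectral envelope. The sketch below assumes that approach: it derives coefficients from autocorrelation values with the Levinson-Durbin recursion and then peak-picks the envelope on a frequency grid. The abstract does not state which technique the apparatus uses, and the segmentation and tracking units are not shown.

```python
import cmath
import math


def levinson_durbin(r, order):
    """Solve for LPC coefficients a[0..order] (a[0] = 1) from autocorrelation r."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # degenerate input, stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a


def envelope_peaks(a, sample_rate, n_points=512):
    """Return frequencies (Hz) where 1/|A(e^{jw})|^2 has a local maximum."""
    gains = []
    for m in range(n_points):
        w = math.pi * m / n_points
        response = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        gains.append(1.0 / max(abs(response) ** 2, 1e-12))
    return [m * sample_rate / (2.0 * n_points)
            for m in range(1, n_points - 1)
            if gains[m] > gains[m - 1] and gains[m] >= gains[m + 1]]


# Usage: a two-pole resonator with poles at radius 0.95 and angle pi/4,
# which corresponds to about 1000 Hz at an 8 kHz sample rate.
resonator = [1.0, -2.0 * 0.95 * math.cos(math.pi / 4.0), 0.95 ** 2]
print(envelope_peaks(resonator, 8000))
```

Peak picking can merge two close formants into a single peak, so a fuller implementation might instead use the roots of the prediction polynomial.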