Abstract:
A speech speed conversion factor determining device has a physical index calculation unit comprising: a sound/silence judgment unit that distinguishes between sound intervals and silent intervals of an input signal; a fundamental frequency calculation unit that calculates the fundamental frequency of the signal in the sound intervals and classifies them into stable and unstable intervals; a frequency smoothing unit that smooths the fundamental frequency in the stable intervals; a pseudo fundamental frequency calculation unit that calculates a pseudo fundamental frequency for the unstable intervals by interpolation; and a fundamental frequency general shape connection unit that connects the smoothed and pseudo fundamental frequencies to obtain sampled values of the general shape of the fundamental frequency. These sampled values are output as an index, from which the speech speed conversion factors are calculated.
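The smoothing, interpolation, and connection steps can be illustrated with a short sketch. It assumes the sound/silence and stable/unstable decisions have already been made per frame, uses a plain moving average over stable neighbours as the smoother, and fills unstable frames by linear interpolation; the function name, the window size, and both technique choices are illustrative assumptions, not the device's actual method.

```python
from typing import List, Optional


def f0_general_shape(f0: List[Optional[float]], stable: List[bool],
                     window: int = 3) -> List[Optional[float]]:
    """Hypothetical sketch: general shape of F0 within one sound interval.

    f0[i] is the raw F0 estimate of frame i (None where unreliable) and
    stable[i] marks frames judged stable. Assumes every stable frame has
    a value and that at least one frame is stable.
    """
    n = len(f0)
    half = window // 2
    shape: List[Optional[float]] = [None] * n
    # Smoothing: moving average over stable neighbours only.
    for i in range(n):
        if stable[i]:
            neigh = [f0[j] for j in range(max(0, i - half), min(n, i + half + 1))
                     if stable[j]]
            shape[i] = sum(neigh) / len(neigh)
    # Pseudo F0: linear interpolation between the nearest smoothed values,
    # holding the edge value where only one side is available.
    known = [i for i in range(n) if shape[i] is not None]
    for i in range(n):
        if shape[i] is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                shape[i] = shape[right]
            elif right is None:
                shape[i] = shape[left]
            else:
                t = (i - left) / (right - left)
                shape[i] = shape[left] + t * (shape[right] - shape[left])
    # Connection: every entry is now filled, giving one sampled value per frame.
    return shape
```

For example, `f0_general_shape([100.0, 102.0, None, None, 110.0, 112.0], [True, True, False, False, True, True])` smooths the two stable runs to 101 and 111 and bridges the unstable gap with a straight line between them.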
Abstract:
A reproduction part reproduces recorded signals at a changeable speed ratio r. An A/D conversion part converts, at sampling frequency fi, an audio signal that is reproduced at a speed different from the recording speed. A block data division part divides the audio data into blocks based on an attribute of the audio data. An audio data connection part successively interpolates or thins out the divided audio data based on a ratio of 1/r. A D/A conversion part converts the interpolated or thinned-out audio data at sampling frequency fo. If the relation fi/fo = r/c is satisfied, the audio signal is output as high-quality sound that stays synchronized with the image signal and whose pitch does not change, regardless of the speed ratio r at which the image signal is reproduced.
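Interpolating or thinning out at a ratio of 1/r can be pictured at block granularity: when r < 1 some blocks are repeated, and when r > 1 some are dropped, so the output holds roughly 1/r times as many blocks. The sketch below uses a fractional accumulator for that choice and also solves the stated relation fi/fo = r/c for fo. The function names and the accumulator scheme are illustrative assumptions; the abstract does not say how blocks are selected, and c is used only as the constant appearing in that relation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def resample_blocks(blocks: Sequence[T], r: float) -> List[T]:
    """Hypothetical sketch: repeat (r < 1) or drop (r > 1) whole blocks so
    that about len(blocks) / r blocks are emitted, in the original order."""
    if r <= 0:
        raise ValueError("speed ratio r must be positive")
    out: List[T] = []
    credit = 0.0  # output copies earned so far but not yet emitted
    for block in blocks:
        credit += 1.0 / r
        while credit >= 1.0:
            out.append(block)
            credit -= 1.0
    return out


def output_sampling_frequency(fi: float, r: float, c: float) -> float:
    """Solve fi / fo = r / c for fo."""
    return fi * c / r
```

With r = 2, `resample_blocks(["a", "b", "c", "d"], 2.0)` keeps every second block (`["b", "d"]`); with r = 0.5 each block is emitted twice.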
Abstract:
The frame power of an input signal is calculated to discriminate speech frame intervals from non-speech intervals by comparing the current frame power against an adaptive speech-detection threshold. The threshold is based on the past maximum frame power and the difference between the past maximum and minimum frame power values, and it is updated adaptively using a predetermined number of frames preceding the current one.
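One way to read the threshold rule is sketched below: frame power is the mean square of each frame, and the threshold for the current frame is p_max - k * (p_max - p_min), where p_max and p_min come only from a sliding window of earlier frames. The exact combination, the weight `k`, the window size `history`, and the function names are assumptions made for illustration; the abstract states only which quantities the threshold depends on.

```python
from collections import deque
from typing import Deque, List, Sequence


def frame_powers(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of each complete, non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_speech(frame_power: Sequence[float], history: int = 50,
                  k: float = 0.5) -> List[bool]:
    """Hypothetical adaptive-threshold speech detector.

    threshold = p_max - k * (p_max - p_min), using only the previous
    `history` frames. The first frame has no history and is labelled
    non-speech.
    """
    past: Deque[float] = deque(maxlen=history)  # oldest frames drop out
    labels: List[bool] = []
    for p in frame_power:
        if past:
            p_max, p_min = max(past), min(past)
            threshold = p_max - k * (p_max - p_min)
            labels.append(p > threshold)
        else:
            labels.append(False)
        past.append(p)  # update the window after deciding this frame
    return labels
```

Because the window only contains earlier frames, a burst of speech raises p_max for later frames, which is what lets the threshold track changing background levels.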
Abstract:
A speech-rate converter that slows down input speech regularly monitors the data length of the input speech and the previously estimated extended output data length for the current rate scaling factor, and computes new estimates of the output data length. The conversion rate is adaptively modified according to the time lag between input and output speech, so that input and output data lengths stay consistent without skipping any spoken input. Input signal power is monitored to discriminate speech from non-speech intervals, and the portions of input non-speech intervals that exceed a conversion-rate-dependent duration are deleted.
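The two mechanisms can be sketched as small rules. Below, the lag is the estimated output length minus the input length; the extension factor falls linearly from its target toward 1.0 as the lag approaches a limit `max_lag`, and each non-speech interval is cut to at most `base_max / rate`. The linear rule, the inverse dependence on rate, and the names `next_rate`, `trim_pause`, `max_lag`, and `base_max` are all hypothetical; the abstract does not give the actual formulas.

```python
def next_rate(input_len: float, est_output_len: float,
              target_rate: float, max_lag: float) -> float:
    """Hypothetical rate update: shrink the extension factor toward 1.0
    as the output falls behind the input."""
    lag = max(0.0, est_output_len - input_len)
    if lag >= max_lag:
        return 1.0  # stop extending until the output catches up
    return 1.0 + (target_rate - 1.0) * (1.0 - lag / max_lag)


def trim_pause(pause_len: float, rate: float, base_max: float) -> float:
    """Hypothetical pause rule: keep at most base_max / rate of a
    non-speech interval; the remainder is deleted."""
    return min(pause_len, base_max / rate)
```

With a target factor of 1.5 and `max_lag` of 20, a lag of 10 yields a factor of 1.25, and at a factor of 2.0 with `base_max` of 1.0 an 0.8 s pause is shortened to 0.5 s.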
Abstract:
An analysis processor analyzes input speech data to obtain block lengths for each attribute: voiced sound, voiceless sound, and silence. A block data splitter splits the input speech data into blocks whose lengths depend on the respective attributes. A block data memory sequentially stores the split speech data as block speech data, together with the block lengths. A connection data generator uses the block speech data to generate, at every moment, connection data for joining adjacent blocks of speech data. A connection data storing portion sequentially stores the connection data. A connection order generator generates, at every moment, the connection order of the block speech data and the connection data, based at least on the block lengths output sequentially from the block data memory and the time-extension scaling factors for the respective attributes. A speech data connector sequentially connects the block speech data and the connection data according to this connection order. As a result, the speed of the output speech can be changed instantly in response to an operator's instruction.
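A connection order driven by per-attribute scaling factors might look like the sketch below. It labels each block by attribute, keeps a separate fractional accumulator per attribute so that a factor of 2.0 emits a block twice and 0.5 emits every other block, and places a join marker between consecutively emitted blocks to stand for the connection data. The string encoding (`"B<i>"` for block i, `"J<a>-<b>"` for a join), the accumulator scheme, and the function name are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence


def connection_order(attrs: Sequence[str], scale: Dict[str, float]) -> List[str]:
    """Hypothetical sketch of a block connection order.

    attrs[i] is the attribute of block i (each must be a key of `scale`);
    scale maps attribute -> time-extension factor.
    """
    credit = {a: 0.0 for a in scale}  # per-attribute fractional accumulator
    order: List[str] = []
    prev: Optional[int] = None
    for i, attr in enumerate(attrs):
        credit[attr] += scale[attr]
        while credit[attr] >= 1.0:
            if prev is not None:
                order.append(f"J{prev}-{i}")  # connection data between blocks
            order.append(f"B{i}")
            prev = i
            credit[attr] -= 1.0
    return order
```

For `attrs = ["voiced", "unvoiced", "silence"]` with the voiced factor at 2.0 and the others at 1.0, the voiced block is emitted twice with a self-join between the copies. Because the order is recomputed from the factors, changing a factor takes effect on the next block, which matches the instant speed change the abstract describes.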
Abstract:
A method and an apparatus for hearing assistance that compensate for the decline in speech recognition ability caused by deterioration of the central auditory system. The input speech is divided into voiced speech sections, unvoiced speech sections, and silent sections. The voiced and silent sections are appropriately extended or contracted while the unvoiced sections are left unchanged, and the sections are then recombined in the same order as in the input speech, yielding output speech that is easier for a listener with impaired hearing to follow. Alternatively, to provide real-time hearing assistance without lengthening the utterance period, only the silent sections other than punctuation silent sections (pauses due to punctuation between sentences) are contracted, the speech speed of each voiced section is adjusted, and the adjusted voiced sections, unvoiced sections, punctuation silent sections, and contracted silent sections are recombined in the same order as in the input speech.
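The duration bookkeeping behind both modes can be sketched as follows. Each section is a (kind, duration) pair with kind `"voiced"`, `"unvoiced"`, `"silent"`, or `"punct"`. Unvoiced sections always keep their length, and the other kinds are scaled by per-kind factors. For the real-time mode, one possible rule (an assumption, not taken from the abstract) spends exactly the time saved by contracting non-punctuation silences on stretching the voiced sections, so the total length is unchanged. All names and the balancing rule are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Section = Tuple[str, float]  # (kind, duration in seconds)


def plan_durations(sections: Sequence[Section],
                   factors: Dict[str, float]) -> List[Section]:
    """Scale each section by its kind's factor, keep unvoiced sections
    unchanged, and preserve the input order."""
    out: List[Section] = []
    for kind, dur in sections:
        f = 1.0 if kind == "unvoiced" else factors.get(kind, 1.0)
        out.append((kind, dur * f))
    return out


def balanced_voiced_factor(sections: Sequence[Section],
                           silence_factor: float) -> float:
    """Hypothetical real-time rule: time saved on non-punctuation silences
    is given to the voiced sections, so the total duration is unchanged."""
    voiced = sum(d for k, d in sections if k == "voiced")
    saved = sum(d for k, d in sections if k == "silent") * (1.0 - silence_factor)
    return 1.0 + saved / voiced if voiced > 0 else 1.0
```

For sections of 0.5 s voiced, 0.125 s unvoiced, 0.25 s silent, and 0.75 s punctuation pause, halving the silence frees 0.125 s, which gives a voiced factor of 1.25 and leaves the total at 1.625 s.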