Fast vocabulary independent method and apparatus for spotting words in speech
    11.
    Granted patent
    Legal status: Expired

    Publication No.: US6073095A

    Publication date: 2000-06-06

    Application No.: US950621

    Filing date: 1997-10-15

    Abstract: A fast vocabulary independent method for spotting words in speech utilizes a preprocessing step and a coarse-to-detailed search strategy for spotting a word/phone sequence in speech. The preprocessing includes a Viterbi-beam phone level decoding using a tree-based phone language model. The coarse search matches phone-ngrams to identify regions of speech as putative word hits, and the detailed search performs an acoustic match at the putative hits with a model of the given word included in the vocabulary of the recognizer.

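The coarse stage described in the abstract can be illustrated with a minimal sketch. It assumes the preprocessing step has already produced a decoded phone sequence, and it covers only the phone n-gram matching that proposes putative hits; the detailed acoustic match is not shown. The function names, the trigram size, and the `min_matches` threshold are hypothetical choices for illustration, not details taken from the patent.

```python
from collections import defaultdict

def build_ngram_index(phones, n=3):
    """Map each phone n-gram to the positions where it starts in the decoded sequence."""
    index = defaultdict(list)
    for i in range(len(phones) - n + 1):
        index[tuple(phones[i:i + n])].append(i)
    return index

def coarse_hits(index, query, n=3, min_matches=1):
    """Return candidate start positions (putative hits) for the query phone sequence.

    Each query n-gram at offset k that is found at position p votes for
    a word start at p - k; starts with enough votes are kept.
    """
    votes = defaultdict(int)
    for k in range(len(query) - n + 1):
        for p in index.get(tuple(query[k:k + n]), ()):
            votes[p - k] += 1
    return sorted(start for start, count in votes.items()
                  if count >= min_matches and start >= 0)

# Hypothetical decoded phone stream and query pronunciation.
decoded = "sil dh ax k ae t s ae t sil".split()
index = build_ngram_index(decoded)
print(coarse_hits(index, "k ae t s".split(), min_matches=2))  # [3]
```

In a full system each returned position would then be rescored by the detailed acoustic match, so the coarse stage only needs to be cheap and to avoid missing true hits.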

    Generating a frequency warping function based on phoneme and context
    12.
    Granted patent
    Legal status: In force

    Publication No.: US08401861B2

    Publication date: 2013-03-19

    Application No.: US11654447

    Filing date: 2007-01-17

    IPC classification: G10L21/00 G10L13/06

    CPC classification: G10L15/07 G10L2021/0135

    Abstract: A method for generating a frequency warping function comprising preparing the training speech of a source and a target speaker; performing frame alignment on the training speech of the speakers; selecting aligned frames from the frame-aligned training speech of the speakers; extracting corresponding sets of formant parameters from the selected aligned frames; and generating a frequency warping function based on the corresponding sets of formant parameters. The step of selecting aligned frames preferably selects a pair of aligned frames in the middle of the same or similar frame-aligned phonemes with the same or similar contexts in the speech of the source speaker and target speaker. The step of generating a frequency warping function preferably uses the various pairs of corresponding formant parameters in the corresponding sets of formant parameters as key positions in a piecewise linear frequency warping function to generate the frequency warping function.

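The final step of the abstract, using formant pairs as key positions of a piecewise linear warping function, can be sketched as follows. This is an illustration under stated assumptions: the formant values are made up, the function name is hypothetical, and pinning the end points at 0 Hz and an assumed Nyquist frequency is a choice of this sketch that the abstract does not specify.

```python
from bisect import bisect_right

def make_warping_function(source_formants, target_formants, nyquist_hz=8000.0):
    """Build a piecewise linear map from source to target frequency.

    Each (source formant, target formant) pair is a key position.
    Assumption of this sketch: the end points (0, 0) and
    (nyquist_hz, nyquist_hz) are added as fixed anchors.
    """
    if len(source_formants) != len(target_formants):
        raise ValueError("formant sets must have the same length")
    src = [0.0, *source_formants, nyquist_hz]
    tgt = [0.0, *target_formants, nyquist_hz]
    for seq in (src, tgt):
        if any(b <= a for a, b in zip(seq, seq[1:])):
            raise ValueError("key positions must be strictly increasing")

    def warp(freq_hz):
        f = min(max(freq_hz, 0.0), nyquist_hz)          # clamp to the valid range
        i = min(bisect_right(src, f), len(src) - 1) - 1  # segment containing f
        t = (f - src[i]) / (src[i + 1] - src[i])         # position within the segment
        return tgt[i] + t * (tgt[i + 1] - tgt[i])

    return warp

# Hypothetical formants (Hz) from one pair of aligned frames.
warp = make_warping_function([500.0, 1500.0, 2500.0], [600.0, 1700.0, 2600.0])
print(warp(1000.0))  # 1150.0, halfway between the first two key positions
```

A frequency that coincides with a source formant maps exactly onto the paired target formant, and frequencies in between are interpolated linearly.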

    METHOD AND SYSTEM FOR PROMPT CONSTRUCTION FOR SELECTION FROM A LIST OF ACOUSTICALLY CONFUSABLE ITEMS IN SPOKEN DIALOG SYSTEMS
    13.
    Patent application
    Legal status: In force

    Publication No.: US20080281598A1

    Publication date: 2008-11-13

    Application No.: US11746087

    Filing date: 2007-05-09

    IPC classification: G10L11/00

    CPC classification: G10L15/22 G10L15/187

    Abstract: A method (and system) of determining confusable list items and resolving this confusion in a spoken dialog system includes receiving user input, processing the user input and determining if a list of items needs to be played back to the user, retrieving the list to be played back to the user, identifying acoustic confusions between items on the list, changing the items on the list as necessary to remove the acoustic confusions, and playing unambiguous list items back to the user.

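One simple way to approximate the "identifying acoustic confusions" step is to compare phoneme sequences with an edit distance. This is a swapped-in technique for illustration only; the abstract does not say how confusability is measured. The pronunciations, function names, and `max_distance` threshold below are all hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]

def confusable_pairs(pronunciations, max_distance=1):
    """Return pairs of list items whose pronunciations are within max_distance edits."""
    return [(n1, n2)
            for (n1, p1), (n2, p2) in combinations(pronunciations.items(), 2)
            if edit_distance(p1, p2) <= max_distance]

# Hypothetical list items with made-up phoneme transcriptions.
items = {
    "Austin": ["AO", "S", "T", "AH", "N"],
    "Boston": ["B", "AO", "S", "T", "AH", "N"],
    "Denver": ["D", "EH", "N", "V", "ER"],
}
print(confusable_pairs(items))  # [('Austin', 'Boston')]
```

Items flagged this way would then be candidates for rewording before playback, for example by adding a distinguishing attribute to each.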

    Method and apparatus for producing natural sounding pitch contours in a speech synthesizer
    14.
    Granted patent
    Legal status: In force

    Publication No.: US07280969B2

    Publication date: 2007-10-09

    Application No.: US09732122

    Filing date: 2000-12-07

    IPC classification: G10L13/06

    CPC classification: G10L13/033 G10L13/0335

    Abstract: A speech synthesis system is disclosed that utilizes a pitch contour resulting in more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increases the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.

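The noise-based variant of the low frequency energy booster can be sketched as below. The sketch assumes the pitch contour b(t) is already sampled at a uniform frame rate, so the interpolation step is skipped, and it approximates band-limited noise as a sum of random-phase sinusoids below the cutoff. That approximation, the function name, and every parameter value are assumptions made here for illustration, not details from the patent.

```python
import math
import random

def boost_low_frequency(pitch_hz, frame_rate_hz=100.0, max_freq_hz=10.0,
                        depth_hz=2.0, n_components=8, seed=0):
    """Add slow, band-limited modulation to a uniformly sampled pitch contour.

    The modulation is a sum of n_components sinusoids with random
    frequencies below max_freq_hz and random phases, scaled so its
    magnitude never exceeds depth_hz.
    """
    rng = random.Random(seed)
    components = [(rng.uniform(0.5, max_freq_hz), rng.uniform(0.0, 2.0 * math.pi))
                  for _ in range(n_components)]
    scale = depth_hz / n_components
    boosted = []
    for i, b in enumerate(pitch_hz):
        t = i / frame_rate_hz  # time of this frame in seconds
        noise = scale * sum(math.sin(2.0 * math.pi * f * t + phase)
                            for f, phase in components)
        boosted.append(b + noise)
    return boosted

# A flat, hypothetical 120 Hz contour lasting two seconds at 100 frames per second.
contour = [120.0] * 200
modulated = boost_low_frequency(contour)
```

Because every component lies below `max_freq_hz`, the added energy stays in the low-frequency region of the contour, which is the region the abstract targets.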

    Speech and signal digitization by using recognition metrics to select from multiple techniques
    15.
    Granted patent
    Legal status: In force

    Publication No.: US07016835B2

    Publication date: 2006-03-21

    Application No.: US10323549

    Filing date: 2002-12-19

    IPC classification: G10L15/00

    CPC classification: G10L15/32 G10L17/26

    Abstract: A characteristic-specific digitization method and apparatus are disclosed that reduce the error rate in converting input information into a computer-readable format. The input information is analyzed and subsets of the input information are classified according to whether the input information exhibits a specific physical parameter affecting recognition accuracy. If the input information exhibits the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a characteristic-specific recognizer that demonstrates improved performance for the given physical parameter. If the input information does not exhibit the specific physical parameter affecting recognition accuracy, the characteristic-specific digitization system recognizes the input information using a general recognizer that performs well for typical input information. In one implementation, input speech having very low recognition accuracy as a result of a physical speech characteristic is automatically identified and recognized using a characteristic-specific speech recognizer.

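The routing logic in the abstract amounts to a per-segment dispatch between two recognizers. The sketch below shows only that control flow. The speaking-rate test and both recognizers are hypothetical stand-ins, since the abstract names neither a concrete physical parameter nor a recognizer interface.

```python
def digitize(segments, exhibits_characteristic, specific_recognizer, general_recognizer):
    """Recognize each segment with the recognizer suited to it.

    exhibits_characteristic(segment) decides whether the segment shows
    the physical parameter that hurts recognition accuracy.
    """
    results = []
    for segment in segments:
        if exhibits_characteristic(segment):
            recognizer = specific_recognizer
        else:
            recognizer = general_recognizer
        results.append(recognizer(segment))
    return results

# Hypothetical stand-ins: segments carry a speaking rate in words per minute.
def is_fast_speech(segment):
    return segment["rate_wpm"] > 200

def fast_speech_recognizer(segment):
    return ("specific", segment["id"])

def general_recognizer(segment):
    return ("general", segment["id"])

segments = [{"id": 1, "rate_wpm": 150}, {"id": 2, "rate_wpm": 240}]
print(digitize(segments, is_fast_speech, fast_speech_recognizer, general_recognizer))
# [('general', 1), ('specific', 2)]
```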