GENERATION OF VOICE MESSAGES
    1.
    Invention publication
    Legal status: Invalid

    Publication No.: EP1000499A1

    Publication date: 2000-05-17

    Application No.: EP98937627.2

    Filing date: 1998-07-31

    IPC classification: H04M3/50 G10L5/04 G10L5/02

    Abstract: A method of generating a message having an invariable portion (U1) and a variable portion (V) is provided. Most of the invariable portion (U1) is provided in the form of recorded speech (A) whereas the variable portion (V) is provided in the form of synthesised speech (B). The synthesised speech (B) also extends by half a phoneme into the invariable portion (U1) of the message. The synthesised speech (B) and the recorded speech (A) are then concatenated, with a transition signal being formed on the basis of a boundary portion of each of the recorded (A) and synthesised signals (B) about any join (8). In forming the transition signal, a set of transition signal pitchmarks is created and an overlap-add technique is used to copy the waveform within the boundary portions of the speech signals (A, B) around the transition signal pitchmarks. The signal around the penultimate pitchmark in the leading boundary portion is copied to the trailing half of the transition signal and the signal around the second pitchmark in the trailing boundary portion is copied to the leading half of the transition signal. In this way, the characteristics of the generated message around the join (8) change gradually between the characteristics of the recorded speech (A) and the characteristics of the synthesised speech (B).
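The gradual change across the join can be illustrated with a much simpler technique than the one in the abstract. The sketch below is a plain windowed cross-fade (a basic overlap-add with complementary half-Hann weights), not the pitch-synchronous procedure with transition pitchmarks. The function names and the `overlap` parameter are assumptions made for illustration only.

```python
import math


def hann_fade(n):
    """Fade-in weights rising from 0 to 1 over n samples (half a Hann window)."""
    if n == 1:
        return [0.5]
    return [0.5 - 0.5 * math.cos(math.pi * i / (n - 1)) for i in range(n)]


def join_with_transition(recorded, synthesised, overlap):
    """Concatenate two sample lists, overlap-adding `overlap` samples at the join."""
    if overlap <= 0 or overlap > min(len(recorded), len(synthesised)):
        raise ValueError("overlap must fit inside both signals")
    fade_in = hann_fade(overlap)
    lead = recorded[-overlap:]      # boundary portion of the leading signal
    trail = synthesised[:overlap]   # boundary portion of the trailing signal
    # The leading signal fades out while the trailing signal fades in.
    transition = [(1.0 - w) * x + w * y for w, x, y in zip(fade_in, lead, trail)]
    return recorded[:-overlap] + transition + synthesised[overlap:]


if __name__ == "__main__":
    print(join_with_transition([1.0] * 8, [0.0] * 8, overlap=4))
```

A pitch-synchronous version would place the windows on pitchmarks rather than on a fixed sample count, which is the part this sketch leaves out.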

    Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word
    2.
    Invention publication
    Legal status: In force

    Publication No.: EP0953970A3

    Publication date: 2000-01-19

    Application No.: EP99303390.1

    Filing date: 1999-04-29

    IPC classification: G10L5/04

    CPC classification: G10L13/08

    Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
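A toy version of generating and ranking pronunciations might look like the sketch below. It is an illustration under strong assumptions: the tree is hand-written rather than trained, it only asks questions about letters (the abstract's mixed trees also ask about adjacent phonemes), and the phoneme symbols and probabilities are invented.

```python
from itertools import product


def letter_tree(word, i):
    """Toy yes-no tree: returns {phoneme: probability} for the letter word[i]."""
    letter = word[i]
    nxt = word[i + 1] if i + 1 < len(word) else "#"  # "#" marks end of word
    if letter == "c":
        # Question: is the next letter one of e, i, y?
        if nxt in "eiy":
            return {"s": 0.9, "k": 0.1}
        return {"k": 0.95, "s": 0.05}
    if letter == "e":
        # Question: is this the last letter of the word?
        if nxt == "#":
            return {"": 0.8, "iy": 0.2}  # "" marks a silent letter
        return {"eh": 0.7, "iy": 0.3}
    table = {"a": {"ae": 0.6, "ey": 0.4}, "i": {"ih": 0.6, "ay": 0.4}}
    return table.get(letter, {letter: 1.0})


def ranked_pronunciations(word, top_n=3):
    """Score every combination of per-letter choices and return the best ones."""
    per_letter = [letter_tree(word, i) for i in range(len(word))]
    scored = []
    for choice in product(*(d.items() for d in per_letter)):
        score = 1.0
        for _, prob in choice:
            score *= prob  # score of a pronunciation = product of leaf probabilities
        phones = " ".join(p for p, _ in choice if p)
        scored.append((score, phones))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for score, phones in ranked_pronunciations("cat"):
        print(f"{phones}: {score:.3f}")
```

The ranked list supports both uses named in the abstract: taking the first entry as the best pronunciation, or keeping several entries in order of probability.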

    Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word
    3.
    Invention publication
    Legal status: In force

    Publication No.: EP0953970A2

    Publication date: 1999-11-03

    Application No.: EP99303390.1

    Filing date: 1999-04-29

    IPC classification: G10L5/04

    CPC classification: G10L13/08

    Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.


    Script recognition using speech recognition
    5.
    Invention publication
    Legal status: Under examination (published)

    Publication No.: EP0899737A3

    Publication date: 1999-08-25

    Application No.: EP98306540.0

    Filing date: 1998-08-17

    Applicant: TEKTRONIX, INC.

    Abstract: Script recognition using speech recognition for use in editing of video or film clips preferably uses a grammar-based speech recognition engine. A script file and audio dialog file are input to a speech recognition system, and the script file is processed to generate a grammar file, which in turn is reduced to a binary context file compatible with a specific speech recognition engine. The script file and audio file are used to define variable parameters for the speech recognition engine. The audio file is broken up into utterances which are processed by the speech recognition engine according to the variable parameters and the context file. The best "guess" from the speech recognition engine is fitted to the script file to determine a match. Mismatched utterances are fed back to the utterance determining step to determine a new search point. With a match, the audio file is marked with the corresponding location in the script or the script file is time marked with the corresponding video clip time code. Video or film clips may then be accessed for editing by indicating a place in the script or the dialog.
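The step of fitting the recogniser's best guess to the script can be sketched as a fuzzy text match. The example below assumes the recognised text and utterance start times are already available, uses the standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, and handles a mismatch by simply skipping the utterance. The function name, the `threshold` value, and the data layout are hypothetical.

```python
from difflib import SequenceMatcher


def align_utterances(script_lines, utterances, threshold=0.6):
    """Map script line index -> start time (seconds) of the matching utterance.

    utterances is a list of (start_seconds, recognised_text) in playback order.
    """
    marks = {}
    search_from = 0  # script lines before this index are already matched
    for start, text in utterances:
        best_idx, best_score = None, 0.0
        for idx in range(search_from, len(script_lines)):
            score = SequenceMatcher(
                None, text.lower(), script_lines[idx].lower()
            ).ratio()
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            marks[best_idx] = start        # time-mark the script line
            search_from = best_idx + 1     # later utterances match later lines
        # A mismatched utterance is skipped; the search point stays unchanged.
    return marks


if __name__ == "__main__":
    script = [
        "Where were you last night?",
        "I was at the station.",
        "Nobody saw you there.",
    ]
    heard = [
        (1.5, "where were you last night"),
        (4.0, "zzz qqq xxx"),
        (6.2, "nobody saw you their"),
    ]
    print(align_utterances(script, heard))
```

With the resulting marks, selecting a script line gives a time offset into the audio, which is the lookup the abstract describes for accessing clips.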


    A system for interactive communication
    6.
    Invention publication
    Legal status: Lapsed

    Publication No.: EP0848373A3

    Publication date: 1999-03-10

    Application No.: EP97310085.2

    Filing date: 1997-12-12

    IPC classification: G10L5/04

    Abstract: A system for providing a primarily audio environment for world wide web access includes a system for rendering structured documents using audio, an interface for information exchange to users, a non-keyword based WWW search system and a few miscellaneous features. The system for rendering structured documents using audio includes a pre-rendering system which converts a HTML document into an intermediate document and a rendering system which actually generates an audio output. The interface includes a non-visual browsing system and an interface to users for visual browsing environments.
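The two-stage split between pre-rendering and rendering can be sketched with Python's standard `html.parser` module. The intermediate format used here, a list of `(role, text)` pairs, and the spoken cue strings are assumptions for illustration; the abstract does not say what the intermediate document looks like.

```python
from html.parser import HTMLParser


class PreRenderer(HTMLParser):
    """Flattens HTML into (role, text) pairs that an audio renderer could consume."""

    ROLES = {"h1": "heading", "h2": "heading", "a": "link", "li": "list item"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.items = []   # the intermediate document

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        role = "text"
        for tag in reversed(self.stack):  # innermost structural tag wins
            if tag in self.ROLES:
                role = self.ROLES[tag]
                break
        self.items.append((role, text))


def pre_render(html):
    """Stage 1: HTML document -> intermediate document."""
    parser = PreRenderer()
    parser.feed(html)
    parser.close()
    return parser.items


def to_spoken_script(items):
    """Stage 2 stand-in: produce the strings a speech synthesiser would read."""
    cues = {"heading": "Heading: ", "link": "Link: ", "list item": "Item: ", "text": ""}
    return [cues[role] + text for role, text in items]


if __name__ == "__main__":
    page = "<h1>News</h1><p>Read the <a href='x.html'>full story</a> today.</p>"
    print(to_spoken_script(pre_render(page)))
```

Keeping the intermediate document separate means the audio cues (a spoken prefix here, but possibly a tone or a change of voice) can change without touching the HTML parsing.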

    Microsegment-based speech synthesis method (AUF MIKROSEGMENTEN BASIERENDES SPRACHSYNTHESEVERFAHREN)
    7.
    Invention publication
    Legal status: Lapsed

    Publication No.: EP0886853A1

    Publication date: 1998-12-30

    Application No.: EP97917259

    Filing date: 1997-03-08

    IPC classification: G10L13/04 G10L13/07 G10L5/04

    CPC classification: G10L13/07 G10L13/04

    Abstract: The invention concerns a digital speech-synthesis process whereby utterances in a language are recorded, the recorded utterances are divided into speech segments which are stored so as to allow their allocation to specific phonemes; a text which is to be output as speech is converted to a phoneme chain and the stored segments are output in a sequence defined by the phoneme chain; an analysis of the text to be output as speech is carried out and thus provides information which completes the phoneme chain and modifies the timing sequence signal for the speech segments which are to be strung together for output as speech. The invention is characterised by the use of, as speech segments, microsegments consisting of: segments for vowel halves and semi-vowel halves, vowels standing between consonants being split into two microsegments, a first vowel half beginning shortly before the start of the vowel and extending as far as the vowel middle, and a second vowel half from the vowel middle to just before the vowel end; segments for quasi-stationary vowel components cut from the middle of a vowel; consonant segments beginning shortly before the front phoneme boundary and ending shortly before the rear phoneme boundary; and segments for vowel-vowel sequences cut from the middle of a vowel-vowel transition.
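The mapping from a phoneme chain to a sequence of microsegments could be sketched roughly as follows. This is a loose illustration, not the patented selection logic: the vowel inventory, the segment labels, and the rule that a vowel-vowel transition segment replaces the two inner vowel halves are all assumptions, and quasi-stationary vowel components and semi-vowels are left out.

```python
VOWELS = {"a", "e", "i", "o", "u"}  # toy inventory, assumed for the example


def to_microsegments(phonemes):
    """Turn a phoneme chain into (segment type, phoneme label) pairs."""
    segments = []
    for i, ph in enumerate(phonemes):
        if ph not in VOWELS:
            segments.append(("consonant", ph))
            continue
        prev_is_vowel = i > 0 and phonemes[i - 1] in VOWELS
        next_is_vowel = i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS
        if not prev_is_vowel:
            # Vowel onset up to the vowel middle.
            segments.append(("first vowel half", ph))
        if next_is_vowel:
            # Middle of this vowel to the middle of the next one.
            segments.append(("vowel-vowel transition", ph + phonemes[i + 1]))
        else:
            # Vowel middle to just before the vowel end.
            segments.append(("second vowel half", ph))
    return segments


if __name__ == "__main__":
    for segment in to_microsegments(["k", "a", "o", "s"]):
        print(segment)
```

In a real system each pair would be a key into the stored inventory of recorded segments, which are then concatenated in this order.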

    Determinization and minimization for speech recognition
    8.
    Invention publication
    Legal status: Lapsed

    Publication No.: EP0854468A3

    Publication date: 1998-12-30

    Application No.: EP98300140.5

    Filing date: 1998-01-09

    Applicant: AT&T Corp.

    CPC classification: G10L15/193

    Abstract: A pattern recognition system and method for optimally reducing the redundancy and size of a weighted and labeled graph involves receiving speech signals, converting the speech signals into word sequences, interpreting the word sequences in a graph where the graph is labeled with word sequences and weighted with probabilities, and determinizing the graph by removing redundant word sequences. The size of the graph can also be minimized by collapsing some nodes of the graph in a reverse determinizing manner. The graph can further be tested for determinizability to determine if the graph can be determinized. The resulting word sequence in the graph may be shown in a display device so that recognition of speech signals can be demonstrated.
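Determinizing a weighted word graph can be sketched as a weighted subset construction. The version below assumes costs are negative log probabilities combined with min and plus, so that when several paths spell the same word sequence only the cheapest survives. It is a minimal sketch for acyclic graphs: on graphs with cycles the loop may not terminate, which is the situation a determinizability test is meant to detect. The data layout and function names are hypothetical.

```python
def determinize(arcs, start, finals):
    """Weighted subset construction over (min, +) costs.

    arcs: {state: [(word, cost, next_state), ...]}
    finals: {state: final_cost}
    Returns (det_arcs, det_start, det_finals); deterministic states are
    frozensets of (original_state, residual_cost) pairs.
    """
    start_set = frozenset({(start, 0.0)})
    det_arcs, det_finals = {}, {}
    queue, seen = [start_set], {start_set}
    while queue:
        subset = queue.pop()
        det_arcs[subset] = []
        final_costs = [res + finals[q] for q, res in subset if q in finals]
        if final_costs:
            det_finals[subset] = min(final_costs)
        # Group outgoing arcs by word: word -> {next_state: best cost so far}.
        by_word = {}
        for q, res in subset:
            for word, cost, nxt in arcs.get(q, []):
                dest = by_word.setdefault(word, {})
                dest[nxt] = min(dest.get(nxt, float("inf")), res + cost)
        for word, dest in by_word.items():
            best = min(dest.values())  # cost emitted on the single merged arc
            # Whatever exceeds `best` is carried forward as a residual cost.
            target = frozenset((nxt, round(c - best, 9)) for nxt, c in dest.items())
            det_arcs[subset].append((word, best, target))
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return det_arcs, start_set, det_finals


def cost_of(det_arcs, start, det_finals, words):
    """Cost of a word sequence in the deterministic graph, or None if rejected."""
    state, total = start, 0.0
    for w in words:
        matches = [(c, t) for word, c, t in det_arcs[state] if word == w]
        if len(matches) != 1:
            return None
        total += matches[0][0]
        state = matches[0][1]
    return total + det_finals[state] if state in det_finals else None


if __name__ == "__main__":
    graph = {
        0: [("call", 1.0, 1), ("call", 2.0, 2)],
        1: [("home", 3.0, 3)],
        2: [("home", 1.0, 3), ("mom", 1.0, 3)],
    }
    det, s, f = determinize(graph, 0, {3: 0.0})
    print(cost_of(det, s, f, ["call", "home"]))
```

In the example graph, "call home" can be read along two paths with costs 4.0 and 3.0; after determinization there is one path for it, carrying the lower cost.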


    System and method for enhanced intelligibility of voice messages
    9.
    Invention publication
    Legal status: Lapsed

    Publication No.: EP0851404A3

    Publication date: 1998-12-30

    Application No.: EP97121959.7

    Filing date: 1997-12-12

    Applicant: AT&T Corp.

    Abstract: A system and method are provided for playing back a recorded voice message, and, in particular, for automatically playing back a spoken numeric portion of the message at a rate that is slower than the rate for playing back the remaining portion of the recorded voice message. A voice messaging system receives and analyzes the voice message. Specifically, the messaging system determines whether the voice message includes spoken numeric information and, if so, determines the relative position of the spoken numeric information within the message. The computer system stores both the voice message and the positional information in a storage device. Upon playback of the message, the messaging system retrieves the stored voice message and positional information from the storage device. As the voice message is played back, the messaging system processes the positional information. When the positional information indicates that a particular portion of a voice message includes spoken numeric information, that particular portion is played back at a decreased speed.
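How stored positional information might drive playback can be sketched as a simple schedule. The example assumes the positions are stored as sorted, non-overlapping `(start, end)` times in seconds and that the slow rate is a fixed fraction of normal speed. Detection of the spoken digits and the audio time-stretching itself are out of scope, and all names are hypothetical.

```python
def playback_plan(message_seconds, numeric_spans, slow_rate=0.7):
    """Split a message into (start, end, rate) pieces covering its full length.

    numeric_spans: sorted, non-overlapping (start, end) times that contain
    spoken numeric information; these are played at slow_rate.
    """
    plan, cursor = [], 0.0
    for start, end in numeric_spans:
        if start > cursor:
            plan.append((cursor, start, 1.0))   # ordinary speech, normal speed
        plan.append((start, end, slow_rate))    # spoken numbers, slowed down
        cursor = end
    if cursor < message_seconds:
        plan.append((cursor, message_seconds, 1.0))
    return plan


def playback_duration(plan):
    """Total listening time once each piece is played at its own rate."""
    return sum((end - start) / rate for start, end, rate in plan)


if __name__ == "__main__":
    plan = playback_plan(10.0, [(4.0, 6.0)], slow_rate=0.5)
    print(plan, playback_duration(plan))
```

Storing the spans alongside the message, as the abstract describes, means the analysis runs once at recording time and playback only has to follow the plan.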