SYSTEM AND METHOD FOR DISTRIBUTED VOICE MODELS ACROSS CLOUD AND DEVICE FOR EMBEDDED TEXT-TO-SPEECH
    1.
    Patent Application (Granted)

    Publication No.: US20160086598A1

    Publication Date: 2016-03-24

    Application No.: US14953771

    Filing Date: 2015-11-30

    IPC Classification: G10L13/04 G10L13/07

    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.

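As a rough illustration, the caching behavior described in the abstract might look like the following Python sketch; the class name, eviction policy, and data shapes are assumptions for illustration, not details from the patent.

```python
class TTSUnitCache:
    """Local cache of concatenative TTS units with a non-prunable core set."""

    def __init__(self, core_units, capacity):
        self.core = dict(core_units)  # core units tied to the voice; never pruned
        self.extra = {}               # context-dependent units; prunable
        self.capacity = capacity      # max number of extra units kept locally

    def missing_units(self, context_units):
        """Return units needed by the current synthesis context but absent locally."""
        return [u for u in context_units
                if u not in self.core and u not in self.extra]

    def store(self, fetched):
        """Store units fetched from the server, pruning the oldest extras if full."""
        for unit_id, waveform in fetched.items():
            if len(self.extra) >= self.capacity:
                self.extra.pop(next(iter(self.extra)))  # evict oldest-inserted unit
            self.extra[unit_id] = waveform

    def lookup(self, unit_id):
        """Fetch a unit from the core set first, then from the prunable extras."""
        if unit_id in self.core:
            return self.core[unit_id]
        return self.extra.get(unit_id)
```

A client would call `missing_units` for each new synthesis context, request those IDs from the server, `store` the response, and synthesize from `lookup`; the protected `core` dict models the non-prunable core set of voice units.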

Systems, Computer-Implemented Methods, and Tangible Computer-Readable Storage Media For Transcription Alignment
    2.
    Patent Application (Granted)

    Publication No.: US20150046160A1

    Publication Date: 2015-02-12

    Application No.: US14492616

    Filing Date: 2014-09-22

    IPC Classification: G10L15/26 G10L21/06

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.

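A minimal sketch of the anchor-word idea, using Python's `difflib` to find words common to the ASR output and the transcription; the stop list, function names, and timing format are illustrative assumptions, not details from the application.

```python
from difflib import SequenceMatcher

# Commonly used words assumed ineligible as anchors (illustrative stop list)
STOP_LIST = {"the", "a", "an", "and", "of", "to"}

def select_anchor_pairs(asr_words, transcript_words):
    """Find words matched in both sequences that are eligible as anchors."""
    matcher = SequenceMatcher(a=asr_words, b=transcript_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            word = asr_words[block.a + k]
            if word.lower() not in STOP_LIST:
                anchors.append((block.a + k, block.b + k))  # (ASR idx, transcript idx)
    return anchors

def caption_spans(asr_times, anchors, transcript_words):
    """Assign caption text between consecutive anchor pairs, timed by the ASR output."""
    spans = []
    for (a1, t1), (a2, t2) in zip(anchors, anchors[1:]):
        text = " ".join(transcript_words[t1:t2 + 1])
        spans.append((asr_times[a1], asr_times[a2], text))
    return spans
```

The ASR output contributes word timings while the (human-generated) transcription contributes the caption text; anchors pin the two sequences together so the text between each anchor pair inherits the ASR timestamps.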

REAL-TIME EMOTION TRACKING SYSTEM
    4.
    Patent Application (Granted)

    Publication No.: US20140163960A1

    Publication Date: 2014-06-12

    Application No.: US13712288

    Filing Date: 2012-12-12

    IPC Classification: G06F17/28

    Abstract: Devices, systems, methods, media, and programs for detecting an emotional state change in an audio signal are provided. A plurality of segments of the audio signal is received, with the plurality of segments being sequential. Each segment of the plurality of segments is analyzed, and, for each segment, an emotional state and a confidence score of the emotional state are determined. The emotional state and the confidence score of each segment are sequentially analyzed, and a current emotional state of the audio signal is tracked throughout each of the plurality of segments. For each segment, it is determined whether the current emotional state of the audio signal changes to another emotional state based on the emotional state and the confidence score of the segment.

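The per-segment tracking logic can be sketched as follows; the threshold value, initial state, and tuple format are illustrative assumptions rather than details from the application.

```python
def track_emotion(segments, threshold=0.6):
    """Track the current emotional state across sequential segments.

    Each segment is an (emotion, confidence) pair. The tracked state changes
    only when a segment's detected emotion differs from the current state and
    its confidence score meets the threshold.
    """
    current = "neutral"  # assumed initial state
    changes = []         # (segment index, old state, new state)
    for index, (emotion, confidence) in enumerate(segments):
        if emotion != current and confidence >= threshold:
            changes.append((index, current, emotion))
            current = emotion
    return current, changes
```

Gating the transition on the confidence score is what keeps a single low-confidence segment from flipping the tracked state, which matches the abstract's use of both the emotional state and its confidence score per segment.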

SYSTEMS, COMPUTER-IMPLEMENTED METHODS, AND TANGIBLE COMPUTER-READABLE STORAGE MEDIA FOR TRANSCRIPTION ALIGNMENT
    7.
    Patent Application (Granted)

    Publication No.: US20160198234A1

    Publication Date: 2016-07-07

    Application No.: US15071644

    Filing Date: 2016-03-16

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.


System and Method for Synthetically Generated Speech Describing Media Content
    8.
    Patent Application (Granted)

    Publication No.: US20140379350A1

    Publication Date: 2014-12-25

    Application No.: US14481326

    Filing Date: 2014-09-09

    Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface, or output may be customized based on preferences in a user profile.

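The metadata-selection step might be sketched like this in Python; the field names and profile format are hypothetical, and the commented-out `speak` call stands in for whatever TTS engine performs the actual synthesis.

```python
def describe_media(metadata, profile):
    """Build the text to be spoken from metadata fields the user wants announced."""
    pieces = [str(metadata[field])
              for field in profile.get("announce", [])
              if field in metadata]
    return ". ".join(pieces)

# A TTS engine would then synthesize the returned text alongside the media,
# e.g. during a gap in playback:
#   speak(describe_media(track_metadata, user_profile))
```

The per-user `profile` dict models the abstract's user-profile customization: which pieces of metadata are selected for output depends on the listener's stored preferences.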

SYSTEMS, COMPUTER-IMPLEMENTED METHODS, AND TANGIBLE COMPUTER-READABLE STORAGE MEDIA FOR TRANSCRIPTION ALIGNMENT
    10.
    Patent Application (Granted)

    Publication No.: US20170061986A1

    Publication Date: 2017-03-02

    Application No.: US15350339

    Filing Date: 2016-11-14

    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for captioning a media presentation. The method includes receiving automatic speech recognition (ASR) output from a media presentation and a transcription of the media presentation. The method includes selecting via a processor a pair of anchor words in the media presentation based on the ASR output and transcription and generating captions by aligning the transcription with the ASR output between the selected pair of anchor words. The transcription can be human-generated. Selecting pairs of anchor words can be based on a similarity threshold between the ASR output and the transcription. In one variation, commonly used words on a stop list are ineligible as anchor words. The method includes outputting the media presentation with the generated captions. The presentation can be a recording of a live event.
