SYSTEM AND METHOD FOR DISTRIBUTED VOICE MODELS ACROSS CLOUD AND DEVICE FOR EMBEDDED TEXT-TO-SPEECH
    1.
    Patent application
    Status: Granted

    Publication number: US20160086598A1

    Publication date: 2016-03-24

    Application number: US14953771

    Filing date: 2015-11-30

    CPC classification number: G10L13/04 G10L13/047 G10L13/07

    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.
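
The caching behavior described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `UnitCache`, the `fetch_from_server` callable, and the use of string unit identifiers are all assumptions introduced for illustration. The sketch keeps a core set of units that is never pruned, fetches only the units missing for the current context, and prunes everything else on demand.

```python
class UnitCache:
    """Hypothetical local cache of text-to-speech units with an unprunable core set."""

    def __init__(self, core_units, fetch_from_server):
        self.core = dict(core_units)    # core units: never pruned
        self.extra = {}                 # context-specific units: prunable
        self.fetch = fetch_from_server  # assumed callable: set of ids -> {id: unit data}

    def missing_for(self, needed_ids):
        """Return the ids needed for a synthesis context that are not cached locally."""
        return {u for u in needed_ids if u not in self.core and u not in self.extra}

    def prepare(self, needed_ids):
        """Request only the missing units from the server and store them locally."""
        missing = self.missing_for(needed_ids)
        if missing:
            self.extra.update(self.fetch(missing))
        return missing

    def get(self, unit_id):
        """Look up a unit, preferring the core set."""
        return self.core[unit_id] if unit_id in self.core else self.extra[unit_id]

    def prune(self, keep_ids=()):
        """Drop context-specific units except those in keep_ids; core units stay."""
        keep = set(keep_ids)
        self.extra = {u: d for u, d in self.extra.items() if u in keep}
```

A caller would invoke `prepare` when the speech synthesis context changes, synthesize from `get`, and call `prune` when the context changes again or local storage runs low.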

    SYSTEM AND METHOD FOR DATA-DRIVEN INTONATION GENERATION
    3.
    Patent application
    Status: Under examination (published)

    Publication number: US20150149178A1

    Publication date: 2015-05-28

    Application number: US14087840

    Filing date: 2013-11-22

    CPC classification number: G10L13/10

    Abstract: Systems, methods, and computer-readable storage media for text-to-speech processing having an improved intonation. The system first receives text to be converted to speech, the text having a first segment and a second segment. The system then compares the text to a database of stored utterances, identifying in the database a first utterance corresponding to the first segment and determining an intonation of the first utterance. When the database does not contain a second utterance corresponding to the second segment, the system generates the speech corresponding to the text by combining the first utterance with a generated second utterance corresponding to the second segment, the generated second utterance having the intonation matching, or based on, the first utterance. These actions lead to an improved, smoother, more human-like synthetic speech output from the system.
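
The fallback path in this abstract can be sketched roughly as follows. Everything here is an assumption made for illustration: utterances are modeled as dictionaries with an `f0` pitch contour, the database is a plain dictionary keyed by segment text, `generate` is a hypothetical synthesis callable, and "matching intonation" is reduced to reusing the mean pitch of the stored first utterance. The abstract does not specify any of these details.

```python
def synthesize_with_matched_intonation(first, second, database, generate):
    """Combine a stored first utterance with a second utterance.

    If the second segment is not in the database, it is generated with a
    pitch target derived from the first utterance (here: its mean f0, an
    illustrative simplification).
    """
    stored = database[first]        # first segment is assumed to be stored
    contour = stored["f0"]          # intonation of the first utterance
    if second in database:
        second_utt = database[second]
    else:
        target_f0 = sum(contour) / len(contour)
        second_utt = generate(second, target_f0)
    return [stored, second_utt]
```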

    SYSTEM AND METHOD FOR LOW-LATENCY WEB-BASED TEXT-TO-SPEECH WITHOUT PLUGINS

    Publication number: US20160098985A1

    Publication date: 2016-04-07

    Application number: US14967740

    Filing date: 2015-12-14

    CPC classification number: G10L13/04 G10L13/10

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
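
The cache lookup described in this abstract might look something like the sketch below. The abstract says only that cached audio is indexed by a unique identifier; using a SHA-256 hash of the phrase text as that identifier is an assumption, as are the class name `PhraseAudioCache` and the `synthesize` callable that stands in for the round trip to the TTS server.

```python
import hashlib


class PhraseAudioCache:
    """Hypothetical cache of synthesized audio, keyed by a hash of the phrase text."""

    def __init__(self, synthesize):
        self.synthesize = synthesize  # assumed callable: phrase text -> audio bytes
        self.store = {}

    @staticmethod
    def key(phrase):
        # Assumption: a SHA-256 digest of the text serves as the unique identifier.
        return hashlib.sha256(phrase.encode("utf-8")).hexdigest()

    def audio_for(self, phrase):
        """Return cached audio for identical text; synthesize only on a miss."""
        k = self.key(phrase)
        if k not in self.store:
            self.store[k] = self.synthesize(phrase)
        return self.store[k]
```

With this shape, a phrase that repeats on a page triggers only one synthesis request, which is the latency saving the abstract describes.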

    SYSTEM AND METHOD FOR AUTOMATIC DETECTION OF ABNORMAL STRESS PATTERNS IN UNIT SELECTION SYNTHESIS
    7.
    Patent application
    Status: Granted

    Publication number: US20150170637A1

    Publication date: 2015-06-18

    Application number: US14628790

    Filing date: 2015-02-23

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
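
The detect-then-correct flow in this abstract can be sketched in simplified form. Note the substitution: the abstract mentions machine learning classifiers, whereas this sketch uses a plain comparison against an expected stress value per unit. The unit representation (dictionaries with `phone`, `stress`, and `id` fields) and the `candidates` lookup of alternative units are hypothetical.

```python
def find_stress_mismatches(selected_units, expected_stress):
    """Return indices of selected units whose stress differs from the expected stress."""
    return [
        i
        for i, (unit, want) in enumerate(zip(selected_units, expected_stress))
        if unit["stress"] != want
    ]


def correct_stress(selected_units, expected_stress, candidates):
    """Swap each mismatched unit for a same-phone candidate with the expected stress.

    Units with no suitable candidate are left unchanged.
    """
    corrected = list(selected_units)
    for i in find_stress_mismatches(selected_units, expected_stress):
        phone = selected_units[i]["phone"]
        for alt in candidates.get(phone, []):
            if alt["stress"] == expected_stress[i]:
                corrected[i] = alt
                break
    return corrected
```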

    System and Method for Cloud-Based Text-to-Speech Web Services
    9.
    Patent application
    Status: Granted

    Publication number: US20150221298A1

    Publication date: 2015-08-06

    Application number: US14684893

    Filing date: 2015-04-13

    CPC classification number: G10L13/04 G10L13/00 G10L13/043

    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
