-
公开(公告)号:US10249290B2
公开(公告)日:2019-04-02
申请号:US16004812
申请日:2018-06-11
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
公开(公告)号:US11049491B2
公开(公告)日:2021-06-29
申请号:US16828070
申请日:2020-03-24
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
公开(公告)号:US20190228761A1
公开(公告)日:2019-07-25
申请号:US16369882
申请日:2019-03-29
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
公开(公告)号:US20210272549A1
公开(公告)日:2021-09-02
申请号:US17306887
申请日:2021-05-03
发明人: Ladan Golipour , Alistair D. Conkie
IPC分类号: G10L13/10
摘要: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.
-
公开(公告)号:US10997964B2
公开(公告)日:2021-05-04
申请号:US16542514
申请日:2019-08-16
发明人: Ladan Golipour , Alistair D. Conkie
摘要: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.
-
公开(公告)号:US10607594B2
公开(公告)日:2020-03-31
申请号:US16369882
申请日:2019-03-29
IPC分类号: G10L13/06 , G10L19/12 , G10L25/90 , G10L19/04 , G10L19/18 , G11C7/16 , G10L15/06 , G10L15/02 , G10L21/003
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
7.
公开(公告)号:US10199034B2
公开(公告)日:2019-02-05
申请号:US14461930
申请日:2014-08-18
发明人: Alistair D. Conkie , Ladan Golipour
摘要: A system, method and computer-readable storage devices are for using a single set of normalization protocols and a single language lexica (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.
-
公开(公告)号:US20200227023A1
公开(公告)日:2020-07-16
申请号:US16828070
申请日:2020-03-24
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
公开(公告)号:US10388270B2
公开(公告)日:2019-08-20
申请号:US14533589
申请日:2014-11-05
发明人: Ladan Golipour , Alistair D. Conkie
IPC分类号: G10L13/10
摘要: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.
-
公开(公告)号:US09997154B2
公开(公告)日:2018-06-12
申请号:US14275349
申请日:2014-05-12
CPC分类号: G10L13/06 , G10L15/02 , G10L15/063 , G10L21/003 , G10L25/90 , G10L2015/0635 , G11C7/16
摘要: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
-
-
-
-
-
-
-
-
-