System and method for prosodically modified unit selection databases

    Publication No.: US10249290B2

    Publication date: 2019-04-02

    Application No.: US16004812

    Filing date: 2018-06-11

    Abstract: Systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.
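The select, compare, modify, and store loop described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the names `SpeechUnit`, `UnitDatabase`, `contour_distance`, `modify_to_target`, and `synthesize_and_grow` are hypothetical, the prosodic curve is reduced to a fixed-length pitch contour, and the "modification" simply replaces the contour where a real system would resynthesize audio.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechUnit:
    phone: str
    f0: List[float]          # pitch contour in Hz, one value per frame (assumed representation)
    modified: bool = False   # True if the unit was produced by prosodic modification


def contour_distance(a: List[float], b: List[float]) -> float:
    """Mean squared difference between two equal-length contours."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


@dataclass
class UnitDatabase:
    units: List[SpeechUnit] = field(default_factory=list)

    def select(self, phone: str, target_f0: List[float]) -> SpeechUnit:
        # Pick the stored unit for this phone whose actual contour is closest to the target.
        candidates = [u for u in self.units if u.phone == phone]
        return min(candidates, key=lambda u: contour_distance(u.f0, target_f0))


def modify_to_target(unit: SpeechUnit, target_f0: List[float]) -> SpeechUnit:
    # Placeholder: a real system would also alter the waveform; here only the contour changes.
    return SpeechUnit(unit.phone, list(target_f0), modified=True)


def synthesize_and_grow(db: UnitDatabase, phones: List[str],
                        targets: List[List[float]]) -> List[SpeechUnit]:
    output = []
    for phone, target in zip(phones, targets):
        unit = db.select(phone, target)
        if contour_distance(unit.f0, target) > 0:
            unit = modify_to_target(unit, target)
            db.units.append(unit)   # store the modified unit for future requests
        output.append(unit)
    return output


db = UnitDatabase([SpeechUnit("a", [100.0, 100.0, 100.0])])
first = synthesize_and_grow(db, ["a"], [[120.0, 130.0, 120.0]])
second = synthesize_and_grow(db, ["a"], [[120.0, 130.0, 120.0]])
```

In this toy run the first request adds a modified unit to the database, and the second request finds that stored unit directly, which is the sense in which the database's prosodic coverage grows.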

    System and method for prosodically modified unit selection databases

    Publication No.: US11049491B2

    Publication date: 2021-06-29

    Application No.: US16828070

    Filing date: 2020-03-24

    Abstract: Systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

    SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES

    Publication No.: US20190228761A1

    Publication date: 2019-07-25

    Application No.: US16369882

    Filing date: 2019-03-29

    Abstract: Systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

    SYSTEM AND METHOD FOR TEXT NORMALIZATION USING ATOMIC TOKENS

    Publication No.: US20210272549A1

    Publication date: 2021-09-02

    Application No.: US17306887

    Filing date: 2021-05-03

    IPC classification: G10L13/10

    Abstract: A system, method, and computer-readable storage devices for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.
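The three stages named in this abstract (tokenize, classify, act) can be illustrated with a small Python sketch. It rests on assumptions the abstract does not state: splitting on Unicode character categories is only one plausible way to form atomic tokens, the labels `SPELL`, `AS_WORD`, `DIGIT_BY_DIGIT`, and `SILENT` are hypothetical, and the hand-written `classify` rule stands in for the data-driven classifier the abstract describes.

```python
import unicodedata
from itertools import groupby

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def char_class(ch):
    """Coarse character class derived from the Unicode general category."""
    if ch.isspace():
        return "space"
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return "letter"
    if cat == "Nd":
        return "digit"
    return "other"


def atomic_tokens(text):
    """Split text into (token, class) pairs; whitespace is dropped."""
    tokens = []
    for cls, group in groupby(text, key=char_class):
        chunk = "".join(group)
        if cls == "space":
            continue
        if cls == "other":
            # Each punctuation mark or symbol is its own token.
            tokens.extend((ch, cls) for ch in chunk)
        else:
            tokens.append((chunk, cls))
    return tokens


def classify(token, cls):
    # Toy rule standing in for a trained classifier; labels describe pronunciation only.
    if cls == "digit":
        return "DIGIT_BY_DIGIT"
    if cls == "letter":
        return "SPELL" if token.isupper() and len(token) > 1 else "AS_WORD"
    return "SILENT"


# Class-determined actions: each label maps to one conversion function.
ACTIONS = {
    "DIGIT_BY_DIGIT": lambda t: " ".join(DIGIT_NAMES[int(c)] for c in t),
    "SPELL": lambda t: " ".join(t.lower()),
    "AS_WORD": lambda t: t.lower(),
    "SILENT": lambda t: "",
}


def normalize(text):
    spoken = (ACTIONS[classify(tok, cls)](tok) for tok, cls in atomic_tokens(text))
    return " ".join(s for s in spoken if s)


print(atomic_tokens("Dr. Smith paid $25"))
print(normalize("TTS costs 42"))
```

Because the labels say only how a token is read aloud, an annotator in this sketch never has to decide whether "42" is a price, an age, or a street number, which is the property the abstract credits for easy annotation.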

    System and method for text normalization using atomic tokens

    Publication No.: US10997964B2

    Publication date: 2021-05-04

    Application No.: US16542514

    Filing date: 2019-08-16

    IPC classification: G10L13/00 G10L13/10

    Abstract: A system, method, and computer-readable storage devices for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

    System and method for unified normalization in text-to-speech and automatic speech recognition

    Publication No.: US10199034B2

    Publication date: 2019-02-05

    Application No.: US14461930

    Filing date: 2014-08-18

    Abstract: A system, method, and computer-readable storage devices for using a single set of normalization protocols and a single language lexicon (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.
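A rough Python sketch of the shared front end this abstract describes follows: one normalizer and one dictionary feed both a TTS path (phonemes out) and an ASR training path (text out). The function names, the two-word `LEXICON`, and its phoneme strings are illustrative assumptions only; the downstream steps the abstract mentions (prosody generation, unit selection, acoustic and language model training) are not modeled.

```python
import re

# Hypothetical shared dictionary: word -> phoneme sequence (illustrative entries only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def normalize(text):
    """Single normalization step used by both paths: lowercase, keep word characters."""
    return re.findall(r"[a-z']+", text.lower())


def process(text, mode):
    words = normalize(text)
    if mode == "tts":
        # TTS path: look up phonemes; a word missing from LEXICON raises KeyError here.
        return [p for w in words for p in LEXICON[w]]
    if mode == "asr":
        # ASR training path: emit normalized text for model training.
        return " ".join(words)
    raise ValueError(f"unknown mode: {mode!r}")


print(process("Hello, World!", "tts"))
print(process("Hello, World!", "asr"))
```

Routing both modes through the same `normalize` call means the word forms that ASR training sees are exactly the forms the TTS lookup expects.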

    SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES

    Publication No.: US20200227023A1

    Publication date: 2020-07-16

    Application No.: US16828070

    Filing date: 2020-03-24

    Abstract: Systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

    System and method for text normalization using atomic tokens

    Publication No.: US10388270B2

    Publication date: 2019-08-20

    Application No.: US14533589

    Filing date: 2014-11-05

    IPC classification: G10L13/10

    Abstract: A system, method, and computer-readable storage devices for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.