DUAL MODE SPEECH RECOGNITION
    Document type: Invention application

    Publication number: US20180358019A1

    Publication date: 2018-12-13

    Application number: US15619304

    Filing date: 2017-06-09

    Abstract: A dual mode speech recognition system sends speech to two or more speech recognizers. If a first recognition result is received, whose recognition score exceeds a high threshold, the first result is selected without waiting for another result. If the score is below a low threshold, the first result is ignored. At intermediate values of recognition scores, a timeout duration is dynamically determined as a function of the recognition score. The timeout duration determines how long the system will wait for another result. Many functions of the recognition score are possible, but timeout durations generally decrease as scores increase. When receiving a second recognition score before the timeout occurs, a comparison based on recognition scores determines whether the first result or the second result is the basis for creating a response.
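The score-dependent timeout described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold values, the maximum timeout, the linear mapping, and the names `timeout_for_score` and `choose_result` are all assumptions. The abstract only requires that timeouts generally decrease as scores increase.

```python
from typing import Optional, Tuple

# Illustrative values only; the abstract does not specify any numbers.
HIGH_THRESHOLD = 0.9
LOW_THRESHOLD = 0.3
MAX_TIMEOUT_S = 2.0


def timeout_for_score(score: float) -> Optional[float]:
    """Map a first-result score to a wait time in seconds.

    Returns 0.0 when the result can be accepted immediately,
    None when the result should be ignored, and otherwise a
    timeout that shrinks linearly as the score rises (one of
    many possible decreasing functions).
    """
    if score > HIGH_THRESHOLD:
        return 0.0
    if score < LOW_THRESHOLD:
        return None
    fraction = (HIGH_THRESHOLD - score) / (HIGH_THRESHOLD - LOW_THRESHOLD)
    return MAX_TIMEOUT_S * fraction


def choose_result(
    first: Tuple[str, float],
    second: Optional[Tuple[str, float]] = None,
) -> Tuple[str, float]:
    """Pick between (text, score) results.

    `second` is the result that arrived before the timeout, or None
    if the timeout expired first. Ties go to the first result.
    """
    if second is None or first[1] >= second[1]:
        return first
    return second
```

A caller would wait up to `timeout_for_score(first_score)` seconds for the second recognizer and then pass whatever arrived to `choose_result`.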

    PRONUNCIATION GUIDED BY AUTOMATIC SPEECH RECOGNITION

    Publication number: US20180190269A1

    Publication date: 2018-07-05

    Application number: US15439883

    Filing date: 2017-02-22

    CPC classification number: G10L15/26 G09B5/04 G09B19/06 G10L13/00 G10L2015/225

    Abstract: Speech synthesis chooses pronunciations of words with multiple acceptable pronunciations based on an indication of a personal, class-based, or global preference or an intended non-preferred pronunciation. A speaker's words can be parroted back on personal devices using preferred pronunciations for accent training. Degrees of pronunciation error are computed and indicated to the user in a visual transcription or audibly as word emphasis in parroted speech. Systems can use sets of phonemes extended beyond those generally recognized for a language. Speakers are classified in order to choose specific phonetic dictionaries or adapt global ones. User profiles maintain lists of which pronunciations are preferred among ones acceptable for words with multiple recognized pronunciations. Systems use multiple correlations of word preferences across users to predict use preferences of unlisted words. Speaker-preferred pronunciations are used to weight the scores of transcription hypotheses based on phoneme sequence hypotheses in speech engines.
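The last sentence of this abstract, weighting transcription hypotheses by speaker-preferred pronunciations, might look like the sketch below. The data layout, the multiplicative `boost` factor, and the function name `rescore` are hypothetical; the abstract does not say how the weighting is computed.

```python
def rescore(hypotheses, preferred, boost=1.2):
    """Boost hypotheses whose word pronunciations match the speaker's profile.

    hypotheses: list of dicts with "text", "score", and "pronunciations"
                (word -> phoneme string hypothesized by the recognizer).
    preferred:  word -> phoneme string from a (hypothetical) user profile.
    Returns a new list sorted by descending adjusted score.
    """
    rescored = []
    for hyp in hypotheses:
        score = hyp["score"]
        for word, phonemes in hyp["pronunciations"].items():
            # Reward a word only when it is heard in the form this speaker prefers.
            if preferred.get(word) == phonemes:
                score *= boost
        rescored.append({**hyp, "score": score})
    return sorted(rescored, key=lambda h: h["score"], reverse=True)


# The phoneme sequence "R UW T" could be "root" or "route".
# This speaker's profile says they pronounce "route" as "R UW T".
hypotheses = [
    {"text": "take the root", "score": 0.50, "pronunciations": {"root": "R UW T"}},
    {"text": "take the route", "score": 0.48, "pronunciations": {"route": "R UW T"}},
]
ranked = rescore(hypotheses, preferred={"route": "R UW T"})
```

Here the profile match lifts "take the route" above the acoustically similar "take the root".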

    System and methods for offline audio recognition

    Publication number: US09619560B1

    Publication date: 2017-04-11

    Application number: US14884650

    Filing date: 2015-10-15

    CPC classification number: G06F17/30743 G10L15/08 G10L25/54

    Abstract: In one implementation, a method is described of retrying matching of an audio query against audio references. The method includes receiving a follow-up query that requests a retry at matching a previously submitted audio query. In some implementations, this follow-up query is received without any recognition hint that suggests how to retry matching. The follow-up query includes the audio query or a reference to the audio query to be used in the retry. The method further includes retrying matching the audio query using retry matching resources that include an expanded group of audio references, identifying at least one match and transmitting a report of the match. Optionally, the method includes storing data that correlates the follow-up query, the audio query or the reference to the audio query, and the match after retrying.
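The retry flow in this abstract can be illustrated with a small sketch. Fingerprints are modeled here as sets of integers compared by Jaccard similarity, which is an assumption made for brevity; the names `best_match` and `retry_match`, the threshold, and the log format are likewise hypothetical.

```python
def jaccard(a, b):
    """Similarity of two fingerprint sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query_fp, references, threshold=0.5):
    """Return the id of the best reference at or above threshold, else None."""
    best_id, best_score = None, threshold
    for ref_id, ref_fp in references.items():
        score = jaccard(query_fp, ref_fp)
        if score >= best_score:
            best_id, best_score = ref_id, score
    return best_id


def retry_match(query_id, stored_queries, primary_refs, extra_refs, log):
    """Handle a follow-up query that carries a reference to an earlier audio query.

    The retry searches an expanded group of references (primary plus extra)
    and records the follow-up, the query reference, and the match in `log`.
    """
    query_fp = stored_queries[query_id]
    expanded = {**primary_refs, **extra_refs}
    match = best_match(query_fp, expanded)
    log.append({"query_id": query_id, "match": match})
    return match


stored = {"q1": {1, 2, 3, 4}}
primary = {"song-a": {7, 8, 9}}
extra = {"song-b": {1, 2, 3, 5}}
log = []
first_try = best_match(stored["q1"], primary)
second_try = retry_match("q1", stored, primary, extra, log)
```

The first attempt finds nothing in the primary set, while the retry succeeds because the expanded set includes the additional reference.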

    System and Methods for Continuous Audio Matching
    Document type: Invention application
    Status: Under examination (published)

    Publication number: US20160292266A1

    Publication date: 2016-10-06

    Application number: US15182300

    Filing date: 2016-06-14

    Abstract: The present invention relates to the continuous monitoring of an audio signal and identification of audio items within an audio signal. The technology disclosed utilizes predictive caching of fingerprints to improve efficiency. Fingerprints are cached for tracking an audio signal with known alignment and for watching an audio signal without known alignment, based on already identified fingerprints extracted from the audio signal. Software running on a smart phone or other battery-powered device cooperates with software running on an audio identification server.
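One way to picture the predictive caching in this abstract is the sketch below. It assumes fingerprints are simple comparable values and that the server hands the device a list of upcoming fingerprints after an identification; `predict_cache`, `track`, and `watch` are hypothetical names, not terms from the patent.

```python
def predict_cache(reference_fps, matched_index, window=3):
    """After a match at matched_index, cache the fingerprints expected next."""
    start = matched_index + 1
    return list(reference_fps[start:start + window])


def track(incoming_fp, cache):
    """Tracking (alignment known): compare against the next expected fingerprint.

    Consumes the cached entry and returns True on a match; returns False
    when the audio has diverged and the server should be consulted again.
    """
    if cache and cache[0] == incoming_fp:
        cache.pop(0)
        return True
    return False


def watch(incoming_fp, watch_list):
    """Watching (alignment unknown): look the fingerprint up in any cached item.

    watch_list maps fingerprint -> item id; returns the item id or None.
    """
    return watch_list.get(incoming_fp)


reference = ["f0", "f1", "f2", "f3", "f4"]
cache = predict_cache(reference, matched_index=1, window=2)
```

While `track` keeps succeeding, the battery-powered device can confirm the audio locally instead of sending every fingerprint to the identification server.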

    Token confidence scores for automatic speech recognition

    Publication number: US12223948B2

    Publication date: 2025-02-11

    Application number: US17649810

    Filing date: 2022-02-03

    Abstract: Methods and systems for correction of a likely erroneous word in a speech transcription are disclosed. By evaluating token confidence scores of individual words or phrases, the automatic speech recognition system can replace a low-confidence score word with a substitute word or phrase. Among various approaches, neural network models can be used to generate individual confidence scores. Such word substitution can enable the speech recognition system to automatically detect and correct likely errors in transcription. Furthermore, the system can indicate the token confidence scores on a graphic user interface for labeling and dictionary enhancement.
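The substitution step described in this abstract can be sketched as follows. The confidence values would come from a model such as a neural network; here they are supplied directly, and the threshold, the `substitutes` lookup, and the name `correct_transcript` are assumptions for illustration.

```python
def correct_transcript(tokens, substitutes, threshold=0.5):
    """Replace low-confidence tokens that have a known substitute.

    tokens:      list of (word, confidence) pairs from the recognizer.
    substitutes: word -> replacement word or phrase (hypothetical source,
                 for example a language model or a correction dictionary).
    Tokens at or above threshold, or without a substitute, are kept as-is.
    """
    corrected = []
    for word, confidence in tokens:
        if confidence < threshold and word in substitutes:
            corrected.append(substitutes[word])
        else:
            corrected.append(word)
    return " ".join(corrected)


tokens = [("play", 0.90), ("sum", 0.20), ("music", 0.95)]
fixed = correct_transcript(tokens, substitutes={"sum": "some"})
```

The same per-token scores could also drive the user-interface highlighting mentioned in the abstract, since each word keeps its own confidence value.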

    Machine learning system for digital assistants

    Publication number: US12067006B2

    Publication date: 2024-08-20

    Application number: US17350294

    Filing date: 2021-06-17

    CPC classification number: G06F16/2425 G06N3/045 G06N3/088

    Abstract: A machine learning system for a digital assistant is described, together with a method of training such a system. The machine learning system is based on an encoder-decoder sequence-to-sequence neural network architecture trained to map input sequence data to output sequence data, where the input sequence data relates to an initial query and the output sequence data represents canonical data representation for the query. The method of training involves generating a training dataset for the machine learning system. The method involves clustering vector representations of the query data samples to generate canonical-query original-query pairs in training the machine learning system.
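The dataset-generation step in this abstract, clustering query vectors to form canonical-query and original-query pairs, can be illustrated with a deliberately simple sketch. The greedy cosine-threshold clustering and the choice of the first cluster member as the canonical query are assumptions; the abstract does not name a clustering algorithm or a rule for picking the canonical form.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def canonical_pairs(queries, vectors, threshold=0.9):
    """Greedily cluster queries and emit (canonical, original) training pairs.

    Each query joins the first cluster whose canonical vector is similar
    enough; otherwise it starts a new cluster and becomes its canonical query.
    """
    clusters = []  # list of (canonical_index, member_indices)
    for i, vec in enumerate(vectors):
        for canon, members in clusters:
            if cosine(vec, vectors[canon]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((i, [i]))
    return [
        (queries[canon], queries[m])
        for canon, members in clusters
        for m in members
    ]


queries = ["weather today", "what's the weather", "play jazz"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy embeddings
pairs = canonical_pairs(queries, vectors)
```

Each pair could then serve as one training example for the sequence-to-sequence model, with the original query as input and the canonical query as the target output.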
