System and method for standardized speech recognition infrastructure
    Granted patent (in force)

    Publication No.: US09053704B2

    Publication Date: 2015-06-09

    Application No.: US14330739

    Filing Date: 2014-07-14

    CPC classification numbers: G10L15/075 G10L15/063 G10L15/065 G10L15/07 G10L15/08

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next, the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
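    The fallback order described above (user-specific supervised model, then unsupervised model, then generic model) can be pictured with a short sketch. The model class and lookup helpers below are hypothetical placeholders for illustration only, not the patented implementation.

```python
from typing import Optional


class SpeechModel:
    """Placeholder for a speech recognition model."""

    def __init__(self, name: str):
        self.name = name

    def recognize(self, audio: bytes) -> str:
        # A real model would decode the audio; here we only report which
        # model handled the request.
        return f"<transcript produced by the {self.name} model>"


def lookup_supervised_model(user_id: str) -> Optional[SpeechModel]:
    # Hypothetical store lookup; returns None when no user-specific
    # supervised model has been trained for this user.
    return None


def lookup_unsupervised_model(user_id: str) -> Optional[SpeechModel]:
    # Hypothetical store lookup for a user-specific unsupervised model.
    return None


def select_model(user_id: str) -> SpeechModel:
    """Fall back from supervised to unsupervised to generic, as in the abstract."""
    model = lookup_supervised_model(user_id)
    if model is None:
        model = lookup_unsupervised_model(user_id)
    if model is None:
        model = SpeechModel("generic")
    return model


if __name__ == "__main__":
    audio = b"..."  # raw audio bytes received from the user
    print(select_model("user-123").recognize(audio))
```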

    Utterance endpointing in task-oriented conversational systems

    Publication No.: US12243517B1

    Publication Date: 2025-03-04

    Application No.: US17500834

    Filing Date: 2021-10-13

    Abstract: A task-oriented dialog system determines an endpoint in a user utterance by receiving incremental portions of a user utterance that is provided in real time during a task-oriented communication session between a user and a virtual agent (VA). The task-oriented dialog system recognizes words in the incremental portions using an automated speech recognition (ASR) model and generates semantic information for the incremental portions of the utterance by applying a natural language processing (NLP) model to the recognized words. An acoustic-prosodic signature of the incremental portions of the utterance is generated using an acoustic-prosodic model. The task-oriented dialog system can generate a feature vector that represents the incrementally recognized words, the semantic information, the acoustic-prosodic signature, and corresponding confidence scores of the model outputs. A model is applied to the feature vector to identify a likely endpoint in the user utterance.
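    As a rough illustration of the feature-vector step described above, the sketch below concatenates placeholder outputs of the ASR, NLP, and acoustic-prosodic models with their confidence scores and applies a toy linear classifier to decide whether the current increment looks like an endpoint. The field names, dimensions, and threshold are assumptions, not the patent's actual model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IncrementFeatures:
    word_embedding: List[float]      # from the ASR hypothesis for this increment
    semantic_embedding: List[float]  # from the NLP model applied to those words
    prosodic_signature: List[float]  # from the acoustic-prosodic model
    asr_confidence: float
    nlp_confidence: float


def to_feature_vector(f: IncrementFeatures) -> List[float]:
    """Concatenate the per-model outputs and their confidences into one vector."""
    return (
        f.word_embedding
        + f.semantic_embedding
        + f.prosodic_signature
        + [f.asr_confidence, f.nlp_confidence]
    )


def is_endpoint(vector: List[float], weights: List[float], threshold: float = 0.5) -> bool:
    """Toy linear endpoint classifier applied to the feature vector."""
    score = sum(w * x for w, x in zip(weights, vector))
    return score >= threshold


if __name__ == "__main__":
    features = IncrementFeatures(
        word_embedding=[0.2, 0.1],
        semantic_embedding=[0.4],
        prosodic_signature=[0.9, 0.3],
        asr_confidence=0.95,
        nlp_confidence=0.80,
    )
    vector = to_feature_vector(features)
    print(is_endpoint(vector, weights=[0.1] * len(vector)))  # False with these toy values
```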

    Annotating and modeling natural language semantics through annotation conversion

    Publication No.: US12154552B1

    Publication Date: 2024-11-26

    Application No.: US17462889

    Filing Date: 2021-08-31

    Abstract: A natural language understanding (NLU) system generates in-place annotations for natural language utterances or other types of time-based media based on stand-off annotations. The in-place annotations are associated with particular sub-sequences of an utterance, providing richer information than stand-off annotations, which are associated only with the utterance as a whole. To generate the in-place annotations for an utterance, the NLU system applies an encoder network and a decoder network to obtain attention weights for the various tokens within the utterance. The NLU system disqualifies tokens of the utterance based on their corresponding attention weights, and selects highest-scoring contiguous sequences of tokens between the disqualified tokens. In-place annotations are associated with the selected sequences.
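    A minimal sketch of the span-selection step described above: tokens whose attention weight falls below a cutoff are disqualified, and the highest-scoring contiguous run of surviving tokens is selected for the in-place annotation. The cutoff value and the sum-of-weights scoring are illustrative assumptions.

```python
from typing import List, Tuple


def select_annotation_span(
    tokens: List[str],
    attention_weights: List[float],
    cutoff: float = 0.1,
) -> Tuple[int, int]:
    """Return (start, end) indices of the best contiguous run of qualifying tokens."""
    best_span: Tuple[int, int] = (0, 0)
    best_score = float("-inf")
    start = None
    score = 0.0
    # A trailing sentinel weight flushes the final run.
    for i, weight in enumerate(attention_weights + [0.0]):
        if i < len(tokens) and weight >= cutoff:
            if start is None:
                start, score = i, 0.0
            score += weight
        elif start is not None:
            if score > best_score:
                best_span, best_score = (start, i), score
            start = None
    return best_span


if __name__ == "__main__":
    words = ["book", "a", "table", "for", "two", "tonight"]
    weights = [0.05, 0.02, 0.40, 0.08, 0.35, 0.30]
    start, end = select_annotation_span(words, weights)
    print(words[start:end])  # ['two', 'tonight'] -- the span to annotate in place
```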

    Extracting natural language semantics from speech without the use of speech recognition

    Publication No.: US11508355B1

    Publication Date: 2022-11-22

    Application No.: US16172115

    Filing Date: 2018-10-26

    Abstract: Systems and methods are disclosed herein for discerning aspects of user speech to determine user intent and/or other acoustic features of a sound input without the use of an ASR engine. To this end, a processor receives a sound signal comprising raw acoustic data from a client device and divides the data into acoustic units. The processor feeds the acoustic units through a first machine learning model to obtain a first output and determines a first mapping, using the first output, of each respective acoustic unit to a plurality of candidate representations of the respective acoustic unit. The processor feeds each candidate representation of the plurality through a second machine learning model to obtain a second output, determines a second mapping, using the second output, of each candidate representation to a known condition, and determines a label for the sound signal based on the second mapping.
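    The two-stage mapping described above can be sketched as follows, with both machine learning models replaced by hypothetical stand-ins that return fixed candidates and scores; a real system would use trained networks and a learned aggregation rather than the simple vote shown here.

```python
from collections import Counter
from typing import Dict, List


def first_model(acoustic_unit: bytes) -> List[str]:
    # Hypothetical stage one: candidate representations for one acoustic unit.
    return ["unit_a", "unit_b"]


def second_model(candidate: str) -> Dict[str, float]:
    # Hypothetical stage two: scores over known conditions (e.g. user intents).
    return {"greeting": 0.7, "request": 0.3}


def label_sound_signal(raw_audio: bytes, unit_size: int = 160) -> str:
    # Divide the raw acoustic data into fixed-size acoustic units.
    units = [raw_audio[i:i + unit_size] for i in range(0, len(raw_audio), unit_size)]
    votes: Counter = Counter()
    for unit in units:
        for candidate in first_model(unit):                            # first mapping
            for condition, score in second_model(candidate).items():   # second mapping
                votes[condition] += score
    # The label for the whole signal is the highest-scoring condition.
    return votes.most_common(1)[0][0] if votes else "unknown"


if __name__ == "__main__":
    print(label_sound_signal(b"\x00" * 480))  # 'greeting' with these stand-in models
```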

    Underspecification of intents in a natural language processing system

    Publication No.: US10216832B2

    Publication Date: 2019-02-26

    Application No.: US15384275

    Filing Date: 2016-12-19

    Abstract: A natural language processing system has a hierarchy of user intents related to a domain of interest, the hierarchy having specific intents corresponding to leaf nodes of the hierarchy, and more general intents corresponding to ancestor nodes of the leaf nodes. The system also has a trained understanding model that can classify natural language utterances according to user intent. When the understanding model cannot determine with sufficient confidence that a natural language utterance corresponds to one of the specific intents, the natural language processing system traverses the hierarchy of intents to find a more general user intent that is related to the most applicable specific intent of the utterance and for which there is sufficient confidence. The general intent can then be used to prompt the user with questions applicable to the general intent to obtain the missing information needed for a specific intent.
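    A minimal sketch of the back-off idea described above, assuming a toy intent hierarchy and confidence threshold: when no specific (leaf) intent is confident enough, the system returns a more general ancestor intent that it can then ask follow-up questions about. The hierarchy, intent names, and threshold are invented for illustration.

```python
from typing import Dict, Optional

# Child intent -> parent intent; leaf nodes are the most specific intents.
HIERARCHY: Dict[str, Optional[str]] = {
    "book_flight_international": "book_flight",
    "book_flight_domestic": "book_flight",
    "book_flight": "book_travel",
    "book_travel": None,  # root of the domain
}


def resolve_intent(scores: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the best specific intent if confident enough, else its more general parent."""
    best_intent, best_score = max(scores.items(), key=lambda kv: kv[1])
    if best_score >= threshold:
        return best_intent
    # Not confident enough for a specific intent: back off to a more general
    # ancestor, which the dialog system can then ask follow-up questions about.
    return HIERARCHY.get(best_intent)


if __name__ == "__main__":
    classifier_scores = {"book_flight_international": 0.55, "book_flight_domestic": 0.40}
    print(resolve_intent(classifier_scores))  # 'book_flight'
```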

    HIERARCHICAL SPEECH RECOGNITION DECODER
    Published patent application

    Publication No.: US20190035389A1

    Publication Date: 2019-01-31

    Application No.: US16148884

    Filing Date: 2018-10-01

    CPC classification numbers: G10L15/197 G10L15/02 G10L15/063 G10L2015/0631

    Abstract: A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus. When speech is received, the speech interpretation module uses a hierarchical speech recognition decoder that uses both the parameterized statistical model and language sub-models that specify how to recognize a sequence of words. The separation of the language sub-models from the statistical model beneficially reduces the size of the literal corpus needed for training, reduces the size of the resulting model, provides more fine-grained interpretation of concepts, and improves computational efficiency by allowing run-time incorporation of the language sub-models.
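    One way to picture the two-level structure described above (the same sketch applies to the identical abstract of the granted patent listed further below): a parameterized template stands in for the statistical model, with concept slots expanded at run time by per-concept language sub-models. The grammar and concepts below are made-up examples, not the patent's decoder.

```python
from typing import Dict, List

# Stand-in for the parameterized statistical model: templates with concept slots.
TEMPLATES: List[List[str]] = [
    ["call", "<CONTACT>"],
    ["play", "<SONG>", "by", "<ARTIST>"],
]

# Language sub-models, incorporated at run time, one per concept.
SUB_MODELS: Dict[str, List[str]] = {
    "<CONTACT>": ["alice", "bob"],
    "<SONG>": ["yesterday"],
    "<ARTIST>": ["the beatles"],
}


def expand(template: List[str]) -> List[List[str]]:
    """Expand concept slots into literal word sequences using the sub-models."""
    sequences: List[List[str]] = [[]]
    for token in template:
        options = SUB_MODELS.get(token, [token])
        sequences = [seq + option.split() for seq in sequences for option in options]
    return sequences


if __name__ == "__main__":
    for template in TEMPLATES:
        for words in expand(template):
            print(" ".join(words))
```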

    Hierarchical speech recognition decoder

    Publication No.: US10096317B2

    Publication Date: 2018-10-09

    Application No.: US15131833

    Filing Date: 2016-04-18

    Abstract: A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus. When speech is received, the speech interpretation module uses a hierarchical speech recognition decoder that uses both the parameterized statistical model and language sub-models that specify how to recognize a sequence of words. The separation of the language sub-models from the statistical model beneficially reduces the size of the literal corpus needed for training, reduces the size of the resulting model, provides more fine-grained interpretation of concepts, and improves computational efficiency by allowing run-time incorporation of the language sub-models.

    System and method for pronunciation modeling
    Granted patent (in force)

    Publication No.: US09431011B2

    Publication Date: 2016-08-30

    Application No.: US14488844

    Filing Date: 2014-09-17

    CPC classification numbers: G10L15/187 G10L15/183 G10L2015/025

    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent the same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
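    A small sketch of the substitution step described above, assuming invented phoneme families: each phoneme in a lexicon entry is replaced by its family of interchangeable alternatives (an alternative may itself be a string of phonemes), yielding every pronunciation variant.

```python
from itertools import product
from typing import Dict, List

# Each phoneme maps to its family of interchangeable alternatives; an
# alternative may itself be a string of phonemes. These families are invented.
PHONEME_FAMILIES: Dict[str, List[str]] = {
    "AA": ["AA", "AO"],     # e.g. a dialectal vowel merger
    "T": ["T", "D", "DX"],  # e.g. flapping alternatives
}


def expand_pronunciation(phonemes: List[str]) -> List[List[str]]:
    """Generate every pronunciation variant implied by the phoneme families."""
    families = [PHONEME_FAMILIES.get(p, [p]) for p in phonemes]
    variants: List[List[str]] = []
    for combo in product(*families):
        # An alternative can be a string of phonemes, so split on spaces.
        variants.append([unit for alt in combo for unit in alt.split()])
    return variants


if __name__ == "__main__":
    for variant in expand_pronunciation(["W", "AA", "T", "ER"]):  # "water"
        print(" ".join(variant))
```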
