    Direction based end-pointing for speech recognition

    Publication Number: US20220059124A1

    Publication Date: 2022-02-24

    Application Number: US17340431

    Application Date: 2021-06-07

    Abstract: A speech recognition system uses automatic speech recognition techniques, such as end-pointing, in conjunction with beamforming and/or signal processing to isolate the speech of one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on that isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources, including speech, may be identified in different beams and processed.

    Direction based end-pointing for speech recognition

    Publication Number: US10102850B1

    Publication Date: 2018-10-16

    Application Number: US13775954

    Application Date: 2013-02-25

    Abstract: A speech recognition system uses automatic speech recognition techniques, such as end-pointing, in conjunction with beamforming and/or signal processing to isolate the speech of one or more speaking users from multiple received audio signals and to detect the beginning and/or end of the speech based at least in part on that isolation. Audio capture devices such as microphones may be arranged in a beamforming array to receive the multiple audio signals. Multiple audio sources, including speech, may be identified in different beams and processed.
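
Both end-pointing entries above describe the same high-level pipeline: form beams from a microphone array, pick the beam that best isolates the speaker, and detect where speech in that beam begins and ends. The patents do not publish a specific algorithm, so the following Python sketch is only an illustration of that pipeline under simple assumptions: a linear microphone array, a delay-and-sum beamformer, energy-based beam selection, and an energy-threshold end-pointer with a hangover period. All function names and parameter values are hypothetical.

import numpy as np

def delay_and_sum(mic_signals, mic_positions, angle_rad, sample_rate, speed_of_sound=343.0):
    """Steer a delay-and-sum beam toward angle_rad for a linear microphone array.

    mic_signals:   (num_mics, num_samples) array of time-aligned microphone audio.
    mic_positions: (num_mics,) array of mic positions along the array axis, in meters.
    """
    num_mics, num_samples = mic_signals.shape
    # Per-mic delay (in samples) for a plane wave arriving from angle_rad.
    delays = mic_positions * np.cos(angle_rad) / speed_of_sound * sample_rate
    spectrum = np.fft.rfft(mic_signals, axis=1)
    freqs = np.fft.rfftfreq(num_samples)            # cycles per sample
    # Apply fractional delays as phase shifts in the frequency domain, then average the mics.
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spectrum * phase).mean(axis=0), n=num_samples)

def select_beam(mic_signals, mic_positions, sample_rate, num_beams=8):
    """Form beams over a grid of look directions and keep the one with the most energy."""
    angles = np.linspace(0.0, np.pi, num_beams)
    beams = [delay_and_sum(mic_signals, mic_positions, a, sample_rate) for a in angles]
    energies = [float(np.sum(b ** 2)) for b in beams]
    best = int(np.argmax(energies))
    return angles[best], beams[best]

def end_point(beam, sample_rate, frame_ms=25, threshold=1e-3, hangover_frames=20):
    """Energy-based end-pointing on the selected beam: return (start, end) sample indices."""
    frame_len = int(sample_rate * frame_ms / 1000)
    num_frames = len(beam) // frame_len
    energies = [float(np.mean(beam[i * frame_len:(i + 1) * frame_len] ** 2))
                for i in range(num_frames)]
    start = end = None
    silent = 0
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i * frame_len                   # first frame above threshold: speech begins
        if start is not None:
            silent = silent + 1 if e <= threshold else 0
            if silent >= hangover_frames:           # sustained silence: speech has ended
                end = (i - hangover_frames + 1) * frame_len
                break
    return start, end

A real system would typically replace the fixed energy threshold with a trained voice-activity model and track the selected beam over time, but the structure is the same: isolate the speech first, then end-point on the isolated signal.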

    Rule-based presentation of media items

    Publication Number: US09996148B1

    Publication Date: 2018-06-12

    Application Number: US13786254

    Application Date: 2013-03-05

    CPC classification number: G06F3/01

    Abstract: Features are disclosed for presenting multiple media items based on one or more rules defining how the items are to be presented. One media item may be presented, and during presentation any number of additional media items may be received or scheduled for presentation. Rules may define which media items have priority over others, which may interrupt others or be interrupted, which may be delayed or presented early, and whether particular media items are time-critical such that they should not be delayed but instead take presentation priority over others. Metadata may be associated with particular media items or categories of media items and can provide details regarding how the rules should be applied to those items. User feedback may also be obtained and may affect the further application of the rules.
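
The media-presentation abstract above is essentially a rule engine: each item carries metadata (priority, interruptibility, time-criticality) and the rules decide which item plays now, which may interrupt, and which must wait. A minimal Python sketch of such a policy follows; the MediaItem fields, the single active presenter, and the specific interruption rule are illustrative assumptions rather than the patented design.

import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class MediaItem:
    priority: int                                   # lower number = higher priority
    name: str = field(compare=False)
    interruptible: bool = field(default=True, compare=False)
    time_critical: bool = field(default=False, compare=False)

class Presenter:
    """Presents one item at a time; queued items wait unless a rule lets them interrupt."""

    def __init__(self):
        self.current = None
        self.queue = []                             # min-heap ordered by priority

    def submit(self, item: MediaItem):
        if self.current is None:
            self.current = item
        elif (item.time_critical or item.priority < self.current.priority) \
                and self.current.interruptible:
            # Rule: time-critical or higher-priority items may interrupt an interruptible item.
            heapq.heappush(self.queue, self.current)    # the interrupted item is delayed
            self.current = item
        else:
            heapq.heappush(self.queue, item)            # rule says: wait your turn

    def finish_current(self):
        # When the current item ends, present the highest-priority waiting item, if any.
        self.current = heapq.heappop(self.queue) if self.queue else None

For example, a time-critical alarm submitted while interruptible music is playing preempts the music; the music is pushed back onto the queue and resumes once finish_current() is called.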

    Error reduction in speech processing

    Publication Number: US09697827B1

    Publication Date: 2017-07-04

    Application Number: US13711478

    Application Date: 2012-12-11

    CPC classification number: G10L15/18 G10L15/14 G10L15/19

    Abstract: Features are disclosed for reducing errors in speech recognition processing. Methods for reducing errors can include receiving multiple speech recognition hypotheses based on an utterance indicative of a user's command or query and determining the command or query within a grammar that has the least difference from one of the speech recognition hypotheses. The determination of the least difference may be based at least in part on a comparison of individual subword units along at least some of the sequence paths of the speech recognition hypotheses and the grammar. For example, the comparison may be performed at the phoneme level instead of the word level.
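
The comparison the abstract describes can be pictured as an edit distance computed over phoneme sequences: each grammar command is scored against each recognition hypothesis, and the command with the smallest difference wins. The Python sketch below assumes a pronounce() lexicon lookup that maps a word string to a phoneme list; the patent does not specify this exact distance measure or search strategy.

def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of subword units)."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (pa != pb))  # substitution (free if phonemes match)
    return dp[-1]

def best_grammar_match(hypotheses, grammar, pronounce):
    """Return the grammar command whose phoneme sequence is closest to any hypothesis.

    hypotheses: list of recognized word strings (an N-best list).
    grammar:    list of supported command or query strings.
    pronounce:  callable mapping a string to a list of phonemes (hypothetical lexicon lookup).
    """
    best_cmd, best_dist = None, float("inf")
    for cmd in grammar:
        cmd_phones = pronounce(cmd)
        for hyp in hypotheses:
            d = edit_distance(pronounce(hyp), cmd_phones)
            if d < best_dist:
                best_cmd, best_dist = cmd, d
    return best_cmd, best_dist

A hypothesis such as "turn of the lights" sits only a phoneme or two away from the grammar entry "turn off the lights", so the intended command can still be recovered even though the word strings differ.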

    Predicting pronunciation in speech recognition

    Publication Number: US10339920B2

    Publication Date: 2019-07-02

    Application Number: US14196055

    Application Date: 2014-03-04

    Abstract: An automatic speech recognition (ASR) device may be configured to predict pronunciations of textual identifiers (for example, song names) based on predicting one or more languages of origin of the textual identifier. The one or more languages of origin may be determined based on the textual identifier. The predicted pronunciations may include a pronunciation in one language, a pronunciation in a second language, and a hybrid pronunciation that combines multiple languages. The pronunciations may be added to a lexicon and matched to the content item (e.g., a song) and/or the textual identifier. The ASR device may receive a spoken utterance from a user requesting the ASR device to access the content item. The ASR device determines whether the spoken utterance matches one of the pronunciations of the content item in the lexicon and accesses the content when the spoken utterance matches one of the potential textual identifier pronunciations.
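
The pronunciation-prediction abstract describes three steps: guess the language or languages of origin of a textual identifier, generate per-language and hybrid pronunciations, and add them to the lexicon so a later utterance can be matched against any of them. The Python sketch below illustrates those steps under loud assumptions: guess_languages uses toy surface cues where a real system would use a trained language-identification model, and g2p_by_lang stands in for per-language grapheme-to-phoneme models that the patent does not specify.

def guess_languages(text):
    """Toy language-of-origin guess from surface cues; a trained classifier is assumed in practice."""
    langs = []
    lowered = text.lower()
    if any(ch in "áéíóúñ" for ch in lowered):
        langs.append("es")
    if any(ch in "àâçèêëîïôûü" for ch in lowered):
        langs.append("fr")
    langs.append("en")                      # always include the device's primary language
    return langs

def candidate_pronunciations(title, g2p_by_lang):
    """Build per-language and hybrid pronunciations for a textual identifier (e.g., a song name).

    g2p_by_lang: dict mapping a language code to a callable that turns a token into phonemes
                 (hypothetical grapheme-to-phoneme models).
    """
    langs = guess_languages(title)
    tokens = title.split()
    candidates = {}
    for lang in langs:
        candidates[lang] = [p for tok in tokens for p in g2p_by_lang[lang](tok)]
    if len(langs) >= 2:
        # Hybrid pronunciation: each token rendered in its own most likely language of origin.
        hybrid = []
        for tok in tokens:
            hybrid.extend(g2p_by_lang[guess_languages(tok)[0]](tok))
        candidates["hybrid"] = hybrid
    return candidates

def matches_lexicon(decoded_phonemes, lexicon_entry):
    """Accept the utterance if its phoneme decode equals any stored pronunciation of the item."""
    return any(decoded_phonemes == pron for pron in lexicon_entry.values())

In use, candidate_pronunciations() would be run when a content item is added to the catalog, its output stored in the lexicon keyed by the item, and matches_lexicon() consulted when the user later speaks a request for that item.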
