Multilingual Re-Scoring Models for Automatic Speech Recognition

    Publication number: US20240203409A1

    Publication date: 2024-06-20

    Application number: US18589220

    Filing date: 2024-02-27

    Applicant: Google LLC

    CPC classification number: G10L15/197 G10L15/005 G10L15/16 G10L15/22

    Abstract: A method includes receiving a sequence of acoustic frames extracted from audio data corresponding to an utterance. During a first pass, the method includes processing the sequence of acoustic frames to generate N candidate hypotheses for the utterance. During a second pass, and for each candidate hypothesis, the method includes: generating a respective un-normalized likelihood score; generating a respective external language model score; generating a standalone score that models prior statistics of the corresponding candidate hypothesis; and generating a respective overall score for the candidate hypothesis based on the un-normalized likelihood score, the external language model score, and the standalone score. The method also includes selecting the candidate hypothesis having the highest respective overall score from among the N candidate hypotheses as a final transcription of the utterance.
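The second-pass selection step described in this abstract can be illustrated with a minimal sketch. The abstract says only that the overall score is "based on" the three component scores; the weighted sum used here, the weight values, and all names (`Hypothesis`, `overall_score`, `select_transcription`) are assumptions made for illustration, not the method claimed in the application.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One first-pass candidate with its three second-pass scores (hypothetical container)."""
    text: str
    likelihood_score: float   # un-normalized likelihood score
    lm_score: float           # external language model score
    standalone_score: float   # score modeling prior statistics of the hypothesis


def overall_score(h: Hypothesis, lm_weight: float = 0.5,
                  standalone_weight: float = 0.5) -> float:
    # Assumption: the three scores are combined as a weighted sum.
    # The abstract does not state the combination rule or any weights.
    return (h.likelihood_score
            + lm_weight * h.lm_score
            + standalone_weight * h.standalone_score)


def select_transcription(hypotheses: Sequence[Hypothesis],
                         lm_weight: float = 0.5,
                         standalone_weight: float = 0.5) -> Hypothesis:
    """Return the candidate with the highest overall score among the N candidates."""
    if not hypotheses:
        raise ValueError("at least one candidate hypothesis is required")
    return max(hypotheses,
               key=lambda h: overall_score(h, lm_weight, standalone_weight))


# Example with made-up scores for N = 2 candidates.
candidates = [
    Hypothesis("recognize speech", -4.0, -2.0, -1.0),
    Hypothesis("wreck a nice beach", -4.5, -0.5, -0.5),
]
best = select_transcription(candidates)
```

In this sketch the weights control how much the external language model and the standalone score can override the first candidate's better likelihood score; setting `lm_weight=0.0` makes the selection depend only on the likelihood and standalone scores.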

    Exponential Modeling with Deep Learning Features

    Publication number: US20230186096A1

    Publication date: 2023-06-15

    Application number: US18161479

    Filing date: 2023-01-30

    Applicant: Google LLC

    CPC classification number: G06N3/084 G06N20/00 G06F18/2431

    Abstract: Aspects of the present disclosure enable humanly-specified relationships to contribute to a mapping that enables compression of the output structure of a machine-learned model. An exponential model such as a maximum entropy model can leverage a machine-learned embedding and the mapping to produce a classification output. In such fashion, the feature discovery capabilities of machine-learned models (e.g., deep networks) can be synergistically combined with relationships developed based on human understanding of the structural nature of the problem to be solved, thereby enabling compression of model output structures without significant loss of accuracy. These compressed models provide improved applicability to “on device” or other resource-constrained scenarios.

    Exponential modeling with deep learning features

    Publication number: US11568260B2

    Publication date: 2023-01-31

    Application number: US16654425

    Filing date: 2019-10-16

    Applicant: Google LLC

    Abstract: Aspects of the present disclosure enable humanly-specified relationships to contribute to a mapping that enables compression of the output structure of a machine-learned model. An exponential model such as a maximum entropy model can leverage a machine-learned embedding and the mapping to produce a classification output. In such fashion, the feature discovery capabilities of machine-learned models (e.g., deep networks) can be synergistically combined with relationships developed based on human understanding of the structural nature of the problem to be solved, thereby enabling compression of model output structures without significant loss of accuracy. These compressed models provide improved applicability to “on device” or other resource-constrained scenarios.

    ENHANCED MULTI-CHANNEL ACOUSTIC MODELS

    Publication number: US20210295859A1

    Publication date: 2021-09-23

    Application number: US17303822

    Filing date: 2021-06-08

    Applicant: Google LLC

    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
