Trained generative model speech coding

    公开(公告)号:US11978464B2

    公开(公告)日:2024-05-07

    申请号:US17757122

    申请日:2021-01-22

    申请人: GOOGLE LLC

    摘要: A method includes receiving sampled audio data corresponding to utterances and training a machine learning (ML) model, using the sampled audio data, to generate a high-fidelity audio stream from a low bitrate input bitstream. The training of the ML model includes de-emphasizing the influence of low-probability distortion events in the sampled audio data on the trained ML model, where the de-emphasizing of the distortion events is achieved by the inclusion of a term in an objective function of the ML model, which term encourages low-variance predictive distributions of a next sample in the sampled audio data, based on previous samples of the audio data.

    ADAPTIVE BLOCK SWITCHING WITH DEEP NEURAL NETWORKS

    公开(公告)号:US20230386486A1

    公开(公告)日:2023-11-30

    申请号:US18248294

    申请日:2021-10-15

    摘要: The present invention relates to a method for predicting transform coefficients representing frequency content of an adaptive block length media signal, by receiving a frame and receiving block length information indicating a number of quantized transform coefficients for each block in the frame, the number of quantized transform coefficients being one of a first or second number, wherein the first number is greater than the second number, determining a first block has the second number of quantized transform coefficients, converting the first block into a converted block having the first number of quantized transform coefficients, conditioning a main neural network trained to predict at least one output variable given at least one conditioning variable, the at least one conditioning variable being based on information regarding the converted block and block length information for the first block, providing at least one predicted transform coefficients from an output stage of the main neural network.