SPEECH CODING USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

    Publication No.: US20230368804A1

    Publication Date: 2023-11-16

    Application No.: US18144413

    Application Date: 2023-05-08

    Applicant: Google LLC

    CPC classification number: G10L19/0204 G10L25/30

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
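The decoding loop described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and does not come from the patent. `toy_scores` is a hypothetical stand-in for the trained auto-regressive network. The conditioning sequence is built by repeating each frame of parametric coder parameters once per output sample (`make_conditioning_sequence`, hypothetical). Speech samples are treated as 256 quantized levels. At each decoder time step, the sketch scores the current reconstruction under the current conditioning vector, normalizes the scores into a distribution, and samples one value.

```python
import math
import random
from typing import Callable, List, Sequence

NUM_LEVELS = 256  # assumption: 8-bit quantized speech sample values


def make_conditioning_sequence(frames: Sequence[Sequence[float]],
                               samples_per_frame: int) -> List[Sequence[float]]:
    """Hypothetical: repeat each frame's parameter vector so that every
    decoder time step has its own conditioning vector."""
    return [frame for frame in frames for _ in range(samples_per_frame)]


def toy_scores(history: List[int], cond: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the auto-regressive network: unnormalized
    log-scores over all NUM_LEVELS values, peaked near the previous sample
    shifted by the first conditioning value."""
    prev = history[-1] if history else NUM_LEVELS // 2
    center = prev + cond[0]
    return [-((v - center) ** 2) / 50.0 for v in range(NUM_LEVELS)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn log-scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(frames: Sequence[Sequence[float]], samples_per_frame: int,
           score_fn: Callable[[List[int], Sequence[float]], List[float]] = toy_scores,
           seed: int = 0) -> List[int]:
    """Generate one speech sample per decoder time step, auto-regressively."""
    rng = random.Random(seed)
    conditioning = make_conditioning_sequence(frames, samples_per_frame)
    reconstruction: List[int] = []
    for cond in conditioning:
        # Score the current reconstruction, conditioned on this step's vector.
        probs = softmax(score_fn(reconstruction, cond))
        # Sample from the distribution rather than taking the argmax.
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        reconstruction.append(sample)
    return reconstruction
```

For example, `decode([[1.0, 0.2], [-2.0, 0.1]], samples_per_frame=4)` returns eight quantized samples. The abstract specifies sampling from the score distribution. The fixed `seed` only makes the sketch reproducible.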

    IDENTIFYING SALIENT FEATURES FOR GENERATIVE NETWORKS

    Publication No.: US20210287038A1

    Publication Date: 2021-09-16

    Application No.: US17250506

    Application Date: 2019-05-16

    Applicant: Google LLC

    Abstract: Implementations identify a small set of independent, salient features from an input signal. The salient features may be used for conditioning a generative network, making the generative network robust to noise. The salient features may facilitate compression and data transmission. An example method includes receiving an input signal and extracting salient features for the input signal by providing the input signal to an encoder trained to extract salient features. The salient features may be independent and have a sparse distribution. The encoder may be configured to generate almost identical features from two input signals a system designer deems equivalent. The method also includes conditioning a generative network using the salient features. In some implementations, the method may also include extracting a plurality of time sequences from the input signal and extracting the salient features for each time sequence.
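The feature-extraction path in this abstract can be sketched as a minimal Python example. The patent's encoder is trained, but this sketch assumes a fixed, untrained linear projection (`weights`, hypothetical) followed by soft-thresholding. That is one simple way to obtain a sparse feature vector for each time sequence, and it is not the patent's method. The function names and the non-overlapping framing are also illustrative assumptions. The resulting per-frame feature vectors are what a generative network would then be conditioned on.

```python
from typing import List, Sequence


def split_into_time_sequences(signal: Sequence[float], frame_len: int) -> List[List[float]]:
    """Hypothetical framing: non-overlapping frames; a trailing partial frame is dropped."""
    return [list(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam; values with |x| <= lam become exactly 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def extract_salient_features(frame: Sequence[float],
                             weights: Sequence[Sequence[float]],
                             lam: float) -> List[float]:
    """Hypothetical encoder: linear projection, then soft-thresholding for sparsity."""
    return [soft_threshold(sum(w * s for w, s in zip(row, frame)), lam)
            for row in weights]


def conditioning_inputs(signal: Sequence[float],
                        weights: Sequence[Sequence[float]],
                        frame_len: int, lam: float) -> List[List[float]]:
    """One sparse feature vector per time sequence, to condition a generative network."""
    return [extract_salient_features(f, weights, lam)
            for f in split_into_time_sequences(signal, frame_len)]
```

With `weights = [[1, 1, 1, 1], [1, -1, 1, -1], [0.25, 0.25, 0.25, 0.25]]` and `lam = 0.5`, the signal `[1, 1, 1, 1, 0, 0, 0, 0]` maps to `[[3.5, 0.0, 0.5], [0.0, 0.0, 0.0]]`. The threshold zeroes small projections, which is what makes the features sparse.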
