Generating coded data representations using neural networks and vector quantizers

    Publication number: US20250131932A1

    Publication date: 2025-04-24

    Application number: US18972483

    Filing date: 2024-12-06

    Applicant: Google LLC

    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.
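
The abstract describes a sequence of vector quantizers, each contributing one code vector from its own codebook. A minimal sketch of one common way to realize this, residual vector quantization, is shown below. The abstract does not confirm the residual scheme or that the quantized representation is the sum of the selected code vectors; both are assumptions here, and the names `encode`, `decode`, and `codebooks` are hypothetical. The encoder neural network is omitted, so `feature` stands in for its output.

```python
import numpy as np


def encode(feature, codebooks):
    """Return one code index per quantizer (residual scheme, assumed)."""
    residual = np.asarray(feature, dtype=float).copy()
    indices = []
    for codebook in codebooks:
        # Pick the code vector nearest to what is still unexplained.
        distances = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(distances))
        indices.append(idx)
        # The next quantizer in the sequence sees only the remaining error.
        residual = residual - codebook[idx]
    return indices


def decode(indices, codebooks):
    """Rebuild the quantized vector as the sum of the selected code vectors."""
    return np.sum([cb[i] for cb, i in zip(codebooks, indices)], axis=0)


# Two tiny hand-written codebooks (illustrative values, not learned).
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [0.25, -0.25]]),
]
feature = np.array([1.2, 0.8])

indices = encode(feature, codebooks)
quantized = decode(indices, codebooks)
```

In this sketch the coded representation is just the list `indices`, one entry per quantizer, so its size grows with the number of quantizers rather than with the dimension of the feature vector.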

    Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations

    Publication number: US11475909B2

    Publication date: 2022-10-18

    Application number: US17170657

    Filing date: 2021-02-08

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.
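
The two-stage data flow in this abstract can be sketched as follows. This shows only how the pieces connect and which shapes pass between them: both "networks" are untrained stand-ins with random parameters, so nothing is actually separated. Every name and size here (`speaker_network`, `separation_network`, `FRAME`, `DIM`, the parameter shapes) is a hypothetical choice for illustration, not taken from the patent.

```python
import numpy as np

FRAME = 16  # hypothetical frame length in samples
DIM = 8     # hypothetical speaker-representation size


def speaker_network(recording, speaker_params):
    """Stand-in: map the whole recording to one representation per speaker."""
    usable = len(recording) // FRAME * FRAME
    frames = recording[:usable].reshape(-1, FRAME)
    pooled = frames.mean(axis=0)                      # shape (FRAME,)
    # speaker_params has shape (num_speakers, DIM, FRAME).
    return np.tanh(speaker_params @ pooled)           # shape (num_speakers, DIM)


def separation_network(recording, speaker_reps, separation_params):
    """Stand-in: one predicted signal per speaker representation."""
    # separation_params has shape (DIM,); each representation yields a gain.
    logits = speaker_reps @ separation_params         # shape (num_speakers,)
    gains = 1.0 / (1.0 + np.exp(-logits))
    # Condition the recording on each speaker representation in turn.
    return gains[:, None] * recording[None, :]        # shape (num_speakers, T)


rng = np.random.default_rng(0)
num_speakers = 2
recording = rng.standard_normal(64)                   # fake mixed audio
speaker_params = rng.standard_normal((num_speakers, DIM, FRAME))
separation_params = rng.standard_normal(DIM)

speaker_reps = speaker_network(recording, speaker_params)
isolated = separation_network(recording, speaker_reps, separation_params)
```

The point of the sketch is the interface: the speaker representations are computed once per recording, and the separation step receives both the recording and those representations, returning one signal per representation.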

    Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations

    Publication number: US12236970B2

    Publication date: 2025-02-25

    Application number: US17967726

    Filing date: 2022-10-17

    Applicant: Google LLC

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.
