Self-supervised audio representation learning for mobile devices

    Publication No.: US11501787B2

    Publication Date: 2022-11-15

    Application No.: US16548146

    Filing Date: 2019-08-22

    Applicant: Google LLC

    Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
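    To make the two pretext targets in this abstract concrete, here is a minimal Python sketch of how training examples could be built from an unlabeled signal so that the ground truth comes from the signal itself. The helper names (`gap_example`, `context_example`, `mse`), the slice length, and the choice of mean squared error as the loss are illustrative assumptions not stated in the abstract; the model and the end-to-end training loop are omitted.

```python
import random
from typing import List, Sequence, Tuple


def gap_example(signal: Sequence[float], slice_len: int,
                rng: random.Random) -> Tuple[List[float], List[float], int]:
    """Hypothetical helper: sample two slices; the label is the distance
    (in samples) between their start offsets, taken from the signal itself."""
    max_start = len(signal) - slice_len
    if max_start < 0:
        raise ValueError("signal is shorter than one slice")
    first, second = sorted((rng.randint(0, max_start),
                            rng.randint(0, max_start)))
    return (list(signal[first:first + slice_len]),
            list(signal[second:second + slice_len]),
            second - first)


def context_example(signal: Sequence[float], slice_len: int,
                    start: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: the model input is the slice at `start`; the
    ground truth is the neighbouring slice on each side, to be reconstructed."""
    if start < slice_len or start + 2 * slice_len > len(signal):
        raise ValueError("need a full neighbouring slice on each side")
    model_input = list(signal[start:start + slice_len])
    target = (list(signal[start - slice_len:start])
              + list(signal[start + slice_len:start + 2 * slice_len]))
    return model_input, target


def mse(predicted: Sequence[float], truth: Sequence[float]) -> float:
    """Assumed loss: mean squared error between determined and true values."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("length mismatch or empty input")
    return sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth)
```

    For the distance task, a model's scalar estimate would be compared with the integer label (for example `mse([estimate], [label])`); for the reconstruction task, the model's output would be compared with `target` element by element.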

    QUANTIZATION OF SPATIAL AUDIO DIRECTION PARAMETERS

    Publication No.: US20220335956A1

    Publication Date: 2022-10-20

    Application No.: US17635593

    Filing Date: 2020-07-27

    Abstract: A method for spatial audio signal encoding, comprising: obtaining, for a first frame, a plurality of audio direction parameters, wherein each parameter comprises an elevation value and an azimuth value and wherein each parameter has an ordered position; determining whether, for a preceding frame, any of the plurality of audio direction parameters was differentially encoded based on a difference between the preceding frame parameter elevation value and a further preceding frame parameter elevation value and the preceding frame parameter azimuth value and a further preceding frame parameter azimuth value; generating, for any audio direction parameter which was not differentially encoded in the considered preceding frame, a differential parameter value based on a difference between the frame parameter elevation value and a preceding frame parameter elevation value and a difference between the frame parameter azimuth value and a preceding frame parameter azimuth value; generating, for each of the plurality of audio direction parameters, a difference parameter value based on a difference between the audio direction parameter and a rotated derived audio direction parameter; quantizing the difference between the audio direction parameter and a rotated derived audio direction parameter and the differential parameter value; and selecting, for each of the plurality of audio direction parameters, either the quantized difference or the differential parameter value.
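    The per-parameter choice described above can be sketched as follows. This is an illustrative reading of the abstract, not the claimed method: the uniform quantizer, its step size, and the rule for picking between the two candidates (smaller total index magnitude, as a rough proxy for bit cost) are assumptions, and the rotated derived directions are passed in as given inputs. The constraint taken directly from the abstract is that a differential value is generated only for parameters that were not differentially encoded in the preceding frame.

```python
from typing import List, Sequence, Tuple

Direction = Tuple[float, float]  # (elevation_deg, azimuth_deg)


def wrap_azimuth(deg: float) -> float:
    """Wrap an angle difference into [-180, 180) degrees."""
    return (deg + 180.0) % 360.0 - 180.0


def quantize(value: float, step: float) -> int:
    """Assumed uniform scalar quantizer returning an integer index."""
    return round(value / step)


def select_encodings(current: Sequence[Direction],
                     previous: Sequence[Direction],
                     rotated_derived: Sequence[Direction],
                     prev_was_differential: Sequence[bool],
                     step: float) -> List[Tuple[str, Tuple[int, int]]]:
    """Return ("diff" or "rot", (elevation_index, azimuth_index)) per parameter."""
    choices = []
    for i, (elev, azi) in enumerate(current):
        # Difference to the rotated derived direction: always available.
        r_el, r_az = rotated_derived[i]
        rot = (quantize(elev - r_el, step),
               quantize(wrap_azimuth(azi - r_az), step))
        if prev_was_differential[i]:
            # Differentially encoded last frame, so no differential value now.
            choices.append(("rot", rot))
            continue
        # Frame-to-frame differential value.
        p_el, p_az = previous[i]
        diff = (quantize(elev - p_el, step),
                quantize(wrap_azimuth(azi - p_az), step))
        # Assumed selection rule: smaller index magnitude wins.
        if abs(diff[0]) + abs(diff[1]) < abs(rot[0]) + abs(rot[1]):
            choices.append(("diff", diff))
        else:
            choices.append(("rot", rot))
    return choices
```

    Wrapping the azimuth difference matters because a source moving from 179° to -179° has moved only 2°, not 358°.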

    QUANTIZATION OF SPATIAL AUDIO DIRECTION PARAMETERS

    Publication No.: US20220279299A1

    Publication Date: 2022-09-01

    Application No.: US17628792

    Filing Date: 2020-06-15

    Abstract: There is disclosed inter alia an apparatus for spatial audio signal encoding configured to derive, for each of a plurality of audio direction parameters, a corresponding derived audio direction parameter comprising an elevation value and an azimuth value. Each derived audio direction parameter is rotated by the azimuth value of the audio direction parameter in the first position of the plurality of audio direction parameters. The positions of some of the audio direction parameters are then changed, followed by determining, for each of the plurality of audio direction parameters, a difference between that audio direction parameter and the corresponding rotated derived audio direction parameter. The difference for each of the plurality of audio direction parameters is then quantised.
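    A minimal sketch of the rotate-then-quantise step, under stated assumptions: the abstract does not say how the derived directions are obtained or which positions are reordered, so `derived_directions` (evenly spaced horizon directions) is hypothetical, the reordering step is left out, and a uniform quantiser with a fixed step in degrees stands in for whatever quantiser is actually used.

```python
from typing import List, Sequence, Tuple

Direction = Tuple[float, float]  # (elevation_deg, azimuth_deg)


def wrap(deg: float) -> float:
    """Wrap an angle into [-180, 180) degrees."""
    return (deg + 180.0) % 360.0 - 180.0


def derived_directions(n: int) -> List[Direction]:
    """Hypothetical derivation: n horizon directions evenly spaced in azimuth."""
    return [(0.0, 360.0 * i / n) for i in range(n)]


def quantise_residuals(params: Sequence[Direction],
                       step_deg: float) -> List[Tuple[int, int]]:
    """Rotate the derived directions by the azimuth of the first-position
    parameter, then quantise each parameter's difference from its
    rotated derived counterpart."""
    offset = params[0][1]
    rotated = [(el, wrap(az + offset))
               for el, az in derived_directions(len(params))]
    residuals = []
    for (el, az), (r_el, r_az) in zip(params, rotated):
        residuals.append((round((el - r_el) / step_deg),
                          round(wrap(az - r_az) / step_deg)))
    return residuals
```

    With this particular derivation, the first parameter's azimuth residual is always zero, so the rotation offset itself would need to be conveyed separately; the abstract does not say how that is handled.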

    ENCODING DEVICE, DECODING DEVICE, ENCODING METHOD, DECODING METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM

    Publication No.: US20220130402A1

    Publication Date: 2022-04-28

    Application No.: US17573360

    Filing Date: 2022-01-11

    Abstract: An encoding device according to the disclosure includes a first encoder, which in operation, encodes a low-band signal from a voice or audio input signal to generate a first encoded signal; a decoder, which in operation, decodes the first encoded signal to generate a low-band decoded signal; a second encoder, which in operation, encodes, on the basis of the low-band decoded signal, a high-band signal to generate a high-band encoded signal, the high-band signal comprising a band of the voice or audio input signal that is higher than that of the low-band signal; an energy calculator, which in operation, calculates an energy of the voice or audio input signal for each subband of a plurality of subbands of the voice or audio input signal to acquire a calculated energy for each subband of the plurality of subbands of the voice or audio input signal, quantizes the calculated energy for each subband of the plurality of subbands of the voice or audio input signal to acquire a quantized band energy for each subband of the plurality of subbands of the voice or audio input signal and outputs the quantized band energy for each subband of the plurality of subbands of the voice or audio input signal; and a multiplexer, which in operation, multiplexes the quantized band energy for each subband of the plurality of subbands of the voice or audio input signal, the first encoded signal, and the high-band encoded signal to generate and output an encoded signal.
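    The energy calculator's role can be illustrated with a short Python sketch. The band edges, the 3 dB log-domain step, and the floor used for silent bands are assumptions chosen for illustration; the abstract states only that a per-subband energy is calculated, quantized, and multiplexed with the two encoded signals.

```python
import math
from typing import List, Sequence


def subband_energies(spectrum: Sequence[float],
                     band_edges: Sequence[int]) -> List[float]:
    """Energy of each subband: sum of squared coefficients in [lo, hi)."""
    return [sum(c * c for c in spectrum[lo:hi])
            for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def quantize_energies(energies: Sequence[float], step_db: float = 3.0,
                      floor: float = 1e-12) -> List[int]:
    """Assumed log-domain quantizer: index = round(10*log10(E) / step_db).
    The floor keeps log10 defined for silent (zero-energy) bands."""
    return [round(10.0 * math.log10(max(e, floor)) / step_db)
            for e in energies]


def dequantize_energies(indices: Sequence[int],
                        step_db: float = 3.0) -> List[float]:
    """Map indices back to linear energies, as a decoder could."""
    return [10.0 ** (i * step_db / 10.0) for i in indices]
```

    Quantizing in the log domain means each index step corresponds to the same energy ratio (about a factor of 2 for 3 dB), which is why it is a common choice for band energies.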
