Speech recognition using convolutional neural networks

    公开(公告)号:US11069345B2

    公开(公告)日:2021-07-20

    申请号:US16719424

    申请日:2019-12-18

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech recognition by generating a neural network output from an audio data input sequence, where the neural network output characterizes words spoken in the audio data input sequence. One of the methods includes, for each of the audio data inputs, providing a current audio data input sequence that comprises the audio data input and the audio data inputs preceding the audio data input in the audio data input sequence to a convolutional subnetwork comprising a plurality of dilated convolutional neural network layers, wherein the convolutional subnetwork is configured to, for each of the plurality of audio data inputs: receive the current audio data input sequence for the audio data input, and process the current audio data input sequence to generate an alternative representation for the audio data input.

    HIGH FIDELITY SPEECH SYNTHESIS WITH ADVERSARIAL NETWORKS

    公开(公告)号:US20210089909A1

    公开(公告)日:2021-03-25

    申请号:US17032578

    申请日:2020-09-25

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output audio examples using a generative neural network. One of the methods includes obtaining a training conditioning text input; processing a training generative input comprising the training conditioning text input using a feedforward generative neural network to generate a training audio output; processing the training audio output using each of a plurality of discriminators, wherein the plurality of discriminators comprises one or more conditional discriminators and one or more unconditional discriminators; determining a first combined prediction by combining the respective predictions of the plurality of discriminators; and determining an update to current values of a plurality of generative parameters of the feedforward generative neural network to increase a first error in the first combined prediction.

    SPATIAL TRANSFORMER MODULES
    26.
    发明申请

    公开(公告)号:US20210034909A1

    公开(公告)日:2021-02-04

    申请号:US16995307

    申请日:2020-08-17

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.

    Speech recognition using convolutional neural networks

    公开(公告)号:US10586531B2

    公开(公告)日:2020-03-10

    申请号:US16209661

    申请日:2018-12-04

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech recognition by generating a neural network output from an audio data input sequence, where the neural network output characterizes words spoken in the audio data input sequence. One of the methods includes, for each of the audio data inputs, providing a current audio data input sequence that comprises the audio data input and the audio data inputs preceding the audio data input in the audio data input sequence to a convolutional subnetwork comprising a plurality of dilated convolutional neural network layers, wherein the convolutional subnetwork is configured to, for each of the plurality of audio data inputs: receive the current audio data input sequence for the audio data input, and process the current audio data input sequence to generate an alternative representation for the audio data input.

    SPATIAL TRANSFORMER MODULES
    29.
    发明申请

    公开(公告)号:US20180330185A1

    公开(公告)日:2018-11-15

    申请号:US16041567

    申请日:2018-07-20

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.

    TRAINING ACTION SELECTION NEURAL NETWORKS USING LOOK-AHEAD SEARCH

    公开(公告)号:US20250148282A1

    公开(公告)日:2025-05-08

    申请号:US18919108

    申请日:2024-10-17

    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media, for training an action selection neural network. One of the methods includes receiving an observation characterizing a current state of the environment; determining a target network output for the observation by performing a look ahead search of possible future states of the environment starting from the current state until the environment reaches a possible future state that satisfies one or more termination criteria, wherein the look ahead search is guided by the neural network in accordance with current values of the network parameters; selecting an action to be performed by the agent in response to the observation using the target network output generated by performing the look ahead search; and storing, in an exploration history data store, the target network output in association with the observation for use in updating the current values of the network parameters.

Patent Agency Ranking