    TRAINING NEURAL NETWORKS USING A PRIORITIZED EXPERIENCE MEMORY

    Publication Number: US20250045583A1

    Publication Date: 2025-02-06

    Application Number: US18805367

    Application Date: 2024-08-14

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a neural network used to select actions performed by a reinforcement learning agent interacting with an environment. In one aspect, a method includes maintaining a replay memory, where the replay memory stores pieces of experience data generated as a result of the reinforcement learning agent interacting with the environment. Each piece of experience data is associated with a respective expected learning progress measure that is a measure of an expected amount of progress made in the training of the neural network if the neural network is trained on the piece of experience data. The method further includes selecting a piece of experience data from the replay memory by prioritizing for selection pieces of experience data having relatively higher expected learning progress measures and training the neural network on the selected piece of experience data.
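
    As a rough illustration of the selection scheme this abstract describes, the following Python sketch samples experience in proportion to a priority derived from the TD-error magnitude, a common stand-in for an expected learning progress measure. The class and method names are illustrative assumptions, not taken from the patent.

        import random

        class PrioritizedReplayMemory:
            def __init__(self, capacity, alpha=0.6, epsilon=1e-6):
                self.capacity = capacity    # maximum number of stored transitions
                self.alpha = alpha          # how strongly priorities skew sampling
                self.epsilon = epsilon      # keeps every priority strictly positive
                self.data = []              # stored pieces of experience data
                self.priorities = []        # one priority per stored piece

            def add(self, transition, td_error):
                # A new piece of experience enters with a priority derived
                # from its TD error.
                if len(self.data) >= self.capacity:
                    self.data.pop(0)
                    self.priorities.pop(0)
                self.data.append(transition)
                self.priorities.append((abs(td_error) + self.epsilon) ** self.alpha)

            def sample(self, batch_size):
                # Pieces with higher expected learning progress are drawn more often.
                indices = random.choices(range(len(self.data)),
                                         weights=self.priorities, k=batch_size)
                return [self.data[i] for i in indices], indices

            def update_priorities(self, indices, td_errors):
                # After a training step, refresh priorities with the new TD errors.
                for i, err in zip(indices, td_errors):
                    self.priorities[i] = (abs(err) + self.epsilon) ** self.alpha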

    Adaptive visual speech recognition

    Publication Number: US12211488B2

    Publication Date: 2025-01-28

    Application Number: US18571553

    Application Date: 2022-06-15

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data using an adaptive visual speech recognition model. One of the methods includes receiving a video that includes a plurality of video frames that depict a first speaker; obtaining a first embedding characterizing the first speaker; and processing a first input comprising (i) the video and (ii) the first embedding using a visual speech recognition neural network having a plurality of parameters, wherein the visual speech recognition neural network is configured to process the video and the first embedding in accordance with trained values of the parameters to generate a speech recognition output that defines a sequence of one or more words being spoken by the first speaker in the video.
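
    A minimal PyTorch sketch of this conditioning pattern follows: the speaker embedding is tiled across time and fused with per-frame visual features before decoding into per-frame vocabulary logits. The layer sizes, the GRU encoder, and the names SpeakerConditionedVSR and to_vocab are illustrative assumptions, not the patented architecture.

        import torch
        import torch.nn as nn

        class SpeakerConditionedVSR(nn.Module):
            def __init__(self, frame_dim=512, embed_dim=64, hidden_dim=256,
                         vocab_size=1000):
                super().__init__()
                # Fuse each frame feature with the speaker embedding, then decode.
                self.encoder = nn.GRU(frame_dim + embed_dim, hidden_dim,
                                      batch_first=True)
                self.to_vocab = nn.Linear(hidden_dim, vocab_size)

            def forward(self, frame_features, speaker_embedding):
                # frame_features: (batch, time, frame_dim) per-frame visual features
                # speaker_embedding: (batch, embed_dim) characterizes the speaker
                t = frame_features.size(1)
                tiled = speaker_embedding.unsqueeze(1).expand(-1, t, -1)
                fused = torch.cat([frame_features, tiled], dim=-1)
                hidden, _ = self.encoder(fused)
                return self.to_vocab(hidden)  # per-frame logits (e.g., for CTC)

        model = SpeakerConditionedVSR()
        frames = torch.randn(2, 75, 512)   # precomputed features for 75 frames
        speaker = torch.randn(2, 64)       # first-speaker embedding
        logits = model(frames, speaker)    # shape (2, 75, 1000)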

    BANDWIDTH EXTENSION OF INCOMING DATA USING NEURAL NETWORKS

    Publication Number: US20250022476A1

    Publication Date: 2025-01-16

    Application Number: US18780377

    Application Date: 2024-07-22

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for bandwidth extension. One of the methods includes obtaining a low-resolution version of an input, the low-resolution version of the input comprising a first number of samples at a first sample rate over a first time period; and generating, from the low-resolution version of the input, a high-resolution version of the input comprising a second, larger number of samples at a second, higher sample rate over the first time period. Generating the high-resolution version includes generating a representation of the low-resolution version of the input; processing the representation of the low-resolution version of the input through a conditioning neural network to generate a conditioning input; and processing the conditioning input using a generative neural network to generate the high-resolution version of the input.
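
    The two-stage pipeline in this abstract can be sketched in PyTorch as below: a conditioning network summarizes the low-resolution input, and a generative network upsamples that summary in time. The 4x upsampling factor, the convolutional layers, and the names ConditioningNetwork and GenerativeNetwork are illustrative assumptions, not the patented design.

        import torch
        import torch.nn as nn

        class ConditioningNetwork(nn.Module):
            def __init__(self, channels=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(channels, channels, kernel_size=9, padding=4),
                    nn.ReLU(),
                )

            def forward(self, low_res):
                # low_res: (batch, 1, num_samples) -> conditioning input
                return self.net(low_res)

        class GenerativeNetwork(nn.Module):
            def __init__(self, channels=64, factor=4):
                super().__init__()
                # A transposed convolution stretches the conditioning input
                # in time by the upsampling factor.
                self.upsample = nn.ConvTranspose1d(channels, channels,
                                                   kernel_size=factor,
                                                   stride=factor)
                self.head = nn.Conv1d(channels, 1, kernel_size=9, padding=4)

            def forward(self, conditioning):
                return self.head(torch.relu(self.upsample(conditioning)))

        cond_net, gen_net = ConditioningNetwork(), GenerativeNetwork()
        low = torch.randn(1, 1, 4000)   # e.g., 0.5 s of samples at 8 kHz
        high = gen_net(cond_net(low))   # (1, 1, 16000): same period, 4x samples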

    Reinforcement learning for active sequence processing

    Publication Number: US12175737B2

    Publication Date: 2024-12-24

    Application Number: US17773789

    Application Date: 2020-11-13

    Abstract: A system that is configured to receive a sequence of task inputs and to perform a machine learning task is described. The system includes a reinforcement learning (RL) neural network and a task neural network. The RL neural network is configured to: generate, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, and provide the respective decision of each task input to the task neural network. The task neural network is configured to: receive the sequence of task inputs, receive, from the RL neural network, for each task input of the sequence of task inputs, a respective decision that determines whether to encode the task input or to skip the task input, process each of the un-skipped task inputs in the sequence of task inputs to generate a respective accumulated feature for the un-skipped task input, wherein the respective accumulated feature characterizes features of the un-skipped task input and of previous un-skipped task inputs in the sequence, and generate a machine learning task output for the machine learning task based on the last accumulated feature generated for the last un-skipped task input in the sequence.
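
    The encode-or-skip control loop can be sketched in PyTorch as follows, with a per-step policy standing in for the RL neural network and a GRU hidden state standing in for the accumulated feature. The names SkipPolicy and TaskNetwork, and all sizes, are illustrative assumptions.

        import torch
        import torch.nn as nn

        class SkipPolicy(nn.Module):
            def __init__(self, input_dim=32):
                super().__init__()
                self.scorer = nn.Linear(input_dim, 2)  # logits for [skip, encode]

            def forward(self, task_input):
                logits = self.scorer(task_input)
                return torch.distributions.Categorical(logits=logits).sample()

        class TaskNetwork(nn.Module):
            def __init__(self, input_dim=32, hidden_dim=64, num_classes=10):
                super().__init__()
                self.cell = nn.GRUCell(input_dim, hidden_dim)
                self.head = nn.Linear(hidden_dim, num_classes)

            def forward(self, sequence, decisions):
                h = torch.zeros(sequence.size(0), self.cell.hidden_size)
                for t in range(sequence.size(1)):
                    if decisions[t] == 1:  # encode only the un-skipped inputs
                        # h accumulates features of all un-skipped inputs so far.
                        h = self.cell(sequence[:, t], h)
                return self.head(h)        # output from the last accumulated feature

        policy, task_net = SkipPolicy(), TaskNetwork()
        sequence = torch.randn(1, 20, 32)  # a sequence of 20 task inputs
        decisions = [policy(sequence[:, t]).item() for t in range(20)]
        output = task_net(sequence, decisions)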

    DEMONSTRATION-DRIVEN REINFORCEMENT LEARNING

    Publication Number: US20240412063A1

    Publication Date: 2024-12-12

    Application Number: US18698218

    Application Date: 2022-10-05

    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a reinforcement learning system to select actions to be performed by an agent interacting with an environment to perform a particular task. In one aspect, one of the methods includes obtaining a training sequence comprising a respective training observation at each of a plurality of time steps; obtaining demonstration data comprising one or more demonstration sequences; generating a new training sequence from the training sequence and the demonstration data; and training a goal-conditioned policy neural network on the new training sequence through reinforcement learning.
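
    One plausible reading of "generating a new training sequence from the training sequence and the demonstration data" is hindsight-style goal relabeling, sketched below in Python: the agent's trajectory is relabeled with a goal reached in a demonstration, so a goal-conditioned policy can be trained on it. The function name and data layout are illustrative assumptions, not the patented procedure.

        import random

        def make_new_training_sequence(training_sequence, demonstration_sequences):
            """Each sequence is a list of (observation, action) pairs."""
            demo = random.choice(demonstration_sequences)
            # Treat the demonstration's final observation as the goal.
            goal = demo[-1][0]
            # Pair every training step with the demonstration-derived goal so a
            # goal-conditioned policy can be trained on the new sequence.
            return [{"observation": obs, "action": act, "goal": goal}
                    for obs, act in training_sequence]

        training_sequence = [((i, i + 1), 0) for i in range(5)]  # toy trajectory
        demos = [[((9, 9), 1), ((10, 10), 1)]]                   # toy demonstration
        new_sequence = make_new_training_sequence(training_sequence, demos)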

    Rating tasks and policies using conditional probability distributions derived from equilibrium-based solutions of games

    Publication Number: US12151171B2

    Publication Date: 2024-11-26

    Application Number: US17963113

    Application Date: 2022-10-10

    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for rating tasks and policies using conditional probability distributions derived from equilibrium-based solutions of games. One of the methods includes: determining, for each action selection policy in a pool of action selection policies, a respective performance measure of the action selection policy on each task in a pool of tasks, processing the performance measures of the action selection policies on the tasks to generate data defining a joint probability distribution over a set of action selection policy-task pairs, and processing the joint probability distribution over the set of action selection policy-task pairs to generate a respective rating for each action selection policy in the pool of action selection policies, where the respective rating for each action selection policy characterizes a utility of the action selection policy in performing tasks from the pool of tasks.
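
    The overall data flow of this method can be sketched in Python with numpy as below. For brevity, the joint distribution over policy-task pairs is built with a softmax over the performance measures, a stand-in for the equilibrium-based game solution the patent actually uses; only the flow from performance matrix to joint distribution to per-policy ratings mirrors the abstract.

        import numpy as np

        def rate_policies(performance, temperature=1.0):
            # performance[i, j]: measure of policy i on task j.
            logits = performance / temperature
            joint = np.exp(logits - logits.max())
            joint /= joint.sum()               # joint distribution over pairs
            task_marginal = joint.sum(axis=0)  # weight carried by each task
            # Rate each policy by its task-weighted expected performance.
            return performance @ task_marginal

        performance = np.array([[1.0, 0.2],    # policy 0 on tasks 0 and 1
                                [0.4, 0.9]])   # policy 1 on tasks 0 and 1
        print(rate_policies(performance))      # one rating per policy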
