-
Publication No.: US20220019807A1
Publication date: 2022-01-20
Application No.: US17295329
Filing date: 2019-11-20
Applicant: DeepMind Technologies Limited
Inventor: Joao Carreira, Carl Doersch, Andrew Zisserman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.
-
Publication No.: US20220004883A1
Publication date: 2022-01-06
Application No.: US17295286
Filing date: 2019-11-21
Applicant: DeepMind Technologies Limited
Inventor: Yusuf Aytar, Debidatta Dwibedi, Andrew Zisserman, Jonathan Tompson, Pierre Sermanet
Abstract: An encoder neural network is described which can encode a data item, such as a frame of a video, to form a respective encoded data item. Data items of a first data sequence are associated with respective data items of a second sequence, by determining which of the encoded data items of the second sequence is closest to the encoded data item produced from each data item of the first sequence. Thus, the two data sequences are aligned. The encoder neural network is trained automatically using a training set of data sequences, by an iterative process of successively increasing cycle consistency between pairs of the data sequences.
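The association step in this abstract (match each encoded item of the first sequence to the closest encoded item of the second) can be sketched as below. This is a minimal illustration, not the patented method: the encodings are plain lists of floats, "closest" is assumed to mean squared Euclidean distance, and the names `align` and `cycle_consistent_fraction` are hypothetical. The sketch only measures cycle consistency with hard nearest neighbours; it does not train an encoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed notion of 'closest')."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Vector, candidates: Sequence[Vector]) -> int:
    """Index of the candidate encoding closest to the query encoding."""
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))


def align(first: Sequence[Vector], second: Sequence[Vector]) -> List[int]:
    """For each item of the first sequence, the index of its match in the second."""
    return [nearest(u, second) for u in first]


def cycle_consistent_fraction(first: Sequence[Vector], second: Sequence[Vector]) -> float:
    """Fraction of first-sequence items that map back to themselves after
    going first -> second -> first by nearest neighbour."""
    forward = align(first, second)
    hits = sum(1 for i, j in enumerate(forward) if nearest(second[j], first) == i)
    return hits / len(first)


# Toy encodings standing in for encoder outputs (made-up numbers).
first_encoded = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
second_encoded = [[0.1, 0.0], [0.9, 0.1], [2.2, 0.0], [3.0, 0.0]]

alignment = align(first_encoded, second_encoded)
fraction = cycle_consistent_fraction(first_encoded, second_encoded)
print(alignment, fraction)
```

A training procedure of the kind the abstract describes would need a differentiable relaxation of the nearest-neighbour step; the hard `min` used here is only suitable for evaluation.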
-
Publication No.: US20250103856A1
Publication date: 2025-03-27
Application No.: US18832817
Filing date: 2023-01-30
Applicant: DeepMind Technologies Limited
Inventor: Joao Carreira, Andrew Coulter Jaegle, Skanda Kumar Koppula, Daniel Zoran, Adrià Recasens Continente, Catalin-Dumitru Ionescu, Olivier Jean Hénaff, Evan Gerard Shelhamer, Relja Arandjelovic, Matthew Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman
IPC: G06N3/045
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for using a neural network to generate a network output that characterizes an entity. In one aspect, a method includes: obtaining a representation of the entity as a set of data element embeddings, obtaining a set of latent embeddings, and processing: (i) the set of data element embeddings, and (ii) the set of latent embeddings, using the neural network to generate the network output. The neural network includes a sequence of neural network blocks including: (i) one or more local cross-attention blocks, and (ii) an output block. Each local cross-attention block partitions the set of latent embeddings and the set of data element embeddings into proper subsets, and updates each proper subset of the set of latent embeddings using attention over only the corresponding proper subset of the set of data element embeddings.
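The local cross-attention block described in this abstract can be illustrated with a small sketch. Several simplifications are assumptions rather than details from the source: the partition is into equal contiguous groups, queries, keys and values use identity projections instead of learned ones, a residual connection is added, and all function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents: List[Vec], data: List[Vec]) -> List[Vec]:
    """Update each latent by scaled dot-product attention over `data`.
    Identity projections and a residual connection are assumed."""
    dim = len(latents[0])
    updated = []
    for q in latents:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in data]
        weights = softmax(scores)
        attended = [sum(w * v[d] for w, v in zip(weights, data)) for d in range(dim)]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def split(items: List[Vec], groups: int) -> List[List[Vec]]:
    """Partition into equal contiguous groups (length must divide evenly)."""
    size = len(items) // groups
    return [items[g * size:(g + 1) * size] for g in range(groups)]


def local_cross_attention(latents: List[Vec], data: List[Vec], groups: int) -> List[Vec]:
    """Each group of latents attends only to the matching group of data elements."""
    out: List[Vec] = []
    for latent_group, data_group in zip(split(latents, groups), split(data, groups)):
        out.extend(cross_attend(latent_group, data_group))
    return out


latents = [[1.0, 0.0], [0.0, 1.0]]
data = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
base = local_cross_attention(latents, data, groups=2)

# Changing only the second data group leaves the first latent untouched.
changed = local_cross_attention(latents, data[:2] + [[5.0, 5.0], [0.0, 0.0]], groups=2)
print(base[0] == changed[0], base[1] == changed[1])
```

The point of the comparison at the end is locality: a latent in one subset is unaffected by data elements outside its corresponding subset, which keeps the attention cost per block proportional to the subset sizes rather than to the full input.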
-
Publication No.: US20240029436A1
Publication date: 2024-01-25
Application No.: US18375941
Filing date: 2023-10-02
Applicant: DeepMind Technologies Limited
Inventor: Joao Carreira, Carl Doersch, Andrew Zisserman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying actions in a video. One of the methods includes obtaining a feature representation of a video clip; obtaining data specifying a plurality of candidate agent bounding boxes in the key video frame; and for each candidate agent bounding box: processing the feature representation through an action transformer neural network.
-
Publication No.: US20230186625A1
Publication date: 2023-06-15
Application No.: US18108873
Filing date: 2023-02-13
Applicant: DeepMind Technologies Limited
Inventor: Simon Osindero, Joao Carreira, Viorica Patraucean, Andrew Zisserman
CPC classification number: G06V20/40, G06N3/049, G06T1/20, G06N3/044, G06N3/045, G06T2200/28, G06T2207/20084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for parallel processing of video frames using neural networks. One of the methods includes receiving a video sequence comprising a respective video frame at each of a plurality of time steps; and processing the video sequence using a video processing neural network to generate a video processing output for the video sequence, wherein the video processing neural network includes a sequence of network components, wherein the network components comprise a plurality of layer blocks each comprising one or more neural network layers, wherein each component is active for a respective subset of the plurality of time steps, and wherein each layer block is configured to, at each time step at which the layer block is active, receive an input generated at a previous time step and to process the input to generate a block output.
-
Publication No.: US20220012898A1
Publication date: 2022-01-13
Application No.: US17295321
Filing date: 2019-11-20
Applicant: DeepMind Technologies Limited
Inventor: Joao Carreira, Jean-Baptiste Alayrac, Andrew Zisserman
Abstract: A computer-implemented neural network system for decomposing input video data. A video data input receives a sequence of video image frames. The sequence is encoded, using a 3D spatio-temporal encoder neural network, into a set of latent variables representing a compressed version of the sequence. A 3D spatio-temporal decoder neural network processes the set of latent variables to generate two or more sets of decomposed video data; these may be stored, communicated, and/or made available to a user interface. Input video including undesired features such as reflections, shadows, and occlusions may thus be decomposed into two or more video sequences, one in which the undesired features are suppressed, and another containing the undesired features.
-
Publication No.: US20200372654A1
Publication date: 2020-11-26
Application No.: US16881775
Filing date: 2020-05-22
Applicant: DeepMind Technologies Limited
Inventor: Simon Kohl, Bernardino Romera-Paredes, Danilo Jimenez Rezende, Seyed Mohammadali Eslami, Pushmeet Kohli, Andrew Zisserman, Olaf Ronneberger
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a plurality of possible segmentations of an image. In one aspect, a method comprises: receiving a request to generate a plurality of possible segmentations of an image; sampling a plurality of latent variables from a latent space, wherein each latent variable is sampled from the latent space in accordance with a respective probability distribution over the latent space that is determined based on the image; generating a plurality of possible segmentations of the image, comprising, for each latent variable, processing the image and the latent variable using a segmentation neural network having a plurality of segmentation neural network parameters to generate the possible segmentation of the image; and providing the plurality of possible segmentations of the image in response to the request.
-
Publication No.: US10789479B2
Publication date: 2020-09-29
Application No.: US16681671
Filing date: 2019-11-12
Applicant: DeepMind Technologies Limited
Inventor: Joao Carreira, Andrew Zisserman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing video data. An example system receives video data and generates optical flow data. An image sequence from the video data is provided to a first 3D spatio-temporal convolutional neural network to process the image data in at least three space-time dimensions and to provide a first convolutional neural network output. A corresponding sequence of optical flow image frames is provided to a second 3D spatio-temporal convolutional neural network to process the optical flow data in at least three space-time dimensions and to provide a second convolutional neural network output. The first and second convolutional neural network outputs are combined to provide a system output.
-
Publication No.: US20240303897A1
Publication date: 2024-09-12
Application No.: US18600552
Filing date: 2024-03-08
Applicant: DeepMind Technologies Limited
Inventor: Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Ankush Gupta, Yusuf Aytar, Joao Carreira, Andrew Zisserman
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for animating images using point trajectories.
-
Publication No.: US11734572B2
Publication date: 2023-08-22
Application No.: US16995307
Filing date: 2020-08-17
Applicant: DeepMind Technologies Limited
Inventor: Maxwell Elliot Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
CPC classification number: G06N3/084, G06N3/045, G06N3/088, G06V10/454
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing inputs using an image processing neural network system that includes a spatial transformer module. One of the methods includes receiving an input feature map derived from the one or more input images, and applying a spatial transformation to the input feature map to generate a transformed feature map, comprising: processing the input feature map to generate spatial transformation parameters for the spatial transformation, and sampling from the input feature map in accordance with the spatial transformation parameters to generate the transformed feature map.
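The sampling half of the spatial transformer in this abstract can be sketched as follows. The sketch makes several assumptions not stated in the source: the transformation is a 2x3 affine matrix `theta` acting on coordinates normalised to [-1, 1], sampling is bilinear with zeros outside the map, and the feature map has a single channel. In the described system the parameters are generated from the input feature map; here they are passed in directly, and all names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def bilinear(fmap: Grid, x: float, y: float) -> float:
    """Bilinearly interpolate fmap at pixel coordinates (x, y);
    locations outside the map contribute zero."""
    height, width = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < width and 0 <= yy < height:
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += weight * fmap[yy][xx]
    return value


def spatial_transform(fmap: Grid, theta: Grid) -> Grid:
    """Sample fmap at source locations given by the affine map `theta`
    applied to each output location (map must be at least 2x2)."""
    height, width = len(fmap), len(fmap[0])
    out: Grid = []
    for i in range(height):
        row = []
        for j in range(width):
            # Output location in normalised coordinates.
            xt = 2 * j / (width - 1) - 1
            yt = 2 * i / (height - 1) - 1
            # Source location under the affine transformation.
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            # Back to pixel coordinates, then sample.
            px = (xs + 1) * (width - 1) / 2
            py = (ys + 1) * (height - 1) / 2
            row.append(bilinear(fmap, px, py))
        out.append(row)
    return out


feature_map = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
identity = spatial_transform(feature_map, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
flipped = spatial_transform(feature_map, [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(identity)
print(flipped)
```

Because the output is a weighted sum of input values with weights that depend smoothly on `theta`, this style of sampling is compatible with gradient-based training of whatever produces the parameters.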
-