REPROGRAMMABLE FEDERATED LEARNING
    Invention Publication

    Publication No.: US20240256894A1

    Publication Date: 2024-08-01

    Application No.: US18162894

    Filing Date: 2023-02-01

    CPC classification number: G06N3/098

    Abstract: Systems and techniques that facilitate reprogrammable federated learning are provided. In various embodiments, a server device can share a pre-trained and frozen neural network with a set of client devices. In various aspects, the server device can orchestrate reprogrammable federated learning of the pre-trained and frozen neural network among the set of client devices. In various instances, the pre-trained and frozen neural network can be positioned between at least one trainable input layer and at least one trainable output layer, and the reprogrammable federated learning can involve the at least one trainable input layer and the at least one trainable output layer, but not the pre-trained and frozen neural network, being locally adjusted by the set of client devices.
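
    The claimed arrangement trains only thin input and output layers around a frozen, shared backbone, so federated aggregation needs to exchange only those adapter weights. Below is a minimal PyTorch sketch of that idea; the class name ReprogrammedModel, the fed_avg helper, and all layer sizes are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of reprogrammable federated learning (illustrative only).
# Assumed names: ReprogrammedModel, fed_avg; layer sizes are arbitrary.
import copy
import torch
import torch.nn as nn

class ReprogrammedModel(nn.Module):
    def __init__(self, backbone: nn.Module, in_dim: int, out_dim: int):
        super().__init__()
        self.input_layer = nn.Linear(in_dim, 128)    # trainable, client-side
        self.backbone = backbone                     # pre-trained, shared
        self.output_layer = nn.Linear(128, out_dim)  # trainable, client-side
        for p in self.backbone.parameters():         # freeze the shared network
            p.requires_grad = False

    def forward(self, x):
        return self.output_layer(self.backbone(self.input_layer(x)))

def fed_avg(client_models):
    """Average only the trainable input/output layers across clients."""
    avg = copy.deepcopy(client_models[0].state_dict())
    for k in [k for k in avg if k.startswith(("input_layer", "output_layer"))]:
        avg[k] = torch.stack([m.state_dict()[k] for m in client_models]).mean(0)
    return avg

# Example round: two clients locally adjust only their adapter layers.
backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # stand-in for the frozen net
clients = [ReprogrammedModel(copy.deepcopy(backbone), 16, 4) for _ in range(2)]
for model in clients:
    opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad), lr=0.1)
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
server_state = fed_avg(clients)  # server aggregates the adapters only
```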

    Counterfactual debiasing inference for compositional action recognition

    Publication No.: US12020480B2

    Publication Date: 2024-06-25

    Application No.: US17662663

    Filing Date: 2022-05-10

    CPC classification number: G06V20/41 G06V10/806 G06V20/46

    Abstract: One or more computer processors improve action recognition by removing bias introduced by the visual appearances of objects within a received video segment. The one or more computer processors extract appearance information and structure information from the received video segment. The one or more computer processors calculate a factual inference (TE) for the received video segment utilizing the extracted appearance information and structure information. The one or more computer processors calculate a counterfactual debiasing inference (NDE) for the received video segment. The one or more computer processors calculate a total indirect effect (TIE) by subtracting the calculated counterfactual debiasing inference from the calculated factual inference. The one or more computer processors perform action recognition on the received video segment by selecting the classification result associated with the highest calculated TIE.
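
    The decision rule in the abstract reduces to computing TIE = TE - NDE per class and selecting the argmax. A toy sketch under assumed logit shapes (the models producing TE and NDE are not shown):

```python
# Toy sketch of TIE-based classification: TIE = TE - NDE (illustrative).
import torch

num_classes = 5
# Assumed: per-class factual logits from appearance + structure (TE),
# and counterfactual logits from appearance alone (NDE).
te_logits = torch.randn(num_classes)   # factual inference, TE
nde_logits = torch.randn(num_classes)  # counterfactual debiasing inference, NDE

tie = te_logits - nde_logits           # total indirect effect per class
prediction = tie.argmax().item()       # class with the highest TIE wins
print(f"predicted action class: {prediction}")
```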

    Generative adversarial network based audio restoration

    Publication No.: US12001950B2

    Publication Date: 2024-06-04

    Application No.: US16299828

    Filing Date: 2019-03-12

    CPC classification number: G06N3/08 G06N3/088 G10L21/0208

    Abstract: Mechanisms are provided for implementing a generative adversarial network (GAN) based restoration system. A first neural network of a generator of the GAN based restoration system is trained to generate an artificial audio spectrogram having a target damage characteristic based on an input audio spectrogram and a target damage vector. An original audio recording spectrogram is input to the trained generator, where the original audio recording spectrogram corresponds to an original audio recording and an input target damage vector. The trained generator processes the original audio recording spectrogram to generate an artificial audio recording spectrogram having a level of damage corresponding to the input target damage vector. A spectrogram inversion module converts the artificial audio recording spectrogram to an artificial audio recording waveform output.
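
    A hedged sketch of the generator side: a network conditioned on both a magnitude spectrogram and a target damage vector, followed by a spectrogram-inversion step. The module names, shapes, and the use of torchaudio's Griffin-Lim as the inversion module are assumptions for illustration, not the patented design.

```python
# Sketch of a damage-conditioned spectrogram generator plus spectrogram
# inversion (illustrative; names and shapes are assumptions).
import torch
import torch.nn as nn
import torchaudio

class DamageConditionedGenerator(nn.Module):
    """Maps (spectrogram, target damage vector) -> artificial spectrogram."""
    def __init__(self, damage_dim: int = 4):
        super().__init__()
        self.damage_proj = nn.Linear(damage_dim, 1)  # broadcast damage as a bias map
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # keep magnitudes >= 0
        )

    def forward(self, spec, damage):
        # spec: (batch, 1, freq, time); damage: (batch, damage_dim)
        bias = self.damage_proj(damage).view(-1, 1, 1, 1)
        return self.net(spec + bias)

gen = DamageConditionedGenerator()
spec = torch.rand(1, 1, 201, 100)              # magnitude spectrogram (n_fft=400)
damage = torch.tensor([[0.2, 0.0, 0.5, 0.1]])  # assumed target damage vector
artificial_spec = gen(spec, damage)

# Spectrogram inversion; Griffin-Lim is one possible inversion module.
inverse = torchaudio.transforms.GriffinLim(n_fft=400)
waveform = inverse(artificial_spec.squeeze(1))  # artificial audio waveform
```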

    Temporal Action Localization with Mutual Task Guidance

    Publication No.: US20240037940A1

    Publication Date: 2024-02-01

    Application No.: US17875566

    Filing Date: 2022-07-28

    CPC classification number: G06V20/41 G06V20/46

    Abstract: A computer vision temporal action localization (TAL) computing tool and operations are provided. The TAL computing tool receives a coarse temporal bounding box, having a first start point and a first end point, for an action in the input video data, and a first set of logits, where each logit corresponds to a potential classification of the action in the input video data. The TAL computing tool executes a first engine on the coarse temporal bounding box to generate a second set of logits, and a second engine on the first set of logits to generate a refined temporal bounding box having a second start point and a second end point. The TAL computing tool performs the computer vision temporal action localization operation based on the second set of logits and the refined temporal bounding box to specify a temporal segment of the input video data corresponding to an action represented in the input video data, and a corresponding classification of the action represented in the temporal segment.
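
    The two engines guide each other across tasks: one maps the coarse temporal box to a refined set of class logits, while the other maps the initial logits to a refined box. A minimal sketch with assumed module names and dimensions:

```python
# Sketch of mutual task guidance for temporal action localization
# (assumed module names and dimensions; not the patented architecture).
import torch
import torch.nn as nn

class MutualTALHead(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Engine 1: coarse (start, end) box -> second set of class logits.
        self.box_to_logits = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                                           nn.Linear(64, num_classes))
        # Engine 2: first set of logits -> refined (start, end) box.
        self.logits_to_box = nn.Sequential(nn.Linear(num_classes, 64), nn.ReLU(),
                                           nn.Linear(64, 2))

    def forward(self, coarse_box, logits):
        refined_logits = self.box_to_logits(coarse_box)  # second set of logits
        refined_box = self.logits_to_box(logits)         # refined start/end
        return refined_logits, refined_box

head = MutualTALHead(num_classes=10)
coarse_box = torch.tensor([[12.0, 48.0]])  # coarse start/end, in frames
logits = torch.randn(1, 10)                # first set of logits
refined_logits, refined_box = head(coarse_box, logits)
action = refined_logits.argmax(dim=1).item()
start, end = refined_box[0].tolist()
print(f"action {action} in frames [{start:.1f}, {end:.1f}]")
```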

    SELF-SUPERVISED SPEECH RECOGNITION
    Invention Publication

    Publication No.: US20230360642A1

    Publication Date: 2023-11-09

    Application No.: US17662435

    Filing Date: 2022-05-09

    CPC classification number: G10L15/16 G06N3/082 G10L15/01

    Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork by zeroing out one or more masked weights in the initial subnetwork specified by the initial pruning mask, training a new subnetwork from the zeroed-out subnetwork, and pruning one or more weights of lowest magnitude in the new subnetwork, regardless of network structure, to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
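
    The finetuning loop described above is an iterative magnitude-pruning cycle: apply the mask, train, then re-prune the lowest-magnitude weights globally back to the target sparsity. A toy sketch on a stand-in MLP (a real run would start from a pretrained SSL speech model such as wav2vec 2.0):

```python
# Sketch of the mask -> finetune -> global magnitude prune loop (illustrative).
import torch
import torch.nn as nn

def apply_mask(model, masks):
    """Zero out masked weights in place."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def global_magnitude_prune(model, sparsity):
    """Mask the lowest-|w| weights across all layers, ignoring structure."""
    all_w = torch.cat([p.detach().abs().flatten()
                       for n, p in model.named_parameters() if "weight" in n])
    threshold = torch.quantile(all_w, sparsity)
    return {n: (p.detach().abs() > threshold).float()
            for n, p in model.named_parameters() if "weight" in n}

model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 10))
sparsity = 0.5
masks = global_magnitude_prune(model, sparsity)  # stand-in for the initial mask

for _ in range(3):                                # finetuning rounds
    apply_mask(model, masks)                      # zero out masked weights
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.randn(16, 40), torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    masks = global_magnitude_prune(model, sparsity)  # re-satisfy target sparsity

apply_mask(model, masks)
prediction = model(torch.randn(1, 40)).argmax(dim=1)  # classify an audio segment
```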

    Dual-modality relation networks for audio-visual event localization

    Publication No.: US11663823B2

    Publication Date: 2023-05-30

    Application No.: US16989387

    Filing Date: 2020-08-10

    Abstract: Dual-modality relation networks for audio-visual event localization can be provided. A video feed for audio-visual event localization can be received. Based on a combination of extracted audio features and video features of the video feed, informative features and regions in the video feed can be determined by running a first neural network. Based on the informative features and regions in the video feed determined by the first neural network, relation-aware video features can be determined by running a second neural network. Based on the informative features and regions in the video feed, relation-aware audio features can be determined by running a third neural network. A dual-modality representation can be obtained based on the relation-aware video features and the relation-aware audio features by running a fourth neural network. The dual-modality representation can be input to a classifier to identify an audio-visual event in the video feed.
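
    One plausible reading of the four-network pipeline, sketched with assumed dimensions and attention-based relation modules (the actual architectures are not specified here):

```python
# Sketch of the four-network dual-modality pipeline (assumed dims/fusion).
import torch
import torch.nn as nn

class DualModalityAVE(nn.Module):
    def __init__(self, dim=128, num_events=28):
        super().__init__()
        self.informative = nn.Linear(2 * dim, dim)  # net 1: informative features/regions
        self.video_relation = nn.MultiheadAttention(dim, 4, batch_first=True)  # net 2
        self.audio_relation = nn.MultiheadAttention(dim, 4, batch_first=True)  # net 3
        self.fusion = nn.Linear(2 * dim, dim)       # net 4: dual-modality representation
        self.classifier = nn.Linear(dim, num_events)

    def forward(self, video_feats, audio_feats):
        # video_feats, audio_feats: (batch, time, dim)
        joint = self.informative(torch.cat([video_feats, audio_feats], dim=-1))
        v, _ = self.video_relation(joint, joint, joint)  # relation-aware video features
        a, _ = self.audio_relation(joint, joint, joint)  # relation-aware audio features
        dual = self.fusion(torch.cat([v, a], dim=-1))    # dual-modality representation
        return self.classifier(dual.mean(dim=1))         # event logits for the feed

model = DualModalityAVE()
video = torch.randn(2, 10, 128)  # e.g. per-second video CNN features
audio = torch.randn(2, 10, 128)  # e.g. per-second audio embeddings
event_logits = model(video, audio)
```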

    TRANSFORMERS FOR REAL WORLD VIDEO QUESTION ANSWERING

    Publication No.: US20230136515A1

    Publication Date: 2023-05-04

    Application No.: US17516119

    Filing Date: 2021-11-01

    Abstract: A processor may receive a video including a plurality of video frames in sequence and a question regarding the video. For a video frame in the plurality of video frames, a processor may parse the video frame into objects and relationships between the objects, and create a subgraph of nodes representing objects and edges representing the relationships, where parsing and creating are performed for each video frame in the plurality of video frames, where a plurality of subgraphs can be created. A processor may create a hypergraph connecting subgraphs by learning relationships between the nodes of the subgraphs, where a hyper-edge is created to represent a relationship between at least one node of one subgraph and at least one node of another subgraph in the plurality of subgraphs. A processor may generate an answer to the question based on the hypergraph.
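
    The subgraph-plus-hypergraph structure can be illustrated with plain data structures. In this sketch the hyper-edge rule (linking same-label objects in consecutive frames) is a stand-in for the learned cross-frame relationships:

```python
# Sketch of per-frame subgraphs linked by cross-frame hyper-edges
# (plain data structures; the parser and matching rule are assumptions).
from dataclasses import dataclass, field

@dataclass
class Subgraph:
    frame: int
    nodes: list                                 # object labels in this frame
    edges: list = field(default_factory=list)   # (node_i, node_j, relation)

def build_hypergraph(subgraphs):
    """Connect subgraphs: a hyper-edge links same-label objects
    appearing in consecutive frames."""
    hyper_edges = []
    for g1, g2 in zip(subgraphs, subgraphs[1:]):
        for obj in set(g1.nodes) & set(g2.nodes):
            hyper_edges.append(((g1.frame, obj), (g2.frame, obj), "same_object"))
    return hyper_edges

frames = [
    Subgraph(0, ["person", "cup"], [("person", "cup", "holding")]),
    Subgraph(1, ["person", "cup", "table"], [("person", "cup", "placing_on")]),
    Subgraph(2, ["cup", "table"], [("cup", "table", "on")]),
]
hypergraph = build_hypergraph(frames)
# An answer module would reason over `frames` plus `hypergraph`, e.g.
# "What did the person do with the cup?" -> follow cup hyper-edges over time.
```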
