LONG DURATION STRUCTURED VIDEO ACTION SEGMENTATION

    Publication Number: US20240104915A1

    Publication Date: 2024-03-28

    Application Number: US18459824

    Filing Date: 2023-09-01

    CPC classification number: G06V10/82 G06V10/751 G06V10/86 G06V20/46 G06V20/49

    Abstract: Machine learning models can process a video and generate outputs such as action segmentation assigning portions of the video to a particular action, or action classification assigning an action class to each frame of the video. Some machine learning models can accurately make predictions for short videos but may not be particularly suited for performing action segmentation on long-duration, structured videos. An effective machine learning model may include a hybrid architecture involving a temporal convolutional network and a bi-directional graph neural network. The machine learning model can process long-duration structured videos by using a temporal convolutional network as a first-pass action segmentation model to generate rich, frame-wise features. The frame-wise features can be converted into a graph having forward edges and backward edges. A graph neural network can process the graph to refine the final fine-grained per-frame action prediction.
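A minimal NumPy sketch of the pipeline the abstract describes. All shapes, the single convolution, and the one-round message passing are illustrative stand-ins, not the patented architecture: frame-wise features come from a small temporal convolution, the frames become nodes of a graph with forward and backward edges, and a message-passing step refines the per-frame prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy video: T frames, each with a D-dimensional feature vector.
T, D, NUM_CLASSES = 12, 8, 3
frames = rng.normal(size=(T, D))

# First pass: one temporal convolution (kernel size 3) yielding frame-wise features,
# standing in for the temporal convolutional network.
kernel = rng.normal(size=(3, D, D)) * 0.1
padded = np.pad(frames, ((1, 1), (0, 0)))
feats = np.stack([
    sum(padded[t + k] @ kernel[k] for k in range(3)) for t in range(T)
])
feats = np.maximum(feats, 0.0)  # ReLU

# Convert features into a temporal graph with forward (t -> t+1)
# and backward (t+1 -> t) edges.
adj = np.zeros((T, T))
for t in range(T - 1):
    adj[t, t + 1] = 1.0  # forward edge
    adj[t + 1, t] = 1.0  # backward edge
adj_norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)

# One round of graph message passing refines each frame with its neighbours.
w_msg = rng.normal(size=(D, D)) * 0.1
refined = feats + np.maximum(adj_norm @ feats @ w_msg, 0.0)

# Final fine-grained per-frame action prediction.
w_cls = rng.normal(size=(D, NUM_CLASSES))
pred = np.argmax(refined @ w_cls, axis=1)
```

A real model would stack many convolutional and graph layers with learned weights; the sketch only shows how the TCN output can be re-expressed as a bi-directional graph and refined.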

    PROCESSING VIDEOS BASED ON TEMPORAL STAGES

    Publication Number: US20230124495A1

    Publication Date: 2023-04-20

    Application Number: US18050757

    Filing Date: 2022-10-28

    Abstract: Disclosed is a technical solution to process a video that captures actions to be performed for completing a task based on a chronological sequence of stages within the task. An example system may identify an action sequence from an instruction for the task. The system inputs the action sequence into a trained model (e.g., a recurrent neural network), which outputs the chronological sequence of stages. The RNN may be trained through self-supervised learning. The system may input the video and the chronological sequence of stages into another trained model, e.g., a temporal convolutional network. The other trained model may include hidden layers arranged before an attention layer. The hidden layers may extract features from the video and feed the features into the attention layer. The attention layer may determine attention weights of the features based on the chronological sequence of stages.
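A small NumPy sketch of the attention step described above, under stated assumptions: the chronological stage sequence (which the patent obtains from an RNN) and the stage embeddings are given as toy inputs, and the attention layer weights each frame's features by similarity to the stages in chronological order.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: T video frames with D-dim features, S task stages.
T, D, S = 10, 6, 3
frame_feats = rng.normal(size=(T, D))      # features from the hidden layers
stage_embeds = rng.normal(size=(S, D))     # one embedding per stage (assumed given)
stage_sequence = [0, 1, 2]                 # chronological order, e.g. from a trained RNN

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Attention layer: attention weights of the frame features are determined
# by the chronological sequence of stages.
scores = frame_feats @ stage_embeds[stage_sequence].T   # (T, S) similarities
attn_weights = softmax(scores, axis=1)                  # each row sums to 1
attended = attn_weights @ stage_embeds[stage_sequence]  # stage-conditioned frame features
```

In the described system the hidden layers of a temporal convolutional network would produce `frame_feats` and the attention output would feed further processing; the sketch isolates only the stage-conditioned weighting.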

    UNCERTAINTY QUANTIFICATION FOR GENERATIVE ARTIFICIAL INTELLIGENCE MODEL

    Publication Number: US20250117633A1

    Publication Date: 2025-04-10

    Application Number: US18987302

    Filing Date: 2024-12-19

    Abstract: Predictive uncertainty of a generative machine learning model may be estimated. The generative machine learning model may be a large language model or large multi-modal model. A datum may be input into the generative machine learning model. The generative machine learning model may generate outputs from the datum. Latent embeddings for the outputs may be extracted from the generative machine learning model. A covariance matrix with respect to the latent embeddings may be computed. The covariance matrix may be a two-dimensional matrix, such as a square matrix. The predictive uncertainty of the generative machine learning model may be estimated using the covariance matrix. For instance, the matrix entropy of the covariance matrix may be determined. The matrix entropy may be an approximated dimension of a latent semantic manifold spanned by the outputs of the generative machine learning model and may indicate the predictive uncertainty of the generative machine learning model.
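The covariance-entropy idea can be sketched in a few lines of NumPy. The embeddings below are synthetic stand-ins for latent embeddings extracted from a generative model's sampled outputs; the matrix entropy is computed from the normalized eigenvalues of the covariance matrix, so outputs spanning a lower-dimensional manifold yield lower entropy (lower predictive uncertainty).

```python
import numpy as np

rng = np.random.default_rng(2)

def matrix_entropy(embeddings):
    """Matrix entropy of the covariance of latent embeddings.

    Approximates the dimension of the latent semantic manifold spanned
    by the outputs; higher values indicate higher predictive uncertainty.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / len(embeddings)   # square (D, D) covariance matrix
    eigvals = np.linalg.eigvalsh(cov)
    eigvals = eigvals[eigvals > 1e-12]
    p = eigvals / eigvals.sum()                     # normalize to a distribution
    return float(-(p * np.log(p)).sum())

# Synthetic embeddings from N sampled outputs for the same input datum.
# "tight" outputs vary along one semantic direction; "spread" outputs are diverse.
direction = rng.normal(size=5)
tight = np.outer(rng.normal(size=32), direction) + rng.normal(size=(32, 5)) * 1e-3
spread = rng.normal(size=(32, 5))

low_uncertainty = matrix_entropy(tight)
high_uncertainty = matrix_entropy(spread)
```

The entropy of the normalized eigenvalue spectrum is scale-invariant, so it measures how many directions the outputs occupy rather than how large they are.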

    CALIBRATING CONFIDENCE OF CLASSIFICATION MODELS

    Publication Number: US20230071760A1

    Publication Date: 2023-03-09

    Application Number: US18050929

    Filing Date: 2022-10-28

    Abstract: Disclosed is a technical solution to calibrate confidence scores of classification networks. A classification network has been trained to receive an input and output a label of the input that indicates a class of the input. The classification network also outputs a confidence score of the label, which indicates a likelihood of the input falling into the class, i.e., a confidence level of the classification network that the label is correct. To calibrate the confidence of the classification network, a logit transformation function may be added into the classification network. The logit transformation function may be an entropy-based function and have learnable parameters, which may be trained by inputting calibration samples into the classification network and optimizing a negative log likelihood based on the labels generated by the classification network and ground-truth labels of the calibration samples. The trained logit transformation function can be used to compute reliable confidence scores.
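A hedged NumPy sketch of an entropy-based logit transform with learnable parameters. The particular form `temperature = a + b * entropy` and the grid search are illustrative assumptions standing in for the patented function and its gradient-based training; the fitting objective (negative log likelihood on calibration samples) follows the abstract.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(np.clip(p, 1e-12, 1.0))).sum(axis=-1)

def transform(logits, a, b):
    """Hypothetical entropy-based logit transform: per-sample temperature a + b*H."""
    temp = a + b * entropy(softmax(logits))[:, None]
    return logits / np.maximum(temp, 1e-3)

def nll(logits, labels):
    p = softmax(logits)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Synthetic calibration set: overconfident logits (scaled up by 5x).
N, C = 200, 4
labels = rng.integers(0, C, size=N)
logits = rng.normal(size=(N, C))
logits[np.arange(N), labels] += 1.0   # correct class slightly favoured
logits *= 5.0                          # overconfidence

# Learn (a, b) by minimising NLL on the calibration samples
# (grid search stands in for gradient-based training of the parameters).
best = min(
    ((a, b) for a in np.linspace(0.5, 8.0, 16) for b in np.linspace(0.0, 2.0, 9)),
    key=lambda ab: nll(transform(logits, *ab), labels),
)
raw_nll = nll(logits, labels)
calibrated_nll = nll(transform(logits, *best), labels)
```

Because the grid contains the identity transform (a=1, b=0), the calibrated NLL can never be worse than the raw NLL on the calibration set.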

    MULTI-SCALE NEURAL NETWORK FOR ANOMALY DETECTION

    Publication Number: US20250111205A1

    Publication Date: 2025-04-03

    Application Number: US18978437

    Filing Date: 2024-12-12

    Abstract: A neural network model for anomaly detection may include convolutional blocks with different spatial scales. The model may be trained with training data, which may be normal data that lacks anomalies. The convolutional blocks may generate embedding features having different spatial scales. A distance between each embedding feature and a corresponding model embedding may be determined. The distances for the embedding features may be accumulated for determining a loss of the model. The model may be trained based on the loss. An accuracy of the trained model may be tested with testing data that has verified anomalies. One or more convolutional blocks may be selected from all the convolutional blocks in the model, e.g., based on the spatial scales of the convolutional blocks and the spatial scale of data on which anomaly detection is to be performed. The selected convolutional block(s) may be used to detect anomalies in the data.
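A compact NumPy sketch of the multi-scale distance idea, with loud caveats: average pooling at several spatial scales stands in for the convolutional blocks, and the per-scale mean over normal data stands in for the learned model embeddings. The accumulated distance then serves as an anomaly score.

```python
import numpy as np

rng = np.random.default_rng(4)

def block_embed(x, scale):
    """Stand-in for a convolutional block: average-pool at one spatial scale."""
    h, w = x.shape
    return x[:h - h % scale, :w - w % scale].reshape(
        h // scale, scale, w // scale, scale).mean(axis=(1, 3))

SCALES = [1, 2, 4]  # one "block" per spatial scale

# "Training" on normal data only: the model embedding per scale is the
# mean embedding of the normal samples at that scale.
normal_data = [rng.normal(size=(8, 8)) for _ in range(20)]
model_embeds = {
    s: np.mean([block_embed(x, s) for x in normal_data], axis=0) for s in SCALES
}

def anomaly_score(x, scales=SCALES):
    """Accumulate distances between embeddings and model embeddings across scales.

    At inference, `scales` could be a selected subset of blocks matched to
    the spatial scale of the data being inspected.
    """
    return sum(
        np.linalg.norm(block_embed(x, s) - model_embeds[s]) for s in scales
    )

normal_score = anomaly_score(rng.normal(size=(8, 8)))
anomalous_score = anomaly_score(rng.normal(size=(8, 8)) + 3.0)  # shifted = anomalous
```

Selecting only the coarse-scale blocks would correspond to the abstract's block selection step for data whose anomalies live at a large spatial scale.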

    SALIENCY MAPS AND CONCEPT FORMATION INTENSITY FOR DIFFUSION MODELS

    Publication Number: US20240144447A1

    Publication Date: 2024-05-02

    Application Number: US18532273

    Filing Date: 2023-12-07

    CPC classification number: G06T5/70 G06V10/30 G06V10/32 G06V10/462

    Abstract: Deep learning models, such as diffusion models, can synthesize images from noise. Diffusion models implement a complex denoising process involving many denoising operations. It can be a challenge to understand the mechanics of diffusion models. To better understand how and when structure is formed, saliency maps and concept formation intensity can be extracted from the sampling network of a diffusion model. Using the input map and the output map of a given denoising operation in a sampling network, a noise gradient map representative of that operation's predicted noise can be determined. The noise gradient maps from the denoising operations at different indices can be combined to generate a saliency map. A concept formation intensity value can be determined from a noise gradient map. Concept formation intensity values from the denoising operations at different indices can be plotted.
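The extraction procedure can be sketched with synthetic data. The per-step input and output maps below stand in for a real sampling network's intermediate maps, and the particular combination rules (input minus output for the noise gradient map, mean of absolute gradients across indices for the saliency map, per-step mean magnitude for intensity) are illustrative assumptions consistent with the abstract.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-ins for the input/output maps of each denoising operation in a
# diffusion sampling network: STEPS operations over an H x W latent map.
STEPS, H, W = 6, 8, 8
input_maps = [rng.normal(size=(H, W)) for _ in range(STEPS)]
output_maps = [m * 0.8 + rng.normal(size=(H, W)) * 0.1 for m in input_maps]

# Noise gradient map per step: what the operation removed from its input,
# i.e. a map representative of the predicted noise at that index.
noise_grads = [inp - out for inp, out in zip(input_maps, output_maps)]

# Saliency map: combine the gradient maps from different indices.
saliency = np.mean([np.abs(g) for g in noise_grads], axis=0)

# Concept formation intensity: one scalar per denoising index,
# suitable for plotting against the step index.
intensity = [float(np.abs(g).mean()) for g in noise_grads]
```

Plotting `intensity` against the step index would show at which point in the denoising schedule most structure is being formed.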

SYSTEM AND METHOD OF USING FRACTIONAL ADAPTIVE LINEAR UNIT AS ACTIVATION IN ARTIFICIAL NEURAL NETWORK

    Publication Number: US20220101138A1

    Publication Date: 2022-03-31

    Application Number: US17548692

    Filing Date: 2021-12-13

    Abstract: An apparatus is provided for deep learning. The apparatus accesses a neural network including an input layer, hidden layers, and an output layer. The apparatus adds an activation function to one or more of the hidden layers and the output layer. The activation function includes a tunable parameter, the value of which can be adjusted during the training of the neural network. The apparatus trains the neural network by inputting training samples into the neural network and determining internal parameters of the neural network based on the training samples. Determining the internal parameters includes determining a value of the tunable parameter based on the training samples. The apparatus may determine two different values of the tunable parameter for two different layers. The activation function may include another tunable parameter. The apparatus can determine a value for the other tunable parameter during the training of the neural network.
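A toy NumPy sketch of an activation with a tunable parameter trained alongside the weights. The blend of identity and ReLU below is a hypothetical stand-in for the fractional adaptive linear unit, and the finite-difference descent stands in for backpropagation; the point is only that the parameter `alpha` is a per-layer internal parameter determined from the training samples.

```python
import numpy as np

rng = np.random.default_rng(6)

def falu(x, alpha):
    """Hypothetical tunable activation: alpha=1 gives identity, alpha=0 gives ReLU.

    Intermediate alpha interpolates between the two, standing in for a
    fractional/adaptive unit whose order is learned during training.
    """
    return alpha * x + (1.0 - alpha) * np.maximum(x, 0.0)

# Toy regression task with a linear target, so learning should push alpha up.
X = rng.normal(size=(64, 3))
y = X @ np.array([1.0, -2.0, 0.5])

w = rng.normal(size=(3,)) * 0.1   # layer weights
alpha = 0.5                       # tunable activation parameter (one per layer)

def loss(w, alpha):
    return float(((falu(X @ w, alpha) - y) ** 2).mean())

lr, eps = 0.05, 1e-4
initial_loss = loss(w, alpha)
for _ in range(200):
    # Finite-difference gradients for both weights and the tunable parameter.
    g_w = np.array([
        (loss(w + eps * np.eye(3)[i], alpha) - loss(w, alpha)) / eps
        for i in range(3)
    ])
    g_a = (loss(w, alpha + eps) - loss(w, alpha)) / eps
    w -= lr * g_w
    alpha = float(np.clip(alpha - lr * g_a, 0.0, 1.0))

final_loss = loss(w, alpha)
```

With a second layer, a second `alpha` would be trained independently, matching the abstract's note that different layers may end up with different parameter values.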
