Two-stage online detection of action start in untrimmed videos

    Publication Number: US11232308B2

    Publication Date: 2022-01-25

    Application Number: US16394964

    Filing Date: 2019-04-25

    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame of the video, based on the first video frame and the video frames preceding it in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores and generating an action-agnostic start probability that the first video frame contains an action start. A fusion component is coupled to the classification module and the localization module for generating, based on the set of action scores and the action-agnostic start probability, a set of action-specific start probabilities, each corresponding to a start of an action belonging to the respective action class.
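    Note: the following Python sketch illustrates one way the fusion component described above might combine the classification module's per-class scores with the localization module's action-agnostic start probability. The function name, tensor shapes, and multiplicative fusion rule are illustrative assumptions, not the claimed implementation.

        import torch
        import torch.nn.functional as F

        def fuse_start_probabilities(action_scores: torch.Tensor,
                                     start_prob: torch.Tensor) -> torch.Tensor:
            # action_scores: (num_classes,) per-class scores for the current frame.
            # start_prob:    scalar action-agnostic start probability.
            # A class receives a high action-specific start probability only when
            # both the classification and localization modules agree.
            return action_scores * start_prob

        # Toy usage with random per-frame outputs standing in for the two modules.
        scores = F.softmax(torch.randn(21), dim=0)   # e.g. 20 action classes + background
        p_start = torch.sigmoid(torch.randn(()))     # localization module output
        print(fuse_start_probabilities(scores, p_start))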

    SYSTEMS AND METHODS FOR PARTIALLY SUPERVISED ONLINE ACTION DETECTION IN UNTRIMMED VIDEOS

    Publication Number: US20210357687A1

    Publication Date: 2021-11-18

    Application Number: US16931228

    Filing Date: 2020-07-16

    Abstract: Embodiments described herein provide systems and methods for a partially supervised training model for online action detection. Specifically, the online action detection framework may include two modules that are trained jointly—a Temporal Proposal Generator (TPG) and an Online Action Recognizer (OAR). In the training phase, OAR performs both online per-frame action recognition and start point detection. At the same time, TPG generates class-wise temporal action proposals serving as noisy supervisions for OAR. TPG is then optimized with the video-level annotations. In this way, the online action detection framework can be trained with video-category labels only without pre-annotated segment-level boundary labels.
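    Note: a minimal training-step sketch of the joint TPG/OAR setup described above, assuming hypothetical tpg and oar PyTorch modules and a multi-hot video-level label; the specific losses and pseudo-label thresholding are illustrative assumptions, and start-point supervision is omitted for brevity.

        import torch
        import torch.nn.functional as F

        def train_step(tpg, oar, optimizer, frames, video_label):
            # frames:      (T, C, H, W) untrimmed video frames.
            # video_label: (num_classes,) multi-hot video-level category label.
            # tpg and oar are assumed to be torch.nn.Module instances.

            # TPG turns video-level supervision into class-wise temporal proposals
            # that serve as noisy per-frame targets for OAR.
            proposals = tpg(frames)                  # (T, num_classes) proposal logits
            frame_scores = oar(frames)               # (T, num_classes) per-frame logits

            # OAR is supervised by the noisy proposals; TPG by the video-level label.
            oar_loss = F.binary_cross_entropy_with_logits(
                frame_scores, (proposals.sigmoid() > 0.5).float())
            tpg_loss = F.binary_cross_entropy_with_logits(
                proposals.mean(dim=0), video_label)

            loss = oar_loss + tpg_loss               # the two modules are updated jointly
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()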

    PROPOSAL LEARNING FOR SEMI-SUPERVISED OBJECT DETECTION

    Publication Number: US20210216828A1

    Publication Date: 2021-07-15

    Application Number: US17080276

    Filing Date: 2020-10-26

    Abstract: A method for generating a neural network for detecting one or more objects in images includes generating one or more self-supervised proposal learning losses based on the one or more proposal features and corresponding proposal feature predictions. One or more consistency-based proposal learning losses are generated based on noisy proposal feature predictions and the corresponding proposal predictions without noise. A combined loss is generated using the one or more self-supervised proposal learning losses and one or more consistency-based proposal learning losses. The neural network is updated based on the combined loss.
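    Note: a hedged sketch of how the two proposal-learning losses might be combined into the single loss used to update the network; the choice of mean-squared-error and KL-divergence terms and the loss weights are assumptions for illustration only.

        import torch
        import torch.nn.functional as F

        def combined_proposal_loss(proposal_features, feature_predictions,
                                   clean_predictions, noisy_predictions,
                                   w_self=1.0, w_cons=1.0):
            # proposal_features / feature_predictions: (N, D) tensors used for the
            #     self-supervised proposal learning loss.
            # clean_predictions / noisy_predictions:   (N, K) class predictions for
            #     proposals without and with noise, used for the consistency loss.

            # Self-supervised loss: predictions should reconstruct proposal features.
            self_supervised = F.mse_loss(feature_predictions, proposal_features)

            # Consistency loss: noisy and clean predictions should agree.
            consistency = F.kl_div(noisy_predictions.log_softmax(dim=-1),
                                   clean_predictions.softmax(dim=-1),
                                   reduction="batchmean")

            # The network is then updated with this combined loss.
            return w_self * self_supervised + w_cons * consistency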

    PREDICTING USER INTENT FOR ONLINE SYSTEM ACTIONS THROUGH NATURAL LANGUAGE INFERENCE-BASED MACHINE LEARNING MODEL

    Publication Number: US20210142103A1

    Publication Date: 2021-05-13

    Application Number: US16718186

    Filing Date: 2019-12-18

    Abstract: An online system that allows users to interact with it using expressions in natural language form includes an intent inference module allowing it to infer an intent represented by a user expression. The intent inference module has a set of possible intents, along with a small set of example natural language expressions known to represent that intent. When a user interacts with the system using a natural language expression for which the intent is not already known, the intent inference module applies a natural language inference model to compute scores indicating whether the user expression textually entails the various example natural language expressions. Based on the scores, the intent inference module determines an intent that is most applicable for the expression. If an intent cannot be determined with sufficient confidence, the intent inference module may further attempt to determine whether the various example natural language expressions textually entail the user expression.
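    Note: the sketch below outlines the entailment-based scoring loop described in the abstract, assuming a generic entail_score(premise, hypothesis) function backed by any off-the-shelf NLI model; the threshold value and the reverse-direction fallback are illustrative.

        from typing import Dict, List, Tuple

        def infer_intent(user_expression: str,
                         intent_examples: Dict[str, List[str]],
                         entail_score,
                         threshold: float = 0.5) -> Tuple[str, float]:
            # entail_score(premise, hypothesis) is assumed to return the probability
            # that the premise textually entails the hypothesis.
            best_intent, best_score = None, 0.0
            for intent, examples in intent_examples.items():
                # Keep the strongest entailment of any example for this intent.
                score = max(entail_score(user_expression, ex) for ex in examples)
                if score > best_score:
                    best_intent, best_score = intent, score

            if best_score < threshold:
                # Low confidence: also check whether the example expressions
                # entail the user expression, as the abstract suggests.
                for intent, examples in intent_examples.items():
                    score = max(entail_score(ex, user_expression) for ex in examples)
                    if score > best_score:
                        best_intent, best_score = intent, score

            return best_intent, best_score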

    NATURAL LANGUAGE PROCESSING USING CONTEXT-SPECIFIC WORD VECTORS

    Publication Number: US20210073459A1

    Publication Date: 2021-03-11

    Application Number: US17027130

    Filing Date: 2020-09-21

    Abstract: A system is provided for natural language processing. In some embodiments, the system includes an encoder for generating context-specific word vectors for at least one input sequence of words. The encoder is pre-trained using training data for performing a first natural language processing task. A neural network performs a second natural language processing task on the at least one input sequence of words using the context-specific word vectors. The first natural language processing task is different from the second natural language processing task, and the neural network is trained separately from the encoder. In some embodiments, the first natural language processing task can be machine translation, and the second natural language processing task can be one of sentiment analysis, question classification, entailment classification, and question answering.
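    Note: a minimal sketch of the two-task pipeline described above, assuming a hypothetical bidirectional-LSTM encoder pre-trained on a first task and a separately trained classification head for the second task; layer sizes and mean pooling are illustrative choices.

        import torch
        import torch.nn as nn

        class ContextEncoder(nn.Module):
            # Stand-in for an encoder pre-trained on a first task such as translation.
            def __init__(self, vocab_size=10000, emb_dim=300, hidden=300):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                    bidirectional=True)

            def forward(self, token_ids):
                # Returns context-specific word vectors for each input position.
                vectors, _ = self.lstm(self.embed(token_ids))
                return vectors

        class SentimentHead(nn.Module):
            # Separately trained network for the second task (e.g. sentiment analysis).
            def __init__(self, in_dim=600, num_classes=2):
                super().__init__()
                self.classifier = nn.Linear(in_dim, num_classes)

            def forward(self, context_vectors):
                # Pool the context-specific word vectors and classify the sequence.
                return self.classifier(context_vectors.mean(dim=1))

        encoder = ContextEncoder()          # pre-trained on the first task
        head = SentimentHead()              # trained on the second task
        tokens = torch.randint(0, 10000, (1, 12))
        logits = head(encoder(tokens))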

    Abstraction of text summarization
    Invention Grant

    Publication Number: US10909157B2

    Publication Date: 2021-02-02

    Application Number: US16051188

    Filing Date: 2018-07-31

    Abstract: A system is disclosed for providing an abstractive summary of a source textual document. The system includes an encoder, a decoder, and a fusion layer. The encoder is capable of generating an encoding for the source textual document. The decoder is separated into a contextual model and a language model. The contextual model is capable of extracting words from the source textual document using the encoding. The language model is capable of generating vectors paraphrasing the source textual document based on pre-training with a training dataset. The fusion layer is capable of generating the abstractive summary of the source textual document from the extracted words and the generated vectors for paraphrasing. In some embodiments, the system utilizes a novelty metric to encourage the generation of novel phrases for inclusion in the abstractive summary.
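    Note: the following sketch shows one plausible form of the fusion layer that merges the contextual model's state with the language model's state when emitting each summary word; the gating scheme, dimensions, and module names are assumptions rather than the disclosed design.

        import torch
        import torch.nn as nn

        class FusionLayer(nn.Module):
            # Fuses a contextual (extractive) decoder state with a pre-trained
            # language-model state into a single next-word distribution.
            def __init__(self, context_dim, lm_dim, vocab_size):
                super().__init__()
                self.gate = nn.Linear(context_dim + lm_dim, lm_dim)
                self.out = nn.Linear(context_dim + lm_dim, vocab_size)

            def forward(self, context_state, lm_state):
                joint = torch.cat([context_state, lm_state], dim=-1)
                # Gate how much of the language model's paraphrasing signal to keep.
                gated_lm = torch.sigmoid(self.gate(joint)) * lm_state
                fused = torch.cat([context_state, gated_lm], dim=-1)
                return torch.log_softmax(self.out(fused), dim=-1)

        fusion = FusionLayer(context_dim=256, lm_dim=256, vocab_size=30000)
        context_state = torch.randn(1, 256)   # from the contextual model
        lm_state = torch.randn(1, 256)        # from the pre-trained language model
        next_word_log_probs = fusion(context_state, lm_state)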

    Sentinel gate for modulating auxiliary information in a long short-term memory (LSTM) neural network

    Publication Number: US10565306B2

    Publication Date: 2020-02-18

    Application Number: US15817165

    Filing Date: 2017-11-18

    Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long and short term visual and linguistic information.
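    Note: a hedged sketch of an LSTM cell augmented with an auxiliary sentinel gate in the spirit of the Sn-LSTM described above; the exact gate parameterization and dimensions are assumptions for illustration.

        import torch
        import torch.nn as nn

        class SentinelLSTMCell(nn.Module):
            # LSTM cell whose auxiliary sentinel gate reads the input and previous
            # hidden state and distills a "visual sentinel" from the cell memory.
            def __init__(self, input_dim, hidden_dim):
                super().__init__()
                self.cell = nn.LSTMCell(input_dim, hidden_dim)
                self.sentinel_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)

            def forward(self, x, state):
                h_prev, c_prev = state
                h, c = self.cell(x, (h_prev, c_prev))
                # The sentinel gate modulates what the memory contributes as an
                # additional representation of long- and short-term information.
                g = torch.sigmoid(self.sentinel_gate(torch.cat([x, h_prev], dim=-1)))
                sentinel = g * torch.tanh(c)
                return h, c, sentinel

        cell = SentinelLSTMCell(input_dim=512, hidden_dim=512)
        x = torch.randn(1, 512)
        h0, c0 = torch.zeros(1, 512), torch.zeros(1, 512)
        h, c, sentinel = cell(x, (h0, c0))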

    Adaptive attention model for image captioning

    Publication Number: US10565305B2

    Publication Date: 2020-02-18

    Application Number: US15817161

    Filing Date: 2017-11-17

    Abstract: The technology disclosed presents a novel spatial attention model that uses current hidden state information of a decoder long short-term memory (LSTM) to guide attention and to extract spatial image features for use in image captioning. The technology disclosed also presents a novel adaptive attention model for image captioning that mixes visual information from a convolutional neural network (CNN) and linguistic information from an LSTM. At each timestep, the adaptive attention model automatically decides how heavily to rely on the image, as opposed to the linguistic model, to emit the next caption word. The technology disclosed further adds a new auxiliary sentinel gate to an LSTM architecture and produces a sentinel LSTM (Sn-LSTM). The sentinel gate produces a visual sentinel at each timestep, which is an additional representation, derived from the LSTM's memory, of long and short term visual and linguistic information.
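    Note: complementing the sentinel sketch above, the following illustrates how an adaptive attention step might mix attended image regions with the visual sentinel via a scalar gate when emitting the next caption word; the attention scoring and dimensions are illustrative assumptions.

        import torch
        import torch.nn as nn

        class AdaptiveAttention(nn.Module):
            # A scalar gate decides, per timestep, how much to rely on attended
            # image features versus the visual sentinel (the linguistic fallback).
            def __init__(self, hidden_dim):
                super().__init__()
                self.att_image = nn.Linear(hidden_dim, 1)
                self.att_sentinel = nn.Linear(hidden_dim, 1)

            def forward(self, image_feats, sentinel, h):
                # image_feats: (regions, hidden_dim) CNN features of spatial regions
                # sentinel, h: (hidden_dim,) visual sentinel and decoder hidden state
                scores = self.att_image(image_feats + h).squeeze(-1)      # (regions,)
                sentinel_score = self.att_sentinel(sentinel + h)          # (1,)
                weights = torch.softmax(torch.cat([scores, sentinel_score]), dim=0)
                beta = weights[-1]                                        # sentinel weight
                visual_context = (weights[:-1, None] * image_feats).sum(dim=0)
                # beta near 1 -> rely on the linguistic model; near 0 -> on the image.
                return beta * sentinel + (1.0 - beta) * visual_context

        attend = AdaptiveAttention(hidden_dim=512)
        regions = torch.randn(49, 512)                  # e.g. a 7x7 CNN feature map
        context = attend(regions, torch.randn(512), torch.randn(512))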
