-
Publication No.: US11538244B2
Publication Date: 2022-12-27
Application No.: US16642660
Filing Date: 2018-06-22
Applicant: Microsoft Technology Licensing, LLC
Abstract: Implementations of the subject matter described herein provide a solution for extracting spatial-temporal feature representation. In this solution, an input comprising a plurality of images is received at a first layer of a learning network. First features that characterize spatial presentation of the images are extracted from the input in a spatial dimension using a first unit of the first layer. Based on a type of a connection between the first unit and a second unit of the first layer, second features at least characterizing temporal changes across the images are extracted from the first features and/or the input in a temporal dimension using the second unit. A spatial-temporal feature representation of the images is generated partially based on the second features. Through this solution, it is possible to reduce learning network sizes, improve training and use efficiency of learning networks, and obtain accurate spatial-temporal feature representations.
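The abstract describes factorizing spatial-temporal feature extraction into a spatial unit (per-frame 2D filtering) followed by a temporal unit (1D filtering across frames), with the connection type deciding whether the temporal unit reads the spatial output, the raw input, or both. Below is a minimal numpy sketch of the serial connection only; the averaging kernels are placeholders for learned filters, and the function names are mine, not the patent's.

```python
import numpy as np

def spatial_conv(clip, kernel2d):
    # First unit: apply a 2D kernel to each frame independently,
    # extracting features in the spatial dimension only.
    t, h, w = clip.shape
    kh, kw = kernel2d.shape
    out = np.zeros((t, h - kh + 1, w - kw + 1))
    for f in range(t):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(clip[f, i:i + kh, j:j + kw] * kernel2d)
    return out

def temporal_conv(clip, kernel1d):
    # Second unit: apply a 1D kernel across frames at each spatial
    # location, capturing temporal changes across the images.
    t, h, w = clip.shape
    k = len(kernel1d)
    out = np.zeros((t - k + 1, h, w))
    for f in range(out.shape[0]):
        out[f] = np.tensordot(kernel1d, clip[f:f + k], axes=(0, 0))
    return out

# Serial connection: the temporal unit consumes the spatial unit's output.
clip = np.random.rand(8, 16, 16)  # (frames, height, width)
spatial = spatial_conv(clip, np.ones((3, 3)) / 9.0)
serial_out = temporal_conv(spatial, np.array([0.25, 0.5, 0.25]))
print(serial_out.shape)  # (6, 14, 14)
```

In a parallel connection, `temporal_conv` would instead be applied to `clip` directly and its output combined with `spatial`; the patent's point is that one layer can mix these connection types instead of using a full 3D convolution.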
-
Publication No.: US20170109584A1
Publication Date: 2017-04-20
Application No.: US14887629
Filing Date: 2015-10-20
Applicant: Microsoft Technology Licensing, LLC
IPC: G06K9/00 , G11B27/30 , G11B27/031
CPC classification number: G06K9/00718 , G06K9/00751 , G11B27/031 , G11B27/3081 , H04N21/45457 , H04N21/4666 , H04N21/8549
Abstract: Video highlight detection using pairwise deep ranking neural network training is described. In some examples, highlights in a video are discovered, then used for generating summarization of videos, such as first-person videos. A pairwise deep ranking model is employed to learn the relationship between previously identified highlight and non-highlight video segments. This relationship is encapsulated in a neural network. An example two-stream process generates highlight scores for each segment of a user's video. The obtained highlight scores are used to summarize highlights of the user's video.
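Pairwise ranking training typically optimizes a hinge-style loss that pushes a highlight segment's score above a non-highlight segment's score by some margin; the summary then keeps the top-scoring segments. The sketch below illustrates that loss and selection step with toy scores; the margin value and function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def pairwise_ranking_loss(score_highlight, score_non_highlight, margin=1.0):
    # Hinge-style pairwise loss: zero when the highlight segment
    # out-scores the non-highlight segment by at least `margin`.
    return max(0.0, margin - (score_highlight - score_non_highlight))

def summarize(scores, k=2):
    # Keep the k highest-scoring segments; return their indices in order.
    return sorted(np.argsort(scores)[::-1][:k].tolist())

# Toy scores a trained ranking network might assign to segment pairs.
print(pairwise_ranking_loss(2.5, 0.5))  # 0.0 -> correctly ranked with margin
print(pairwise_ranking_loss(0.8, 0.6))  # 0.8 -> correct order, but inside the margin
print(summarize(np.array([0.1, 0.9, 0.3, 0.7])))  # [1, 3]
```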
-
Publication No.: US09807473B2
Publication Date: 2017-10-31
Application No.: US14946988
Filing Date: 2015-11-20
Applicant: Microsoft Technology Licensing, LLC
IPC: H04N5/445 , H04N21/8405 , G06F17/27 , G06K9/00 , G06N3/08
CPC classification number: H04N21/8405 , G06F17/274 , G06F17/2785 , G06K9/00718 , G06K9/6273 , G06N3/08 , H04N21/26603
Abstract: Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
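The abstract pairs two objectives: coherence (likelihood of each next word given previous words and the video) and relevance (closeness of the video and sentence in a shared embedding space). A minimal numpy sketch of such a joint objective follows; the squared-distance relevance term, the weighting `lam`, and all names are my illustrative assumptions rather than the patent's formulation, and the word probabilities stand in for a recurrent language model's outputs.

```python
import numpy as np

def relevance_loss(video_emb, sentence_emb):
    # Relevance: squared distance between the video and the sentence
    # in the shared visual-semantic embedding space.
    return float(np.sum((video_emb - sentence_emb) ** 2))

def coherence_loss(word_probs):
    # Coherence: negative log-likelihood of each ground-truth next word
    # under the sentence-generation model.
    return float(-np.sum(np.log(word_probs)))

def lstm_e_loss(video_emb, sentence_emb, word_probs, lam=0.5):
    # Joint objective balancing relevance and coherence (hypothetical weighting).
    return lam * relevance_loss(video_emb, sentence_emb) + \
        (1 - lam) * coherence_loss(word_probs)

v = np.array([0.2, 0.4])            # toy video embedding
s = np.array([0.2, 0.1])            # toy sentence embedding
probs = np.array([0.5, 0.25])       # model probability of each correct next word
print(round(lstm_e_loss(v, s, probs), 4))
```

Training would minimize this joint loss so that generated sentences are both fluent (coherence) and faithful to the visual content (relevance).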
-
Publication No.: US20170150235A1
Publication Date: 2017-05-25
Application No.: US14946988
Filing Date: 2015-11-20
Applicant: Microsoft Technology Licensing, LLC
IPC: H04N21/8405 , G06K9/00 , G06N3/08 , G06F17/27
CPC classification number: H04N21/8405 , G06F17/274 , G06F17/2785 , G06K9/00718 , G06K9/6273 , G06N3/08 , H04N21/26603
Abstract: Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.