-
Publication No.: US11538244B2
Publication Date: 2022-12-27
Application No.: US16642660
Filing Date: 2018-06-22
Applicant: Microsoft Technology Licensing, LLC
Abstract: Implementations of the subject matter described herein provide a solution for extracting spatial-temporal feature representation. In this solution, an input comprising a plurality of images is received at a first layer of a learning network. First features that characterize spatial presentation of the images are extracted from the input in a spatial dimension using a first unit of the first layer. Based on a type of a connection between the first unit and a second unit of the first layer, second features at least characterizing temporal changes across the images are extracted from the first features and/or the input in a temporal dimension using the second unit. A spatial-temporal feature representation of the images is generated partially based on the second features. Through this solution, it is possible to reduce learning network sizes, improve training and use efficiency of learning networks, and obtain accurate spatial-temporal feature representations.
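The abstract describes factorizing spatial-temporal feature extraction into a spatial unit (per-frame 2D filtering) followed by a temporal unit (1D filtering across frames), with the connection type deciding whether the temporal unit reads the spatial output, the raw input, or both. Below is a minimal numpy sketch of the serial connection only; the averaging kernels are placeholders for learned filters, and the function names are mine, not the patent's.

```python
import numpy as np

def spatial_conv(clip, kernel2d):
    # First unit: apply a 2D kernel to each frame independently,
    # extracting features in the spatial dimension only.
    t, h, w = clip.shape
    kh, kw = kernel2d.shape
    out = np.zeros((t, h - kh + 1, w - kw + 1))
    for f in range(t):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[f, i, j] = np.sum(clip[f, i:i + kh, j:j + kw] * kernel2d)
    return out

def temporal_conv(clip, kernel1d):
    # Second unit: apply a 1D kernel across frames at each spatial
    # location, capturing temporal changes across the images.
    t, h, w = clip.shape
    k = len(kernel1d)
    out = np.zeros((t - k + 1, h, w))
    for f in range(out.shape[0]):
        out[f] = np.tensordot(kernel1d, clip[f:f + k], axes=(0, 0))
    return out

# Serial connection: the temporal unit consumes the spatial unit's output.
clip = np.random.rand(8, 16, 16)  # (frames, height, width)
spatial = spatial_conv(clip, np.ones((3, 3)) / 9.0)
serial_out = temporal_conv(spatial, np.array([0.25, 0.5, 0.25]))
print(serial_out.shape)  # (6, 14, 14)
```

In a parallel connection, `temporal_conv` would instead be applied to `clip` directly and its output combined with `spatial`; the patent's point is that one layer can mix these connection types instead of using a full 3D convolution.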
-
Publication No.: US20170109584A1
Publication Date: 2017-04-20
Application No.: US14887629
Filing Date: 2015-10-20
Applicant: Microsoft Technology Licensing, LLC
IPC: G06K9/00 , G11B27/30 , G11B27/031
CPC classification number: G06K9/00718 , G06K9/00751 , G11B27/031 , G11B27/3081 , H04N21/45457 , H04N21/4666 , H04N21/8549
Abstract: Video highlight detection using pairwise deep ranking neural network training is described. In some examples, highlights in a video are discovered, then used for generating summarization of videos, such as first-person videos. A pairwise deep ranking model is employed to learn the relationship between previously identified highlight and non-highlight video segments. This relationship is encapsulated in a neural network. An example two-stream process generates highlight scores for each segment of a user's video. The obtained highlight scores are used to summarize highlights of the user's video.
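Pairwise ranking training typically optimizes a hinge-style loss that pushes a highlight segment's score above a non-highlight segment's score by some margin; the summary then keeps the top-scoring segments. The sketch below illustrates that loss and selection step with toy scores; the margin value and function names are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def pairwise_ranking_loss(score_highlight, score_non_highlight, margin=1.0):
    # Hinge-style pairwise loss: zero when the highlight segment
    # out-scores the non-highlight segment by at least `margin`.
    return max(0.0, margin - (score_highlight - score_non_highlight))

def summarize(scores, k=2):
    # Keep the k highest-scoring segments; return their indices in order.
    return sorted(np.argsort(scores)[::-1][:k].tolist())

# Toy scores a trained ranking network might assign to segment pairs.
print(pairwise_ranking_loss(2.5, 0.5))  # 0.0 -> correctly ranked with margin
print(pairwise_ranking_loss(0.8, 0.6))  # 0.8 -> correct order, but inside the margin
print(summarize(np.array([0.1, 0.9, 0.3, 0.7])))  # [1, 3]
```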
-
Publication No.: US09807473B2
Publication Date: 2017-10-31
Application No.: US14946988
Filing Date: 2015-11-20
Applicant: Microsoft Technology Licensing, LLC
IPC: H04N5/445 , H04N21/8405 , G06F17/27 , G06K9/00 , G06N3/08
CPC classification number: H04N21/8405 , G06F17/274 , G06F17/2785 , G06K9/00718 , G06K9/6273 , G06N3/08 , H04N21/26603
Abstract: Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
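The abstract pairs two objectives: coherence (likelihood of each next word given previous words and the video) and relevance (closeness of the video and sentence in a shared embedding space). A minimal numpy sketch of such a joint objective follows; the squared-distance relevance term, the weighting `lam`, and all names are my illustrative assumptions rather than the patent's formulation, and the word probabilities stand in for a recurrent language model's outputs.

```python
import numpy as np

def relevance_loss(video_emb, sentence_emb):
    # Relevance: squared distance between the video and the sentence
    # in the shared visual-semantic embedding space.
    return float(np.sum((video_emb - sentence_emb) ** 2))

def coherence_loss(word_probs):
    # Coherence: negative log-likelihood of each ground-truth next word
    # under the sentence-generation model.
    return float(-np.sum(np.log(word_probs)))

def lstm_e_loss(video_emb, sentence_emb, word_probs, lam=0.5):
    # Joint objective balancing relevance and coherence (hypothetical weighting).
    return lam * relevance_loss(video_emb, sentence_emb) + \
        (1 - lam) * coherence_loss(word_probs)

v = np.array([0.2, 0.4])            # toy video embedding
s = np.array([0.2, 0.1])            # toy sentence embedding
probs = np.array([0.5, 0.25])       # model probability of each correct next word
print(round(lstm_e_loss(v, s, probs), 4))
```

Training would minimize this joint loss so that generated sentences are both fluent (coherence) and faithful to the visual content (relevance).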
-
Publication No.: US20170150235A1
Publication Date: 2017-05-25
Application No.: US14946988
Filing Date: 2015-11-20
Applicant: Microsoft Technology Licensing, LLC
IPC: H04N21/8405 , G06K9/00 , G06N3/08 , G06F17/27
CPC classification number: H04N21/8405 , G06F17/274 , G06F17/2785 , G06K9/00718 , G06K9/6273 , G06N3/08 , H04N21/26603
Abstract: Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.