- Patent Title: Learning multimedia semantics from large-scale unstructured data
- Application No.: US14266228
- Application Date: 2014-04-30
- Publication No.: US09875301B2
- Publication Date: 2018-01-23
- Inventor: Xian-Sheng Hua, Jin Li, Yoshitaka Ushiku
- Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
- Applicant Address: US WA Redmond
- Assignee: Microsoft Technology Licensing, LLC
- Current Assignee: Microsoft Technology Licensing, LLC
- Current Assignee Address: US WA Redmond
- Agency: Schwegman Lundberg & Woessner, P.A.
- Main IPC: G06N99/00
- IPC: G06N99/00 ; G06F17/30

Abstract:
Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
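The refinement step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say which clustering algorithm, which intra-cluster and inter-cluster features, or which removal rule are used. The sketch assumes, purely for illustration, a small k-means clustering, cluster share as the intra-cluster feature, distance to the nearest other centroid as the inter-cluster feature, and a rule that drops small, isolated clusters. The names `cluster`, `refine`, `min_share`, and `min_gap` are hypothetical, and topic-model training on the refined corpus is left out.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def cluster(points: List[Vector], k: int, iters: int = 10) -> List[int]:
    """Tiny k-means with deterministic farthest-point seeding (assumed choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each item to its nearest center, then recompute the centers.
        assign = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = mean(members)
    return assign


def refine(points: List[Vector], k: int = 2,
           min_share: float = 0.25, min_gap: float = 1.0) -> List[int]:
    """Return indices of the items kept for one label (hypothetical rule)."""
    assign = cluster(points, k)
    groups: Dict[int, List[int]] = {}
    for i, c in enumerate(assign):
        groups.setdefault(c, []).append(i)
    centroids = {c: mean([points[i] for i in idx]) for c, idx in groups.items()}

    kept: List[int] = []
    for c, idx in groups.items():
        # Intra-cluster feature (assumed): share of the label's items in this cluster.
        share = len(idx) / len(points)
        # Inter-cluster feature (assumed): distance to the nearest other centroid.
        others = [dist(centroids[c], centroids[o]) for o in centroids if o != c]
        gap = min(others) if others else 0.0
        if share < min_share and gap > min_gap:
            continue  # small and isolated: treat the cluster as label noise
        kept.extend(idx)
    return sorted(kept)


# Five similar items for one label plus one outlier far from the rest.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (10.0, 10.0)]
print(refine(points))  # [0, 1, 2, 3, 4]
```

In this toy input the outlier forms its own cluster, which holds one sixth of the items and lies far from the main cluster, so it is removed and the remaining five items make up the refined corpus for that label. The function expects a non-empty list of feature vectors of equal length.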
Public/Granted literature
- US20150317389A1 Learning Multimedia Semantics from Large-Scale Unstructured Data (Public/Granted day: 2015-11-05)