LEARNING MULTIMEDIA SEMANTICS FROM LARGE-SCALE UNSTRUCTURED DATA
    1.
    Invention application
    Status: Under examination (published)

    Publication number: WO2015167942A1

    Publication date: 2015-11-05

    Application number: PCT/US2015/027408

    Filing date: 2015-04-24

    CPC classification number: G06F17/30705 G06F17/30675 G06F17/30864 G06N99/005

    Abstract: Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
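
The refinement step in this abstract can be illustrated with a minimal sketch. It is not the method claimed in the application: the greedy clustering rule, the Euclidean distance, the `radius` and `min_size` parameters, and the function names are all assumptions chosen for illustration. The sketch treats the items of a single label, groups their feature vectors into clusters, computes one intra-cluster feature (distance to the item's own centroid) and one inter-cluster feature (distance to the nearest other centroid), and drops items that look like outliers.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def greedy_cluster(features, radius):
    """Group vectors by similarity (assumed rule): an item joins the first
    cluster whose current centroid lies within `radius`, else starts a new one."""
    clusters = []  # each cluster is a list of indices into `features`
    for idx, vec in enumerate(features):
        for members in clusters:
            if math.dist(vec, centroid([features[i] for i in members])) <= radius:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def refine_corpus(features, radius=1.0, min_size=2):
    """Return the sorted indices of the items kept in the refined corpus."""
    clusters = greedy_cluster(features, radius)
    centers = [centroid([features[i] for i in c]) for c in clusters]
    kept = []
    for c_id, members in enumerate(clusters):
        # Assumed filter: clusters smaller than `min_size` are treated as noise.
        if len(members) < min_size:
            continue
        for i in members:
            # Intra-cluster feature: distance to the item's own centroid.
            intra = math.dist(features[i], centers[c_id])
            # Inter-cluster feature: distance to the nearest other centroid.
            inter = min(
                (math.dist(features[i], centers[j])
                 for j in range(len(centers)) if j != c_id),
                default=math.inf,
            )
            if intra < inter:
                kept.append(i)
    return sorted(kept)
```

For example, with the two-dimensional vectors `[(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (20, 20)]` and the default parameters, the isolated item at `(20, 20)` forms a singleton cluster and is removed, while the other five items are kept. A real system would use extracted multimedia features in place of these toy vectors.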

    OPTIMIZING MULTI-CLASS MULTIMEDIA DATA CLASSIFICATION USING NEGATIVE DATA
    2.
    Invention application
    Status: Under examination (published)

    Publication number: WO2016118402A1

    Publication date: 2016-07-28

    Application number: PCT/US2016/013497

    Filing date: 2016-01-15

    CPC classification number: G06K9/66 G06K9/6218 G06K9/6269 G06K9/6284 G06N99/005

    Abstract: Techniques for optimizing multi-class image classification by leveraging negative multimedia data items to train and update classifiers are described. The techniques describe accessing positive multimedia data items of a plurality of multimedia data items, extracting features from the positive multimedia data items, and training classifiers based at least in part on the features. The classifiers may include a plurality of model vectors each corresponding to one of the individual labels. The system may iteratively test the classifiers using positive multimedia data and negative multimedia data and may update one or more model vectors associated with the classifiers differently, depending on whether multimedia data items are positive or negative. Techniques for applying the classifiers to determine whether a new multimedia data item is associated with a topic based at least in part on comparing similarity values with corresponding statistics derived from classifier training are also described.
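
The idea of keeping one model vector per label and updating it differently for positive and negative items can be sketched as follows. This is a perceptron-style illustration under stated assumptions, not the update rule from the application: the one-vs-rest scheme, the learning rates `lr` and `neg_lr`, the fixed decision threshold of zero, and the function names are all hypothetical. In particular, the abstract compares similarity values against statistics derived from training, which this sketch replaces with the fixed threshold.

```python
def dot(w, x):
    """Inner product, used here as the similarity between a model vector and an item."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train(positives, negatives, labels, dim, epochs=10, lr=1.0, neg_lr=0.5):
    """positives: list of (feature_vector, label); negatives: list of feature_vector.

    Returns a dict mapping each label to its model vector.
    """
    models = {label: [0.0] * dim for label in labels}
    for _ in range(epochs):
        # Positive item: pull its own label's vector toward it and push the
        # other labels' vectors away, whenever a vector is on the wrong side.
        for x, y in positives:
            for label in labels:
                target = 1.0 if label == y else -1.0
                if target * dot(models[label], x) <= 0:
                    models[label] = [w + target * lr * xi
                                     for w, xi in zip(models[label], x)]
        # Negative item: it belongs to no label, so only push away, using a
        # separate (assumed smaller) step size.
        for x in negatives:
            for label in labels:
                if dot(models[label], x) >= 0:
                    models[label] = [w - neg_lr * xi
                                     for w, xi in zip(models[label], x)]
    return models


def classify(models, x):
    """Return the best-scoring label, or None if no score exceeds the threshold."""
    label, score = max(((l, dot(w, x)) for l, w in models.items()),
                       key=lambda pair: pair[1])
    return label if score > 0 else None
```

With positives `([1, 0, 0], "cat")` and `([0, 1, 0], "dog")` and a single negative `[0, 0, 1]`, the trained models classify the two positive vectors under their own labels and return `None` for the negative vector, which shows how negative data lets the classifiers reject items that match no topic.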

    OPTIMIZING MULTI-CLASS IMAGE CLASSIFICATION USING PATCH FEATURES
    3.
    Invention application
    Status: Under examination (published)

    Publication number: WO2016118286A1

    Publication date: 2016-07-28

    Application number: PCT/US2015/067554

    Filing date: 2015-12-28

    CPC classification number: G06K9/6227 G06K9/6218 G06K9/623 G06K9/6262

    Abstract: Optimizing multi-class image classification by leveraging patch-based features extracted from weakly supervised images to train classifiers is described. A corpus of images associated with a set of labels may be received. One or more patches may be extracted from individual images in the corpus. Patch-based features may be extracted from the one or more patches and patch representations may be extracted from individual patches of the one or more patches. The patches may be arranged into clusters based at least in part on the patch-based features. At least some of the individual patches may be removed from individual clusters based at least in part on determined similarity values that are representative of similarity between the individual patches. The system may train classifiers based in part on patch-based features extracted from patches in the refined clusters. The classifiers may be used to accurately and efficiently classify new images.
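
The patch pipeline in this abstract (extract patches, compute patch-based features, remove dissimilar patches from a cluster) can be sketched in a few lines. Everything specific here is an assumption for illustration: images are plain 2D lists of intensities, the feature is simply the flattened patch, similarity is cosine similarity, and `min_similarity` is a hypothetical threshold. The application itself does not specify these choices in its abstract.

```python
import math


def extract_patches(image, size, stride):
    """Cut square `size` x `size` patches from a 2D list of pixel intensities."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            patches.append([row[c:c + size] for row in image[r:r + size]])
    return patches


def patch_feature(patch):
    """Flatten the patch into a vector (a stand-in for a real descriptor)."""
    return [value for row in patch for value in row]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def prune_cluster(features, min_similarity):
    """Keep indices of patches whose mean similarity to the rest of the
    cluster is at least `min_similarity`."""
    kept = []
    for i, f in enumerate(features):
        others = [cosine(f, g) for j, g in enumerate(features) if j != i]
        if others and sum(others) / len(others) >= min_similarity:
            kept.append(i)
    return kept
```

A 4 x 4 image cut with `size=2` and `stride=2` yields four patches. Treating those patches as one cluster and calling `prune_cluster` on their features leaves only the patches that resemble their neighbours, and classifiers would then be trained on features from the patches that remain.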
