-
Publication No.: US09183436B2
Publication date: 2015-11-10
Application No.: US13959724
Filing date: 2013-08-05
Applicant: Microsoft Technology Licensing, LLC
Inventor: Simon Baker, Dahua Lin, Anitha Kannan, Qifa Ke
CPC classification number: G06K9/00456, G06F17/2765, G06F17/30265
Abstract: Text in web pages or other text documents may be classified based on the images or other objects within the web page. A system for identifying and classifying text related to an object may identify one or more web pages containing the image or similar images, determine topics from the text of the document, and develop a set of training phrases for a classifier. The classifier may be trained and then used to analyze the text in the documents. The training set may include both positive examples and negative examples of text taken from the set of documents. Positive examples may include captions or other elements directly associated with the object, while negative examples may include text taken from the same documents but located far from the object. In some cases, the system may iterate on the classification process to refine the results.
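The training-set construction described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the distance thresholds `near` and `far`, the whitespace tokenizer, and the Naive Bayes style log-odds scorer are all assumptions chosen for brevity, and the sample text blocks are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, whitespace split, alphabetic words only.
    return [w for w in text.lower().split() if w.isalpha()]

def build_training_set(blocks, near=1, far=5):
    # blocks: list of (text, distance_from_object) pairs.
    # Text close to the object (captions) becomes a positive example;
    # text far from the object becomes a negative example.
    # The thresholds `near` and `far` are assumptions, not from the source.
    positives = [text for text, dist in blocks if dist <= near]
    negatives = [text for text, dist in blocks if dist >= far]
    return positives, negatives

def train(positives, negatives):
    # Count word occurrences separately for each class.
    pos_counts = Counter(w for t in positives for w in tokenize(t))
    neg_counts = Counter(w for t in negatives for w in tokenize(t))
    vocab = set(pos_counts) | set(neg_counts)
    return pos_counts, neg_counts, vocab

def score(text, model):
    # Log-odds that `text` relates to the object, with add-one smoothing.
    # Words never seen in training are ignored.
    pos_counts, neg_counts, vocab = model
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    total = 0.0
    for w in tokenize(text):
        if w in vocab:
            total += math.log((pos_counts[w] + 1) / pos_total)
            total -= math.log((neg_counts[w] + 1) / neg_total)
    return total

# Invented sample page: (text block, distance from the image).
blocks = [
    ("golden retriever puppy playing fetch", 0),
    ("retriever puppy in the park", 1),
    ("subscribe to our newsletter today", 8),
    ("privacy policy and terms", 9),
]
positives, negatives = build_training_set(blocks)
model = train(positives, negatives)
related = score("the puppy loves fetch", model)
unrelated = score("newsletter terms apply", model)
```

A positive score suggests the text relates to the object and a negative score suggests it does not. The iteration mentioned in the abstract could, under the same assumptions, feed confidently scored blocks back into the training set and retrain.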
-
Publication No.: US11544588B2
Publication date: 2023-01-03
Application No.: US16374551
Filing date: 2019-04-03
Applicant: Microsoft Technology Licensing, LLC
Inventor: Simon John Baker, Ashish Kapoor, Gang Hua, Dahua Lin
Abstract: A method described herein includes receiving a digital image, wherein the digital image includes a first element that corresponds to a first domain and a second element that corresponds to a second domain. The method also includes automatically assigning a label to the first element in the digital image based at least in part upon a computed probability that the label corresponds to the first element, wherein the probability is computed through utilization of a first model that is configured to infer labels for elements in the first domain and a second model that is configured to infer labels for elements in the second domain. The first model receives data that identifies learned relationships between elements in the first domain and elements in the second domain, and the probability is computed by the first model based at least in part upon the learned relationships.
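One way to picture how a first-domain model might use learned cross-domain relationships is the sketch below. It is an illustration under stated assumptions, not the method claimed in the patent: the abstract does not name the domains, so the people and scene labels are hypothetical, as are the function name `label_probabilities`, the relationship table, and the multiplicative combination rule.

```python
def label_probabilities(first_scores, second_probs, relationship):
    # first_scores: the first model's own score for each first-domain label.
    # second_probs: the second model's probability for each second-domain label.
    # relationship: learned compatibility for (first_label, second_label) pairs.
    # The combination rule (score times expected compatibility) is an assumption.
    unnormalized = {}
    for label, own_score in first_scores.items():
        support = sum(
            prob * relationship.get((label, other), 0.0)
            for other, prob in second_probs.items()
        )
        unnormalized[label] = own_score * support
    total = sum(unnormalized.values())
    if total == 0:
        # No cross-domain support at all: fall back to a uniform distribution.
        return {label: 1.0 / len(first_scores) for label in first_scores}
    return {label: value / total for label, value in unnormalized.items()}

# Hypothetical example: first domain = people, second domain = scenes.
first_scores = {"alice": 0.5, "bob": 0.5}
second_probs = {"beach": 0.8, "office": 0.2}
relationship = {
    ("alice", "beach"): 0.9, ("alice", "office"): 0.1,
    ("bob", "beach"): 0.2, ("bob", "office"): 0.8,
}
probs = label_probabilities(first_scores, second_probs, relationship)
best_label = max(probs, key=probs.get)
```

In this example the first model alone cannot separate the two candidate labels, and the learned relationship with the likely second-domain label tips the probability toward one of them.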
-