Accurate text classification through selective use of image data
    45.
    Granted invention patent (in force)

    Publication No.: US08768050B2

    Publication date: 2014-07-01

    Application No.: US13158484

    Filing date: 2011-06-13

    IPC classification: G06K9/62

    Abstract: Product images are used in conjunction with textual descriptions to improve classifications of product offerings. By combining cues from both text and image descriptions associated with products, implementations enhance both the precision and recall of product description classifications within the context of web-based commerce search. Several implementations are directed to improving those areas where text-only approaches are most unreliable. For example, several implementations use image signals to complement text classifiers and improve overall product classification in situations where brief textual product descriptions use vocabulary that overlaps with multiple diverse categories. Other implementations are directed to using text and images “training sets” to improve automated classifiers including text-only classifiers. Certain implementations are also directed to learning a number of three-way image classifiers focused only on “confusing categories” of the text signals to improve upon those specific areas where text-only classification is weakest.
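    The selective use of image signals described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function name `classify`, the classifier callables, and the rule of multiplying text and image scores are all assumptions made for the example.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical scorer output: category name -> probability-like score.
Scores = Dict[str, float]


def classify(text: str,
             image: bytes,
             text_clf: Callable[[str], Scores],
             image_clfs: Dict[FrozenSet[str], Callable[[bytes], Scores]],
             k: int = 3) -> str:
    """Trust the text classifier unless its top-k categories form a known
    confusing set; only then consult the image classifier for that set."""
    text_scores = text_clf(text)
    top = sorted(text_scores, key=text_scores.get, reverse=True)[:k]
    image_clf = image_clfs.get(frozenset(top))
    if image_clf is None:
        # No image classifier was trained for this set: text-only decision.
        return top[0]
    image_scores = image_clf(image)
    # Assumed fusion rule: multiply the text score by the image score.
    return max(top, key=lambda c: text_scores[c] * image_scores.get(c, 0.0))
```

    Keying the image classifiers by the set of confusable categories means the image is only examined where the text signal is ambiguous, which mirrors the "confusing categories" idea in the abstract.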

    Visually-represented results to search queries in rich media content
    46.
    Granted invention patent (in force)

    Publication No.: US08751502B2

    Publication date: 2014-06-10

    Application No.: US11321044

    Filing date: 2005-12-30

    Applicant: Rakesh Agrawal

    Inventor: Rakesh Agrawal

    IPC classification: G06F7/00 G06F17/30

    Abstract: When executed, a computer program product generates a graphical user interface that renders results that are responsive to a search query of a rich media file. The graphical user interface includes a chronological representation of the rich media file, one or more occurrence markers along the chronological representation corresponding to actual occurrences of a desired term at an indicated chronological location in the rich media file, and an execution icon configured to launch a rich media application that renders a relevant portion that is responsive to the search query.
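    Placing occurrence markers along a chronological representation amounts to mapping timestamps onto a timeline of fixed width. The sketch below shows one way to do that; the function name `marker_positions` and the pixel-based layout are assumptions for illustration, not details taken from the patent.

```python
from typing import List, Sequence


def marker_positions(occurrences_s: Sequence[float],
                     duration_s: float,
                     width_px: int) -> List[int]:
    """Map each occurrence time (in seconds) to a horizontal pixel offset
    on a timeline that is width_px wide and spans the whole media file."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return [round(t / duration_s * width_px)
            for t in occurrences_s
            if 0 <= t <= duration_s]  # ignore times outside the file
```

    For a 300-second file drawn 600 pixels wide, occurrences at 30, 90 and 150 seconds land at offsets 60, 180 and 300.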

    COMPOSING TEXT AND STRUCTURED DATABASES
    47.
    Invention application (in force)

    Publication No.: US20130275441A1

    Publication date: 2013-10-17

    Application No.: US13561085

    Filing date: 2012-07-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources (one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object) by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.
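    One simplified reading of the trait-based linking in this abstract: for each structured record, find a smallest set of property values that no other record shares, then attach the record to every document that mentions all of those values. The helper names `find_trait` and `link` and the case-insensitive substring matching are assumptions for the sketch; the abstract does not specify how traits are computed or matched.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Trait = Tuple[Tuple[str, str], ...]


def find_trait(record: Record, others: List[Record]) -> Optional[Trait]:
    """Return a smallest set of (property, value) pairs that no other
    record matches in full, or None if the record is not distinguishable."""
    items = sorted(record.items())
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            shared = any(all(o.get(p) == v for p, v in combo) for o in others)
            if not shared:
                return combo
    return None


def link(records: Dict[str, Record],
         documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Associate each record id with the ids of documents whose text
    contains every value of that record's distinguishing trait."""
    links: Dict[str, List[str]] = {}
    for rid, rec in records.items():
        others = [r for oid, r in records.items() if oid != rid]
        trait = find_trait(rec, others)
        if trait is None:
            continue
        links[rid] = [did for did, text in documents.items()
                      if all(v.lower() in text.lower() for _, v in trait)]
    return links
```

    With two camera records that share a brand but differ in model, the model value alone becomes the trait, so a review that mentions only the model name is still linked to the right record.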

    Query classification using implicit labels
    48.
    Granted invention patent (in force)

    Publication No.: US08423568B2

    Publication date: 2013-04-16

    Application No.: US12560427

    Filing date: 2009-09-16

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30693

    Abstract: Described is a technology for automatically generating labeled training data for training a classifier based upon implicit information associated with the data. For example, whether a query has commercial intent can be classified based upon whether the query was submitted at a commercial website's search portal, as logged in a toolbar log. Positive candidate query-related data is extracted from the toolbar log based upon the associated implicit information. A click log is processed to obtain negative query-related data. The labeled training data is automatically generated by separating at least some of the positive candidate query data from the remaining positive candidate query data based upon the negative query data. The labeled training data may be used to train a classifier, such as to classify an online search query as having a certain type of intent or not.
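    The labeling pipeline in this abstract can be approximated as set operations over two logs. In the sketch below, the log formats, the function name `build_training_data`, the rule for deriving negatives from the click log (clicks that landed on a non-commercial host), and the choice to discard queries that appear on both sides are all assumptions; the abstract only states that negatives come from a click log and are used to filter the positive candidates.

```python
from typing import Iterable, List, Set, Tuple


def build_training_data(toolbar_log: Iterable[Tuple[str, str]],
                        click_log: Iterable[Tuple[str, str]],
                        commercial_hosts: Set[str]) -> List[Tuple[str, int]]:
    """Build (query, label) pairs; label 1 = commercial intent, 0 = not.

    toolbar_log: (query, host where the query was submitted)
    click_log:   (query, host of the clicked result)
    """
    # Implicit positive signal: query typed into a commercial site's search box.
    positives = {q for q, host in toolbar_log if host in commercial_hosts}
    # Assumed negative signal: the click went to a non-commercial host.
    negatives = {q for q, host in click_log if host not in commercial_hosts}
    # Queries seen on both sides are ambiguous, so they are left out.
    ambiguous = positives & negatives
    return ([(q, 1) for q in sorted(positives - ambiguous)] +
            [(q, 0) for q in sorted(negatives - ambiguous)])
```

    The resulting pairs could then be passed to any standard classifier trainer; no manual labeling step is involved.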

    Providing time-sensitive information for purchase determinations
    49.
    Granted invention patent (in force)

    Publication No.: US08401927B2

    Publication date: 2013-03-19

    Application No.: US13296982

    Filing date: 2011-11-15

    Abstract: A method, system, and medium are provided that are directed to providing a user with time-sensitive information that is usable to determine when to purchase a product. In accordance with embodiments of the technology, exemplary steps include using historical product information to generate time-sensitive information. Moreover, in response to receiving from a user a request to receive information describing a given product, time-sensitive information is caused to be presented. For example, time-sensitive information might be usable by the user to determine when to purchase the given product and an alternative product.
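    As a loose illustration of turning historical product information into a purchase-timing hint, the sketch below compares the current price with the median of past prices. The abstract does not say that price is the historical signal or how the hint is derived; the function name `purchase_hint`, the use of price history, and the median rule are all assumptions.

```python
from statistics import median
from typing import Sequence


def purchase_hint(price_history: Sequence[float], current_price: float) -> str:
    """Suggest buying when the current price is at or below the median
    historical price, and waiting otherwise (an assumed heuristic)."""
    if not price_history:
        raise ValueError("price_history must not be empty")
    typical = median(price_history)
    return "buy now" if current_price <= typical else "wait"
```

    The same function could be called for an alternative product's history so that the two hints are shown side by side.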

    Product synthesis from multiple sources
    50.
    Granted invention patent (in force)

    Publication No.: US08352473B2

    Publication date: 2013-01-08

    Application No.: US12764676

    Filing date: 2010-04-21

    IPC classification: G06Q10/00 G06Q30/00

    Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
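    The normalization and representative-value steps in this abstract can be sketched with a simple majority vote over attribute-value pairs. The sketch assumes the offers have already been clustered to a single product; the names `normalize` and `synthesize`, the whitespace and case normalization, and the majority-vote rule are illustrative assumptions rather than the patent's procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def normalize(s: str) -> str:
    """Lower-case and collapse whitespace so near-identical strings match."""
    return " ".join(s.lower().split())


def synthesize(offers: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge attribute-value pairs from several offers of one product,
    keeping the most frequent normalized value for each attribute name."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for offer in offers:
        for name, value in offer.items():
            votes[normalize(name)][normalize(value)] += 1
    return {name: counts.most_common(1)[0][0]
            for name, counts in votes.items()}
```

    Majority voting is only one possible choice of representative value; a production catalog would likely also weigh source reliability.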
