COMPOSING TEXT AND STRUCTURED DATABASES
    1.
    Patent application
    Legal status: Active

    Publication number: US20130275441A1

    Publication date: 2013-10-17

    Application number: US13561085

    Filing date: 2012-07-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources (one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object) by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.
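
The linking idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: the record layout, the `max_size` cap, the function names, and the case-insensitive substring match are all assumptions made for illustration, and the traits are derived from the structured records only. A trait here is the smallest set of (property name, property value) pairs that no other record shares, and a record is associated with every document whose text mentions all values of one of its traits.

```python
from itertools import combinations


def distinguishing_traits(records, max_size=2):
    """Map each record id to its smallest sets of (property, value) pairs
    that no other record shares. `max_size` is an illustrative cap."""
    traits = {}
    for rid, rec in records.items():
        items = sorted(rec.items())
        found = []
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                shared = any(
                    all(other.get(k) == v for k, v in combo)
                    for oid, other in records.items()
                    if oid != rid
                )
                if not shared:
                    found.append(frozenset(combo))
            if found:
                break  # keep only the smallest distinguishing sets
        traits[rid] = found
    return traits


def link(records, documents, max_size=2):
    """Associate each record with the documents that mention every value
    of at least one of its traits (crude substring matching, assumed)."""
    traits = distinguishing_traits(records, max_size)
    links = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for rid, rec_traits in traits.items():
            if any(all(str(v).lower() in lowered for _, v in t) for t in rec_traits):
                links.setdefault(rid, []).append(doc_id)
    return links


# Hypothetical data, invented for this example.
records = {
    "r1": {"brand": "Canon", "model": "EOS 7D"},
    "r2": {"brand": "Canon", "model": "EOS 5D"},
    "r3": {"brand": "Nikon", "model": "D90"},
}
documents = {
    "d1": "Review of the Canon EOS 7D body",
    "d2": "The Nikon D90 is still a solid camera",
    "d3": "Canon announces new lenses",
}
links = link(records, documents)
```

In this toy data the brand "Canon" is shared by two records, so it cannot serve as a trait on its own; document d3, which mentions only the brand, is therefore linked to no record.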

    Composing text and structured databases
    2.
    Granted patent
    Legal status: Active

    Publication number: US08996539B2

    Publication date: 2015-03-31

    Application number: US13561085

    Filing date: 2012-07-30

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30616

    Abstract: A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources (one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object) by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.

    Labeling data samples using objective questions
    3.
    Granted patent
    Legal status: Active

    Publication number: US08788498B2

    Publication date: 2014-07-22

    Application number: US12484255

    Filing date: 2009-06-15

    IPC classification: G06F17/30

    CPC classification: G06Q10/10

    Abstract: Described is a technology for obtaining labeled sample data. Labeling guidelines are converted into binary yes/no questions regarding data samples. The questions and data samples are provided to judges who then answer the questions for each sample. The answers are input to a label assignment algorithm that associates a label with each sample based upon the answers. If the guidelines are modified and previous answers to the binary questions are maintained, at least some of the previous answers may be used in re-labeling the samples in view of the modification.
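
As a rough illustration of the workflow in this abstract, the sketch below stores the judges' yes/no answers per question and derives a label from an ordered rule list. The majority vote, the first-match rule format, and the question ids are assumptions for the example; the abstract does not specify the label assignment algorithm.

```python
from collections import Counter


def majority(answers):
    """Collapse several judges' yes/no answers (booleans) into one.
    Ties count as False (an arbitrary choice for this sketch)."""
    counts = Counter(answers)
    return counts[True] > counts[False]


def assign_label(sample_answers, rules, default="unlabeled"):
    """sample_answers: {question_id: [bool, ...]} from the judges.
    rules: ordered list of (conditions, label); the first rule whose
    conditions all match the consensus answers wins."""
    consensus = {q: majority(a) for q, a in sample_answers.items()}
    for conditions, label in rules:
        if all(consensus.get(q) == expected for q, expected in conditions.items()):
            return label
    return default


# Hypothetical stored answers for one sample (three judges per question).
answers = {
    "is_adult_content": [False, False, True],
    "mentions_product": [True, True, True],
}

# Original guidelines expressed as rules over the binary questions.
rules_v1 = [
    ({"is_adult_content": True}, "reject"),
    ({"mentions_product": True}, "commercial"),
]
# Modified guidelines: the same stored answers are reused, no re-judging.
rules_v2 = [
    ({"mentions_product": True, "is_adult_content": False}, "commercial-safe"),
]

label_v1 = assign_label(answers, rules_v1)
label_v2 = assign_label(answers, rules_v2)
```

Because the answers are kept separately from the rules, changing the guidelines only requires re-running `assign_label`; new judging is needed only for questions that were never asked.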

    LABELING DATA SAMPLES USING OBJECTIVE QUESTIONS
    4.
    Patent application
    Legal status: Active

    Publication number: US20100318539A1

    Publication date: 2010-12-16

    Application number: US12484255

    Filing date: 2009-06-15

    IPC classification: G06F17/30

    CPC classification: G06Q10/10

    Abstract: Described is a technology for obtaining labeled sample data. Labeling guidelines are converted into binary yes/no questions regarding data samples. The questions and data samples are provided to judges who then answer the questions for each sample. The answers are input to a label assignment algorithm that associates a label with each sample based upon the answers. If the guidelines are modified and previous answers to the binary questions are maintained, at least some of the previous answers may be used in re-labeling the samples in view of the modification.

    MODELING ACTIONS FOR ENTITY-CENTRIC SEARCH
    5.
    Patent application
    Legal status: Active

    Publication number: US20130144854A1

    Publication date: 2013-06-06

    Application number: US13311581

    Filing date: 2011-12-06

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: In one embodiment, a web service engine server 104 may predict a successive action by a user based on an entity reference 302. The web service engine server 104 may identify an entity reference 302 in a data transmission caused by a user. The web service engine server 104 may determine from the data transmission a user intention towards the entity reference 302 using an intention model based on a transmission log. The web service engine server 104 may predict a related successive web action option 522 for the entity reference 302 based on the user intention.

    ACCURATE TEXT CLASSIFICATION THROUGH SELECTIVE USE OF IMAGE DATA
    7.
    Patent application
    Legal status: Active

    Publication number: US20120314941A1

    Publication date: 2012-12-13

    Application number: US13158484

    Filing date: 2011-06-13

    IPC classification: G06K9/62

    Abstract: Product images are used in conjunction with textual descriptions to improve classifications of product offerings. By combining cues from both text and image descriptions associated with products, implementations enhance both the precision and recall of product description classifications within the context of web-based commerce search. Several implementations are directed to improving those areas where text-only approaches are most unreliable. For example, several implementations use image signals to complement text classifiers and improve overall product classification in situations where brief textual product descriptions use vocabulary that overlaps with multiple diverse categories. Other implementations are directed to using text and images “training sets” to improve automated classifiers including text-only classifiers. Certain implementations are also directed to learning a number of three-way image classifiers focused only on “confusing categories” of the text signals to improve upon those specific areas where text-only classification is weakest.

    Accurate text classification through selective use of image data
    8.
    Granted patent
    Legal status: Active

    Publication number: US08768050B2

    Publication date: 2014-07-01

    Application number: US13158484

    Filing date: 2011-06-13

    IPC classification: G06K9/62

    Abstract: Product images are used in conjunction with textual descriptions to improve classifications of product offerings. By combining cues from both text and image descriptions associated with products, implementations enhance both the precision and recall of product description classifications within the context of web-based commerce search. Several implementations are directed to improving those areas where text-only approaches are most unreliable. For example, several implementations use image signals to complement text classifiers and improve overall product classification in situations where brief textual product descriptions use vocabulary that overlaps with multiple diverse categories. Other implementations are directed to using text and images “training sets” to improve automated classifiers including text-only classifiers. Certain implementations are also directed to learning a number of three-way image classifiers focused only on “confusing categories” of the text signals to improve upon those specific areas where text-only classification is weakest.

    Query classification using implicit labels
    9.
    Granted patent
    Legal status: Active

    Publication number: US08423568B2

    Publication date: 2013-04-16

    Application number: US12560427

    Filing date: 2009-09-16

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30693

    Abstract: Described is a technology for automatically generating labeled training data for training a classifier based upon implicit information associated with the data. For example, whether a query has commercial intent can be classified based upon whether the query was submitted at a commercial website's search portal, as logged in a toolbar log. Positive candidate query-related data is extracted from the toolbar log based upon the associated implicit information. A click log is processed to obtain negative query-related data. The labeled training data is automatically generated by separating at least some of the positive candidate query data from the remaining positive candidate query data based upon the negative query data. The labeled training data may be used to train a classifier, such as to classify an online search query as having a certain type of intent or not.
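
The data-generation step described in this abstract might look roughly like the sketch below. The log formats, the domain names, and the function name are hypothetical. In particular, the abstract does not say how negative data is obtained from the click log; treating clicks on non-commercial domains as negative evidence is an assumption made only for this example.

```python
def build_training_data(toolbar_log, click_log, commercial_domains):
    """Return (query, label) pairs: 1 = commercial intent, 0 = not.

    toolbar_log: iterable of (query, domain_where_query_was_submitted)
    click_log:   iterable of (query, clicked_domain)
    """
    # Implicit positive signal: the query was typed into the search box
    # of a commercial website.
    positive_candidates = {
        q.lower().strip() for q, d in toolbar_log if d in commercial_domains
    }
    # Assumed negative signal: the query led to a click on a
    # non-commercial domain.
    negatives = {
        q.lower().strip() for q, d in click_log if d not in commercial_domains
    }
    # Separate the positive candidates that the negative data contradicts
    # from the remaining ones, which are kept as positives.
    positives = positive_candidates - negatives
    return [(q, 1) for q in sorted(positives)] + [(q, 0) for q in sorted(negatives)]


# Hypothetical logs, invented for this example.
toolbar_log = [
    ("canon eos 7d", "shop.example"),
    ("weather", "shop.example"),
    ("python tutorial", "docs.example"),
]
click_log = [
    ("weather", "forecast.example"),
    ("python tutorial", "docs.example"),
    ("canon eos 7d", "shop.example"),
]
training = build_training_data(toolbar_log, click_log, {"shop.example"})
```

The resulting pairs could then be fed to any text classifier; the query "weather" is dropped from the positives because the click log contradicts the implicit signal from the toolbar log.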

    QUERY CLASSIFICATION USING IMPLICIT LABELS
    10.
    Patent application
    Legal status: Active

    Publication number: US20110066650A1

    Publication date: 2011-03-17

    Application number: US12560427

    Filing date: 2009-09-16

    IPC classification: G06F17/30

    CPC classification: G06F17/30693

    Abstract: Described is a technology for automatically generating labeled training data for training a classifier based upon implicit information associated with the data. For example, whether a query has commercial intent can be classified based upon whether the query was submitted at a commercial website's search portal, as logged in a toolbar log. Positive candidate query-related data is extracted from the toolbar log based upon the associated implicit information. A click log is processed to obtain negative query-related data. The labeled training data is automatically generated by separating at least some of the positive candidate query data from the remaining positive candidate query data based upon the negative query data. The labeled training data may be used to train a classifier, such as to classify an online search query as having a certain type of intent or not.
