COMPOSING TEXT AND STRUCTURED DATABASES
    1.
    Patent application
    Status: In force

    Publication No.: US20130275441A1

    Publication date: 2013-10-17

    Application No.: US13561085

    Filing date: 2012-07-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30616

    Abstract: A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources (one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object) by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.

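    The linking idea in this abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes that a "trait" is the smallest set of property values that no other record shares, and that a document refers to an object when it contains every value of one of that object's traits. The function names, the sample records, and the substring matching are all hypothetical choices made for illustration.

```python
from itertools import combinations

def find_traits(records, max_size=2):
    """Map each record id to its smallest distinguishing sets of (property, value) pairs."""
    traits = {}
    for rid, props in records.items():
        items = sorted(props.items())
        found = []
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                # A combo is distinguishing if every other record differs on at least one pair.
                unique = all(
                    any(other.get(k) != v for k, v in combo)
                    for oid, other in records.items() if oid != rid
                )
                if unique:
                    found.append(combo)
            if found:
                break  # keep only the smallest distinguishing sets
        traits[rid] = found
    return traits

def link(records, documents, max_size=2):
    """Associate each document with the records whose trait values all appear in its text."""
    traits = find_traits(records, max_size)
    links = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        links[doc_id] = sorted(
            rid for rid, combos in traits.items()
            if any(all(str(v).lower() in lowered for _, v in combo) for combo in combos)
        )
    return links

# Hypothetical sample data.
records = {
    "r1": {"brand": "Acme", "model": "X100", "color": "black"},
    "r2": {"brand": "Acme", "model": "X200", "color": "black"},
    "r3": {"brand": "Zenit", "model": "Z5", "color": "silver"},
}
documents = {
    "d1": "Review of the Acme X100 in black.",
    "d2": "The Zenit camera is a solid choice.",
    "d3": "Acme makes decent cameras.",
}
links = link(records, documents)
```

    In this sample, "Acme" alone does not distinguish r1 from r2, so d3 is linked to nothing, while the model number "X100" is enough to link d1 to r1. Plain substring matching is a simplification and can produce false matches on short values.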

    Composing text and structured databases
    2.
    Granted patent
    Status: In force

    Publication No.: US08996539B2

    Publication date: 2015-03-31

    Application No.: US13561085

    Filing date: 2012-07-30

    IPC classification: G06F7/00, G06F17/30

    CPC classification: G06F17/30616

    Abstract: A framework is provided for composing texts about objects with structured information about these objects, and thus disclosed are methodologies for linking information from at least two data sources (one comprising a plurality of documents comprising text pertaining to at least one object, and one comprising a plurality of structured records comprising at least one characteristic of the at least one object, each characteristic comprising one property name and an associated property value corresponding to the property name for the at least one object) by determining one or more instance-based traits for each object in both data sources and associating at least one record with at least one document that refers to each object, each trait comprising one or more characteristics that identifiably distinguish each object from all other objects.


    Labeling data samples using objective questions
    3.
    Granted patent
    Status: In force

    Publication No.: US08788498B2

    Publication date: 2014-07-22

    Application No.: US12484255

    Filing date: 2009-06-15

    IPC classification: G06F17/30

    CPC classification: G06Q10/10

    Abstract: Described is a technology for obtaining labeled sample data. Labeling guidelines are converted into binary yes/no questions regarding data samples. The questions and data samples are provided to judges who then answer the questions for each sample. The answers are input to a label assignment algorithm that associates a label with each sample based upon the answers. If the guidelines are modified and previous answers to the binary questions are maintained, at least some of the previous answers may be used in re-labeling the samples in view of the modification.

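    The flow described in this abstract (yes/no answers in, labels out, answers reused after a guideline change) can be sketched as follows. This is an illustration only: the majority vote across judges, the ordered rule list, and every question id and label name are assumptions, not details taken from the patent.

```python
from collections import Counter

def aggregate(answers):
    """Majority vote per question. `answers` is a list of {question_id: bool}, one dict per judge.
    A tie counts as "no"."""
    consensus = {}
    for question in {q for judge in answers for q in judge}:
        votes = Counter(judge[question] for judge in answers if question in judge)
        consensus[question] = votes[True] > votes[False]
    return consensus

def assign_label(consensus, rules, default="unlabeled"):
    """Return the label of the first rule whose required answers all match the consensus."""
    for label, required in rules:
        if all(consensus.get(q) == want for q, want in required.items()):
            return label
    return default

# Hypothetical stored answers from three judges per sample.
answers_by_sample = {
    "s1": [{"is_spam": False, "on_topic": True},
           {"is_spam": False, "on_topic": True},
           {"is_spam": True, "on_topic": True}],
    "s2": [{"is_spam": True, "on_topic": True},
           {"is_spam": True, "on_topic": True},
           {"is_spam": True, "on_topic": False}],
}

# Original guidelines, then a modified version that adds a "fair" label.
rules_v1 = [("bad", {"is_spam": True}), ("good", {"on_topic": True})]
rules_v2 = [("bad", {"is_spam": True, "on_topic": False}),
            ("fair", {"is_spam": True}),
            ("good", {"on_topic": True})]

consensus = {sample: aggregate(a) for sample, a in answers_by_sample.items()}
labels_v1 = {sample: assign_label(c, rules_v1) for sample, c in consensus.items()}
labels_v2 = {sample: assign_label(c, rules_v2) for sample, c in consensus.items()}
```

    Because the judges answered objective questions rather than assigning labels directly, `labels_v2` is computed from the same stored answers as `labels_v1`; no sample has to be judged again when the rules change.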

    LABELING DATA SAMPLES USING OBJECTIVE QUESTIONS
    4.
    Patent application
    Status: In force

    Publication No.: US20100318539A1

    Publication date: 2010-12-16

    Application No.: US12484255

    Filing date: 2009-06-15

    IPC classification: G06F17/30

    CPC classification: G06Q10/10

    Abstract: Described is a technology for obtaining labeled sample data. Labeling guidelines are converted into binary yes/no questions regarding data samples. The questions and data samples are provided to judges who then answer the questions for each sample. The answers are input to a label assignment algorithm that associates a label with each sample based upon the answers. If the guidelines are modified and previous answers to the binary questions are maintained, at least some of the previous answers may be used in re-labeling the samples in view of the modification.


    MODELING ACTIONS FOR ENTITY-CENTRIC SEARCH
    5.
    Patent application
    Status: In force

    Publication No.: US20130144854A1

    Publication date: 2013-06-06

    Application No.: US13311581

    Filing date: 2011-12-06

    IPC classification: G06F17/30

    CPC classification: G06F17/30867

    Abstract: In one embodiment, a web service engine server 104 may predict a successive action by a user based on an entity reference 302. The web service engine server 104 may identify an entity reference 302 in a data transmission caused by a user. The web service engine server 104 may determine from the data transmission a user intention towards the entity reference 302 using an intention model based on a transmission log. The web service engine server 104 may predict a related successive web action option 522 for the entity reference 302 based on the user intention.


    PRODUCT SYNTHESIS FROM MULTIPLE SOURCES
    7.
    Patent application
    Status: In force

    Publication No.: US20110264598A1

    Publication date: 2011-10-27

    Application No.: US12764676

    Filing date: 2010-04-21

    IPC classification: G06Q10/00, G06Q30/00

    Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.

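    One step in this abstract, choosing a representative value for each attribute name, can be sketched for a single cluster of offers that are already believed to describe the same product. The abstract does not say how the representative value is chosen; the sketch below assumes a simple majority vote after case and whitespace normalization, and the sample offers are hypothetical.

```python
from collections import Counter, defaultdict

def normalize(text):
    """Lowercase and collapse runs of whitespace so trivially different spellings compare equal."""
    return " ".join(str(text).lower().split())

def representative_values(offers):
    """For one cluster of offers, pick the most frequent normalized value per attribute name."""
    counts = defaultdict(Counter)
    for offer in offers:
        for name, value in offer.items():
            counts[normalize(name)][normalize(value)] += 1
    return {name: values.most_common(1)[0][0] for name, values in counts.items()}

# Hypothetical offers for the same product, drawn from three sources with different schemas.
offers = [
    {"Brand": "Acme", "Color": "Black", "Weight": "1.2 kg"},
    {"brand": "ACME", "color": "black"},
    {"brand": "Acme Inc", "color": "Black", "weight": "1.2  kg"},
]
catalog_entry = representative_values(offers)
```

    A production pipeline would also need to map differently named attributes onto one another (for example "colour" and "color") and to weight sources by reliability; neither is attempted here.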

    Query classification based on query click logs
    8.
    Granted patent
    Status: In force

    Publication No.: US07877404B2

    Publication date: 2011-01-25

    Application No.: US12042531

    Filing date: 2008-03-05

    IPC classification: G06F17/30

    Abstract: Methods are provided for the classification of search engine queries and associated documents based on search engine query click logs. One or more seed documents or queries are provided that contain content that is representative of a category. A query click log containing information regarding queries entered by at least one user into the search engine and documents subsequently clicked in search engine results corresponding with the queries is analyzed to determine which one or more queries resulted in clicks on the seed documents. Information is stored associating the one or more queries with the category if they resulted in clicks on the seed documents.

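    The seed-document step in this abstract can be sketched in a few lines. The sketch assumes the click log is a list of (query, clicked document) pairs and that seeds are given as a mapping from category to a set of document identifiers; both representations, and the example URLs, are hypothetical.

```python
from collections import defaultdict

def classify_queries(click_log, seeds):
    """Associate a query with a category if the query led to a click on one of that category's seed documents."""
    doc_to_categories = defaultdict(set)
    for category, docs in seeds.items():
        for doc in docs:
            doc_to_categories[doc].add(category)

    query_categories = defaultdict(set)
    for query, clicked_doc in click_log:
        # Clicks on non-seed documents contribute nothing.
        for category in doc_to_categories.get(clicked_doc, ()):
            query_categories[query].add(category)
    return dict(query_categories)

# Hypothetical click log and seed set.
click_log = [
    ("cheap flights", "travel.example/flights"),
    ("hotel deals", "travel.example/hotels"),
    ("python tutorial", "docs.example/python"),
    ("cheap flights", "news.example/home"),
]
seeds = {"travel": {"travel.example/flights", "travel.example/hotels"}}
query_categories = classify_queries(click_log, seeds)
```

    Queries that never led to a click on a seed document, such as "python tutorial" here, are simply left unclassified. A real system would likely require a minimum number of clicks before storing an association.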

    Providing time-sensitive information for purchase determinations
    9.
    Granted patent
    Status: In force

    Publication No.: US08401927B2

    Publication date: 2013-03-19

    Application No.: US13296982

    Filing date: 2011-11-15

    Abstract: A method, system, and medium are provided that are directed to providing a user with time-sensitive information that is usable to determine when to purchase a product. In accordance with embodiments of the technology, exemplary steps include using historical product information to generate time-sensitive information. Moreover, in response to receiving from a user a request to receive information describing a given product, time-sensitive information is caused to be presented. For example, time-sensitive information might be usable by the user to determine when to purchase the given product and an alternative product.


    Product synthesis from multiple sources
    10.
    Granted patent
    Status: In force

    Publication No.: US08352473B2

    Publication date: 2013-01-08

    Application No.: US12764676

    Filing date: 2010-04-21

    IPC classification: G06Q10/00, G06Q30/00

    Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
