Matching reviews to objects using a language model
    1.
    Granted patent
    Matching reviews to objects using a language model (in force)

    Publication No.: US08180755B2

    Publication date: 2012-05-15

    Application No.: US12554401

    Filing date: 2009-09-04

    IPC classification: G06F17/30

    CPC classification: G06F17/30707

    Abstract: A method is provided to associate reviews that have unknown correspondences to tangible entities with structured objects that have known correspondences to tangible entities, comprising: transforming a respective review and text from a respective structured object to a collection of words that intersect the respective review and the text from the respective structured object; and determining a measure of a likelihood of a match as a function of respective probabilities of occurrences of respective words of such intersecting collection within generic review text and respective probabilities of occurrences of respective words of such intersecting collection within structured object text.
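
The scoring idea in this abstract can be sketched in Python as follows. The abstract says only that the match likelihood is a function of the two sets of word probabilities; the log-ratio used here, the `floor` smoothing value, the helper names, and the toy corpora are all assumptions made for illustration, not the patented method.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def unigram_probs(texts):
    """Estimate unigram probabilities from a list of texts (hypothetical helper)."""
    counts = Counter(w for t in texts for w in tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def match_score(review, object_text, p_review, p_object, floor=1e-6):
    """Score a review/object pair using only the words they share.

    Assumed scoring function: a shared word counts for more when it is
    common in structured object text but rare in generic review text.
    """
    shared = set(tokenize(review)) & set(tokenize(object_text))
    return sum(
        math.log(p_object.get(w, floor) / p_review.get(w, floor)) for w in shared
    )

# Toy data (invented for this example).
GENERIC_REVIEWS = [
    "the food was great and the service was great",
    "the place was nice",
]
OBJECTS = {
    "casablanca": "Casablanca Moroccan Restaurant Seattle",
    "great_wall": "Great Wall Chinese Restaurant Boston",
}
REVIEW = "The tagine at Casablanca was great"

p_review = unigram_probs(GENERIC_REVIEWS)
p_object = unigram_probs(list(OBJECTS.values()))
scores = {k: match_score(REVIEW, v, p_review, p_object) for k, v in OBJECTS.items()}
best = max(scores, key=scores.get)
```

In this toy setup the word "great" is shared with the second object but is common in generic review text, so it contributes little, while "casablanca" is distinctive and dominates the score.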

    Translation model and method for matching reviews to objects
    2.
    Granted patent
    Translation model and method for matching reviews to objects (in force)

    Publication No.: US08972436B2

    Publication date: 2015-03-03

    Application No.: US12607938

    Filing date: 2009-10-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30705

    Abstract: Methods and apparatus for matching sets of text to objects are disclosed. In accordance with one embodiment, a set of text is obtained. For instance, the set of text may include a review. A numerical value is determined for each of a plurality of objects, where the numerical value indicates a likelihood that the corresponding one of the plurality of objects is a subject of the set of text. Each of the plurality of objects has an object type defined by a set of one or more attributes, each of the set of one or more attributes having associated therewith a corresponding set of one or more parameters, wherein the numerical value is determined using the set of text and a value of each of the set of one or more parameters for each of the set of one or more attributes. One of the plurality of objects that is most likely to be the subject of the set of text is identified based upon the numerical value that has been determined for each of the plurality of objects.
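
As a rough illustration of the per-object scoring described in this abstract, the Python sketch below assigns each object a log-likelihood-style value and picks the maximum. The attribute names, the per-attribute weights standing in for the "parameters", the `BACKGROUND` constant, and the mixture formula are all hypothetical; the abstract does not specify them.

```python
import math

# Hypothetical per-attribute parameters: how likely a review word is to be
# "translated" from each attribute of the object type.
ATTRIBUTE_WEIGHTS = {"name": 0.6, "city": 0.3, "cuisine": 0.1}
BACKGROUND = 0.01  # assumed mass for words no attribute explains

def likelihood_score(text, obj):
    """Sum of log-probabilities of the text's words under a simple mixture."""
    total = 0.0
    for word in text.lower().split():
        explained = 0.0
        for attr, weight in ATTRIBUTE_WEIGHTS.items():
            tokens = obj[attr].lower().split()
            if word in tokens:
                explained += weight / len(tokens)
        total += math.log(BACKGROUND + explained)
    return total

# Toy objects and review text (invented for this example).
OBJECTS = [
    {"name": "Casablanca", "city": "Seattle", "cuisine": "Moroccan"},
    {"name": "Great Wall", "city": "Boston", "cuisine": "Chinese"},
]
TEXT = "loved the moroccan tagine at casablanca in seattle"

# Identify the object most likely to be the subject of the text.
most_likely = max(OBJECTS, key=lambda o: likelihood_score(TEXT, o))
```

Weighting attributes separately lets a match on a distinctive attribute such as the name count for more than a match on a broad one such as the cuisine.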

    Translation Model and Method for Matching Reviews to Objects
    3.
    Patent application
    Translation Model and Method for Matching Reviews to Objects (in force)

    Publication No.: US20110099192A1

    Publication date: 2011-04-28

    Application No.: US12607938

    Filing date: 2009-10-28

    IPC classification: G06F17/30

    CPC classification: G06F17/30705

    Abstract: Methods and apparatus for matching sets of text to objects are disclosed. In accordance with one embodiment, a set of text is obtained. For instance, the set of text may include a review. A numerical value is determined for each of a plurality of objects, where the numerical value indicates a likelihood that the corresponding one of the plurality of objects is a subject of the set of text. Each of the plurality of objects has an object type defined by a set of one or more attributes, each of the set of one or more attributes having associated therewith a corresponding set of one or more parameters, wherein the numerical value is determined using the set of text and a value of each of the set of one or more parameters for each of the set of one or more attributes. One of the plurality of objects that is most likely to be the subject of the set of text is identified based upon the numerical value that has been determined for each of the plurality of objects.

    Matching items of user-generated content to entities
    4.
    Granted patent
    Matching items of user-generated content to entities (in force)

    Publication No.: US08412771B2

    Publication date: 2013-04-02

    Application No.: US12909766

    Filing date: 2010-10-21

    IPC classification: G06F15/16

    Abstract: A method, apparatus, and computer-readable medium are provided for matching items of user-generated content to entities. Items of user-generated content, such as status updates, are gathered. For each of the items, a machine determines a degree to which the item is associated with an entity. In one aspect, items are matched to an entity by matching the content of the items to attributes of the entity. In another aspect, items are matched to an entity by predicting attributes of an author of the items and determining a distance between the predicted attributes of the author and the attributes of the entity. The distance may be a physical distance between locations of the entity and user or a contextual distance between categories for the entity and posts by the author. Items matched to the entity may be displayed on an interface concurrently with information about the entity.
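
The two matching aspects in this abstract (content against entity attributes, and physical distance between author and entity) might be combined as in the Python sketch below. The equal 0.5 weights, the `max_km` cutoff, the dictionary layout of an entity, and the assumption that the author's location has already been predicted are all illustrative choices, not details from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def content_overlap(item_text, entity):
    """Fraction of the entity's attribute words that appear in the item."""
    words = set(item_text.lower().split())
    attrs = set(" ".join(entity["attributes"]).lower().split())
    return len(words & attrs) / len(attrs) if attrs else 0.0

def association_degree(item_text, author_location, entity, max_km=50.0):
    """Blend content match and physical proximity into one degree in [0, 1]."""
    distance = haversine_km(author_location, entity["location"])
    proximity = max(0.0, 1.0 - distance / max_km)
    return 0.5 * content_overlap(item_text, entity) + 0.5 * proximity

# Toy entity and status update (invented for this example).
CAFE = {"attributes": ["joe's coffee", "espresso"], "location": (47.6097, -122.3331)}
POST = "great espresso at joe's coffee"

near = association_degree(POST, (47.6100, -122.3300), CAFE)  # author in Seattle
far = association_degree(POST, (40.7100, -74.0000), CAFE)    # author in New York
```

With identical text, the post from the nearby author receives the higher degree because only the proximity term differs.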

    MATCHING ITEMS OF USER-GENERATED CONTENT TO ENTITIES
    5.
    Patent application
    MATCHING ITEMS OF USER-GENERATED CONTENT TO ENTITIES (in force)

    Publication No.: US20120102104A1

    Publication date: 2012-04-26

    Application No.: US12909766

    Filing date: 2010-10-21

    IPC classification: G06F15/16 G06F17/30

    Abstract: A method, apparatus, and computer-readable medium are provided for matching items of user-generated content to entities. Items of user-generated content, such as status updates, are gathered. For each of the items, a machine determines a degree to which the item is associated with an entity. In one aspect, items are matched to an entity by matching the content of the items to attributes of the entity. In another aspect, items are matched to an entity by predicting attributes of an author of the items and determining a distance between the predicted attributes of the author and the attributes of the entity. The distance may be a physical distance between locations of the entity and user or a contextual distance between categories for the entity and posts by the author. Items matched to the entity may be displayed on an interface concurrently with information about the entity.

    Method and System for Form-Filling Crawl and Associating Rich Keywords
    6.
    Patent application
    Method and System for Form-Filling Crawl and Associating Rich Keywords (in force)

    Publication No.: US20110087646A1

    Publication date: 2011-04-14

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F7/10 G06F17/30

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

    Robust cardinality and cost estimation for skyline operator
    7.
    Patent application
    Robust cardinality and cost estimation for skyline operator (in force)

    Publication No.: US20070198439A1

    Publication date: 2007-08-23

    Application No.: US11357665

    Filing date: 2006-02-17

    IPC classification: G06F17/00

    CPC classification: G06F17/30469 G06Q30/0283

    Abstract: The claimed subject matter relates to incorporating a skyline operator within a relational database engine, and more particularly to a database engine that utilizes novel techniques to determine the lowest cost of generating the skyline produced by the skyline operator. The database engine receives queries and associated preferences and, based on a cardinality estimate and a cost estimate, an appropriate skyline generating technique is utilized to produce a skyline representative of the received queries and their associated preferences.
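
For readers unfamiliar with the skyline operator itself, the Python sketch below computes a skyline (the set of points not dominated by any other point) with a simple window-based loop. It illustrates only what the operator returns; it does not attempt the cardinality or cost estimation that the abstract is about. The "smaller is better" convention and the hotel data are assumptions for the example.

```python
def dominates(p, q):
    """True if p is no worse than q in every dimension and strictly better
    in at least one (smaller values are assumed to be preferred)."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Return the points that no other point dominates."""
    window = []
    for p in points:
        if any(dominates(q, p) for q in window):
            continue  # p is dominated by a point already kept
        # Drop previously kept points that p dominates, then keep p.
        window = [q for q in window if not dominates(p, q)]
        window.append(p)
    return window

# Toy data: (price per night, distance to beach in km), invented for this example.
HOTELS = [(50, 8), (80, 2), (60, 9), (120, 1), (90, 3)]
result = skyline(HOTELS)
```

Here `(60, 9)` is dropped because `(50, 8)` is both cheaper and closer, and `(90, 3)` is dropped because of `(80, 2)`. The size of `result` is the skyline cardinality that an optimizer would want to estimate before choosing an algorithm.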

    Robust wrappers for web extraction
    8.
    Granted patent
    Robust wrappers for web extraction (in force)

    Publication No.: US08762829B2

    Publication date: 2014-06-24

    Application No.: US12344076

    Filing date: 2008-12-24

    IPC classification: G06F17/22

    Abstract: A computer-implemented method to determine a robust wrapper includes developing a model indicative of the temporal history of a document, such as a web document written in a markup language. Based on the developed model, robustness characteristics are determined for a plurality of different wrappers representing associated paths to the data item in a representation of the document. Based on a result of the determining operation, a result wrapper of the plurality of wrappers is provided. The result wrapper has a desired robustness characteristic.
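
One way to picture the robustness comparison in this abstract is the Python sketch below, which uses the standard library's `xml.etree.ElementTree` and its limited XPath support. The page, the two candidate wrappers, and especially the three-scenario change model with its probabilities are invented for illustration; the patent's actual model of a document's temporal history is not described in the abstract.

```python
import xml.etree.ElementTree as ET

PAGE = ("<html><body><div class='ad'/>"
        "<div id='main'><span class='price'>42</span></div></body></html>")

def no_change(root):
    pass

def insert_banner(root):
    # A new element appears at the top of the body.
    root.find("body").insert(0, ET.Element("div", {"class": "banner"}))

def remove_ad(root):
    body = root.find("body")
    body.remove(body.find("div[@class='ad']"))

# Hypothetical change model: mutually exclusive edits with assumed probabilities.
CHANGE_MODEL = [(0.5, no_change), (0.3, insert_banner), (0.2, remove_ad)]

def robustness(wrapper, target="42"):
    """Probability mass of scenarios in which the wrapper still finds the target."""
    score = 0.0
    for prob, edit in CHANGE_MODEL:
        root = ET.fromstring(PAGE)  # fresh copy of the page for each scenario
        edit(root)
        node = root.find(wrapper)
        if node is not None and node.text == target:
            score += prob
    return score

WRAPPERS = ["./body/div[2]/span", ".//span[@class='price']"]
best_wrapper = max(WRAPPERS, key=robustness)
```

The positional path breaks whenever a sibling `div` is added or removed, while the attribute-based path survives every scenario in this toy model, so it is the one selected.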

    Computing probabilistic answers to queries
    9.
    Patent application
    Computing probabilistic answers to queries (lapsed)

    Publication No.: US20060206477A1

    Publication date: 2006-09-14

    Application No.: US11281983

    Filing date: 2005-11-17

    IPC classification: G06F17/30

    Abstract: A system is described that supports arbitrarily complex SQL queries with “uncertain” predicates. The query semantics are based on a probabilistic model and the results are ranked, much like in Information Retrieval, based upon their probability. An optimization algorithm is employed that can efficiently compute most queries. The algorithm attempts to determine whether a proposed plan is a safe plan that can be used for correctly evaluating the query. Operators such as the project operator in the proposed plan are evaluated to determine if they are safe. If so, the proposed plan is safe and will produce correct answers in a result. Due to the data complexity of some queries, a safe plan may not exist for a query. For these queries, either a “least unsafe plan,” or a Monte-Carlo simulation algorithm can be employed to produce a result with answers that have an acceptable error.
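
The Monte-Carlo fallback mentioned in this abstract can be illustrated with the Python sketch below, which estimates the probability of one query answer by sampling possible worlds. The two toy relations, the assumption that tuples are independent, and the query itself are invented for the example. This particular query happens to be simple enough to have a closed-form answer, which is used here only to check the estimate.

```python
import random

# Each tuple carries the probability that it is present (assumed independent).
R = [("thai_house", "seattle", 0.9), ("bangkok_cafe", "seattle", 0.5)]  # (name, city, p)
S = [("thai_house", "thai", 0.8), ("bangkok_cafe", "thai", 0.6)]        # (name, cuisine, p)

def sample_world(table, rng):
    """Draw one possible world: keep each tuple with its own probability."""
    return [row[:-1] for row in table if rng.random() < row[-1]]

def estimate(city, cuisine, trials=20000, seed=0):
    """Estimate P(some restaurant in `city` serves `cuisine`) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        in_city = {n for n, c in sample_world(R, rng) if c == city}
        serving = {n for n, k in sample_world(S, rng) if k == cuisine}
        if in_city & serving:  # the join followed by projection is non-empty
            hits += 1
    return hits / trials

# Closed form for this query: 1 - (1 - 0.9*0.8) * (1 - 0.5*0.6) = 0.804
EXACT = 1 - (1 - 0.9 * 0.8) * (1 - 0.5 * 0.6)
approx = estimate("seattle", "thai")
```

Increasing `trials` tightens the estimate at the cost of more sampling, which is the kind of trade-off behind the abstract's phrase "acceptable error".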

    Method and system for form-filling crawl and associating rich keywords
    10.
    Granted patent
    Method and system for form-filling crawl and associating rich keywords (in force)

    Publication No.: US08793239B2

    Publication date: 2014-07-29

    Application No.: US12576011

    Filing date: 2009-10-08

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30864

    Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.
