Method and system for ranking objects of different object types

    Publication No.: US20060235810A1

    Publication date: 2006-10-19

    Application No.: US11106017

    Filing date: 2005-04-13

    IPC classification: G06F15/18

    Abstract: A method and system for ranking objects of different object types based on their popularity is provided. A ranking system calculates the popularity of objects based on relationships between the objects. A relationship indicates how one object is related to another object. Thus, objects of one object type may have one or more relationships with objects of another object type. One goal of the ranking system is to rank the objects of the different object types based on their popularity. The objects and their relationships can be represented using a graph with nodes representing objects and links representing relationships between objects. The ranking system assigns a popularity propagation factor to each relationship to represent its contribution to the popularity of objects of that type.
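
    As a rough illustration of popularity propagation, the Python sketch below runs a PageRank-style iteration over a graph whose nodes carry object types. It is not the claimed method: the input format, the per-relationship `factors` table, the damping value `epsilon` and the fixed iteration count are assumptions made for illustration. Each object receives a uniform share `epsilon / N_t` of its type's mass plus `(1 - epsilon)` times the popularity flowing in along its incoming links, each contribution scaled by that relationship's propagation factor.

```python
from collections import defaultdict


def rank_objects(objects, links, factors, epsilon=0.15, iterations=50):
    """Hypothetical popularity propagation over a typed object graph.

    objects: dict mapping object id -> object type.
    links:   list of (source_id, target_id) pairs, one per relationship.
    factors: dict mapping (source_type, target_type) -> propagation factor.
    Returns a dict mapping object id -> popularity score.
    """
    # Number of objects of each type, used for the uniform share.
    type_counts = defaultdict(int)
    for obj_type in objects.values():
        type_counts[obj_type] += 1

    # Out-degree of each object towards each target type, so a source
    # splits its popularity evenly among targets of the same type.
    out_degree = defaultdict(int)
    for src, dst in links:
        out_degree[(src, objects[dst])] += 1

    scores = {o: 1.0 / type_counts[t] for o, t in objects.items()}
    for _ in range(iterations):
        propagated = defaultdict(float)
        for src, dst in links:
            relationship = (objects[src], objects[dst])
            share = scores[src] / out_degree[(src, objects[dst])]
            propagated[dst] += factors.get(relationship, 0.0) * share
        scores = {
            o: epsilon / type_counts[t] + (1 - epsilon) * propagated[o]
            for o, t in objects.items()
        }
    return scores
```

    With papers and authors, for example, `factors = {("author", "paper"): 0.5, ("paper", "paper"): 0.5}` lets both authorship and citation links raise a paper's score. Because the uniform share is computed per type, scores are meant to be compared within one object type.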

    RETRIEVAL OF STRUCTURED DOCUMENTS
    Type: Patent application

    Publication No.: US20060161532A1

    Publication date: 2006-07-20

    Application No.: US11277345

    Filing date: 2006-03-23

    Applicants: Ji-Rong Wen, Hang Cui

    Inventors: Ji-Rong Wen, Hang Cui

    IPC classification: G06F17/30

    Abstract: This disclosure relates to performing a query for a search term on a database containing a plurality of structured documents. Structured documents that do not include the search term are filtered out during an initial search. Matched structured documents, i.e., those that do contain the search term, are evaluated by ranking their individual elements according to how well each element matches the search term, and the ranking of the individual elements is indicated to the user, who can then access those elements.
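
    A minimal sketch of the two-stage search described above, assuming each structured document is given as a mapping from element names to element text (a hypothetical input format). Documents that never contain the term are dropped first; the elements of the remaining documents are then ranked, here by term density (hits divided by element length), which is an assumed scoring rule rather than the one in the application.

```python
import re


def tokenize(text):
    """Lower-cased word tokens of an element's text."""
    return re.findall(r"\w+", text.lower())


def search_structured(documents, term):
    """documents: dict doc_id -> dict element_name -> element text.

    Returns (doc_id, element_name, score) tuples, best-matching first.
    """
    term = term.lower()
    results = []
    for doc_id, elements in documents.items():
        tokenized = {name: tokenize(text) for name, text in elements.items()}
        # Initial pass: filter out documents that do not contain the term.
        if not any(term in words for words in tokenized.values()):
            continue
        # Rank each element of a matched document by term density.
        for name, words in tokenized.items():
            hits = words.count(term)
            if hits:
                results.append((doc_id, name, hits / len(words)))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

    Returning element-level tuples rather than whole documents lets a caller link the user directly to the best-matching element, as the abstract describes.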

    Vision-Based Document Segmentation
    Type: Patent application
    Status: Lapsed

    Publication No.: US20060106798A1

    Publication date: 2006-05-18

    Application No.: US11275488

    Filing date: 2006-01-09

    IPC classification: G06F17/30 G06F7/00

    Abstract: Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.

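
    The separator-driven construction of a content structure can be sketched in a simplified setting. This hypothetical Python example assumes the visual blocks have already been identified and are stacked in a single column, so the only separators are the vertical gaps between consecutive blocks. The widest gap is treated as the strongest separator and the page is split there recursively until no gap reaches `min_gap`. Real pages also need vertical separators and visual cues such as font and background color, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A visual block with its vertical extent on the rendered page."""
    label: str
    top: float
    bottom: float


def segment(blocks, min_gap=10.0):
    """Build a nested content structure from vertically stacked blocks.

    Leaves are {"blocks": [labels]}; internal nodes are {"children": [...]}.
    """
    blocks = sorted(blocks, key=lambda b: b.top)
    # Separators: the empty space between each pair of consecutive blocks.
    gaps = [nxt.top - cur.bottom for cur, nxt in zip(blocks, blocks[1:])]
    if not gaps or max(gaps) < min_gap:
        # No strong separator left: this run of blocks is one portion.
        return {"blocks": [b.label for b in blocks]}
    cut = gaps.index(max(gaps)) + 1
    return {"children": [segment(blocks[:cut], min_gap),
                         segment(blocks[cut:], min_gap)]}
```

    Splitting at the strongest separator first yields a hierarchy in which coarse page regions appear near the root and finer semantic portions near the leaves.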

    Mining service requests for product support
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20050234973A1

    Publication date: 2005-10-20

    Application No.: US10826160

    Filing date: 2004-04-15

    CPC classification: G06N5/00 G06N5/02

    Abstract: Systems and methods for mining service requests for product support are described. In one aspect, unstructured service requests are converted to one or more structured answer objects. Each structured answer object includes hierarchically structured historic problem diagnosis data. In view of a product problem description, a set of the one or more structured answer data objects is identified. Each structured solution data object in the set includes term(s) and/or phrase(s) related to the product problem description. Historic and hierarchically structured problem diagnosis data from the set is provided to an end-user for product problem diagnosis.

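
    The lookup step, matching a product problem description against structured answer objects, can be sketched as follows. The sketch assumes the hard part, converting free-text service requests into structured objects, has already been done, and models each object as a hypothetical symptom/cause/resolution hierarchy. Matching is plain term overlap, an assumed stand-in for whatever relevance measure the application uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AnswerObject:
    """Hypothetical structured answer object distilled from a service request."""
    symptom: str
    cause: str
    resolution: str
    terms: set = field(default_factory=set)


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_answer(symptom, cause, resolution):
    # Index every word of the structured fields as a searchable term.
    terms = tokenize(symptom) | tokenize(cause) | tokenize(resolution)
    return AnswerObject(symptom, cause, resolution, terms)


def diagnose(problem_description, answers):
    """Return answers sharing terms with the problem, most overlap first."""
    query = tokenize(problem_description)
    scored = [(len(query & a.terms), a) for a in answers]
    matches = [(score, a) for score, a in scored if score > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in matches]
```

    Keeping symptom, cause and resolution as separate fields means the matched objects can be shown to the end-user as a diagnosis hierarchy rather than as raw request text.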

    Vision-based document segmentation
    Type: Patent application
    Status: Lapsed

    Publication No.: US20050028077A1

    Publication date: 2005-02-03

    Application No.: US10628766

    Filing date: 2003-07-28

    Abstract: Vision-based document segmentation identifies one or more portions of semantic content of a document. The one or more portions are identified by identifying a plurality of visual blocks in the document, and detecting one or more separators between the visual blocks of the plurality of visual blocks. A content structure for the document is constructed based at least in part on the plurality of visual blocks and the one or more separators, and the content structure identifies the one or more portions of semantic content of the document. The content structure obtained using the vision-based document segmentation can optionally be used during document retrieval.


    Community mining based on core objects and affiliated objects
    Type: Patent application
    Status: Lapsed

    Publication No.: US20050021531A1

    Publication date: 2005-01-27

    Application No.: US10624759

    Filing date: 2003-07-22

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30873 G06F17/30864

    Abstract: In community mining based on core objects and affiliated objects, a set of core objects for a community of objects is identified from a plurality of objects. The community is expanded, based on the set of core objects, to include a set of affiliated objects. According to one aspect, a model of a community of objects is obtained by grouping a first collection of a plurality of objects into a center portion, and grouping a second collection of the plurality of objects into one or more concentric portions around the center portion. The groupings of the first and second collections of the objects are identified as the community of objects.

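
    One way to realize the center-plus-concentric-portions model is sketched below under stated assumptions: the objects form an undirected graph given as an adjacency dict of sets, the core is taken to be the graph's k-core (an assumed rule; the abstract does not say how core objects are identified), and each concentric portion is one breadth-first layer of affiliated objects around the core.

```python
def k_core(adjacency, k):
    """Repeatedly peel off nodes with fewer than k surviving neighbours."""
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(adjacency[node] & alive) < k:
                alive.remove(node)
                changed = True
    return alive


def mine_community(adjacency, k=2, rings=2):
    """Return [core, ring1, ring2, ...]: a center plus concentric layers.

    adjacency: dict node -> set of neighbouring nodes (undirected graph).
    """
    core = k_core(adjacency, k)
    layers = [core]
    seen = set(core)
    frontier = core
    for _ in range(rings):
        # Affiliated objects: neighbours of the previous layer not yet placed.
        nxt = {n for node in frontier for n in adjacency[node]} - seen
        if not nxt:
            break
        layers.append(nxt)
        seen |= nxt
        frontier = nxt
    return layers
```

    Objects unreachable from the core within `rings` hops (such as an isolated node) are left out of the community entirely.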

    INFORMATION SENSORS FOR SENSING WEB DYNAMICS
    Type: Patent application
    Status: Pending (published)

    Publication No.: US20160125083A1

    Publication date: 2016-05-05

    Application No.: US14896339

    Filing date: 2013-06-07

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Disclosed herein are techniques and systems for building "information sensors," which are programmable "focused crawlers" that periodically discover, extract, analyze and aggregate structured information around a topic from the Web. A platform for building an information sensor allows a user to specify one or more data elements within a data source that the user desires to monitor, and an update frequency at which the data elements are to be extracted. Code may be generated based on the user specifications for creation and submission of the information sensor, which is stored in a database with metadata containing the code and update frequency. Once created, information sensors are scanned to check whether running conditions are met, and if so, they may be executed by retrieving the metadata using a sensor identifier (ID). The code is executed to locate a data source and periodically extract the specified data elements from it, producing structured time-series data.

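
    The information-sensor lifecycle (create with metadata, scan for sensors whose running condition is met, execute by sensor ID, append to a time series) is sketched below. Everything named here is hypothetical: the generated code is reduced to one regular expression per data element, the only running condition is the update interval, and `fetch` is injected so the example runs without network access.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SensorMetadata:
    source_url: str
    patterns: dict          # data element name -> regex with one capture group
    every_seconds: float    # update frequency
    last_run: float = float("-inf")
    series: list = field(default_factory=list)  # (timestamp, {element: value})


class SensorRegistry:
    """Hypothetical store of sensors keyed by sensor ID."""

    def __init__(self):
        self._sensors = {}

    def create(self, sensor_id, source_url, patterns, every_seconds):
        self._sensors[sensor_id] = SensorMetadata(source_url, patterns,
                                                  every_seconds)

    def due(self, now):
        """Scan stored sensors; return IDs whose interval has elapsed."""
        return [sid for sid, m in self._sensors.items()
                if now - m.last_run >= m.every_seconds]

    def run(self, sensor_id, fetch, now):
        """Look up metadata by ID, fetch the source, append one observation."""
        meta = self._sensors[sensor_id]
        page = fetch(meta.source_url)
        values = {}
        for name, pattern in meta.patterns.items():
            match = re.search(pattern, page)
            values[name] = match.group(1) if match else None
        meta.series.append((now, values))
        meta.last_run = now
        return meta.series
```

    A scheduler loop would call `due(now)` periodically and `run(...)` for each returned ID; each run adds one row to the sensor's time series.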

    Web-scale entity summarization
    Type: Granted patent
    Status: In force

    Publication No.: US08229960B2

    Publication date: 2012-07-24

    Application No.: US12570023

    Filing date: 2009-09-30

    IPC classification: G06F17/00

    CPC classification: G06F17/30867

    Abstract: Described is the summarization of a web entity (e.g., a person, place, product and so forth) based upon the entity's appearance in web documents (e.g., on the order of hundreds of millions or billions of webpages). Webpages are separated into blocks, which are then processed according to various features to reduce the number of blocks that need further processing, and the remaining blocks most relevant to the entity are ranked. A redundancy removal mechanism removes redundant blocks, leaving a set of remaining blocks that are used to provide a summary of information that is relevant to the entity.

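
    The summarization pipeline (segment, filter, rank, remove redundancy) is sketched below for a handful of pages rather than at web scale. The concrete choices are assumptions, not the patented features: blocks are blank-line separated paragraphs, the filter keeps blocks that mention the entity and have at least `min_words` words, ranking uses entity mentions per word, and a block counts as redundant when its Jaccard word overlap with an already chosen block exceeds `max_overlap`.

```python
import re


def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize_entity(entity, pages, top_k=3, min_words=4, max_overlap=0.6):
    """Hypothetical pipeline: split -> filter -> rank -> drop near-duplicates."""
    name = entity.lower()
    # 1. Segment each page into blocks (here: blank-line separated paragraphs).
    blocks = [b.strip() for page in pages
              for b in re.split(r"\n\s*\n", page) if b.strip()]
    # 2. Filter: keep blocks that mention the entity and are not too short.
    candidates = [b for b in blocks
                  if name in b.lower() and len(b.split()) >= min_words]
    # 3. Rank: more entity mentions per word ranks higher (stable on ties).
    candidates.sort(key=lambda b: b.lower().count(name) / len(b.split()),
                    reverse=True)
    # 4. Redundancy removal: skip blocks too similar to one already chosen.
    summary, chosen = [], []
    for block in candidates:
        bw = words(block)
        if all(jaccard(bw, cw) <= max_overlap for cw in chosen):
            summary.append(block)
            chosen.append(bw)
        if len(summary) == top_k:
            break
    return summary
```

    Filtering before ranking keeps the more expensive similarity checks confined to a small candidate set, which is the property that matters when the input is billions of pages.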