Community mining based on core objects and affiliated objects
    4.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US07885960B2

    Publication date: 2011-02-08

    Application No.: US10624759

    Filing date: 2003-07-22

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30873 G06F17/30864

    Abstract: In community mining based on core objects and affiliated objects, a set of core objects for a community of objects are identified from a plurality of objects. The community is expanded, based on the set of core objects, to include a set of affiliated objects. According to one aspect, a model of a community of objects is obtained by grouping a first collection of a plurality of objects into a center portion, and grouping a second collection of the plurality of objects into one or more concentric portions around the center portion. The groupings of the first and second collections of the objects are identified as the community of objects.

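
The core-then-affiliated expansion in this abstract can be illustrated with a minimal sketch. The abstract does not say how core or affiliated objects are chosen, so everything here is an assumption: objects are graph nodes, a core object is a seed linked to at least `core_min_links` other seeds, and each concentric ring collects objects linked to at least `affiliate_min_links` members found so far. The function name `mine_community` and all parameters are hypothetical.

```python
from collections import defaultdict


def mine_community(links, seeds, core_min_links=2, affiliate_min_links=2, max_rings=2):
    """Return (core, rings) for an undirected link list.

    Assumption: link counts stand in for whatever relationship measure
    the patent actually uses; the thresholds are illustrative only.
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Center portion: seeds that are linked to enough other seeds.
    seeds = set(seeds)
    core = {s for s in seeds if len(neighbors[s] & seeds) >= core_min_links}

    # Concentric portions: grow outward one ring at a time.
    members = set(core)
    rings = []
    for _ in range(max_rings):
        ring = {
            obj
            for obj in list(neighbors)
            if obj not in members
            and len(neighbors[obj] & members) >= affiliate_min_links
        }
        if not ring:
            break
        rings.append(ring)
        members |= ring
    return core, rings


links = [
    ("a", "b"), ("b", "c"), ("a", "c"),  # tightly linked seeds
    ("d", "a"), ("d", "b"),              # d is tied to two core objects
    ("e", "d"), ("e", "c"),              # e is tied to the community via d and c
    ("f", "e"),                          # f has a single weak tie
]
core, rings = mine_community(links, seeds={"a", "b", "c", "f"})
```

The community is the union of `core` and every set in `rings`; objects such as `f` that never meet the threshold stay outside it.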

    Community mining based on core objects and affiliated objects
    5.
    Invention patent application
    Legal status: Lapsed

    Publication No.: US20050021531A1

    Publication date: 2005-01-27

    Application No.: US10624759

    Filing date: 2003-07-22

    IPC classification: G06F17/30 G06F7/00

    CPC classification: G06F17/30873 G06F17/30864

    Abstract: In community mining based on core objects and affiliated objects, a set of core objects for a community of objects are identified from a plurality of objects. The community is expanded, based on the set of core objects, to include a set of affiliated objects. According to one aspect, a model of a community of objects is obtained by grouping a first collection of a plurality of objects into a center portion, and grouping a second collection of the plurality of objects into one or more concentric portions around the center portion. The groupings of the first and second collections of the objects are identified as the community of objects.


    Webpage entity extraction through joint understanding of page structures and sentences
    6.
    Granted invention patent
    Legal status: In force

    Publication No.: US09092424B2

    Publication date: 2015-07-28

    Application No.: US12569912

    Filing date: 2009-09-30

    IPC classification: G06F17/00 G06F17/27

    CPC classification: G06F17/278

    Abstract: Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.


    Method and system for calculating importance of a block within a display page
    7.
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US08095478B2

    Publication date: 2012-01-10

    Application No.: US12101109

    Filing date: 2008-04-10

    IPC classification: G06F17/00 G06F17/20

    Abstract: A method and system for identifying the importance of information areas of a display page. An importance system identifies information areas or blocks of a web page. A block of a web page represents an area of the web page that appears to relate to a similar topic. The importance system provides the characteristics or features of a block to an importance function that generates an indication of the importance of that block to its web page. The importance system “learns” the importance function by generating a model based on the features of blocks and the user-specified importance of those blocks. To learn the importance function, the importance system asks users to provide an indication of the importance of blocks of web pages in a collection of web pages.

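
Learning an importance function from block features and user-specified importance can be sketched as a small supervised regression. The abstract does not name the model or the features, so this sketch assumes a linear model fitted by gradient descent, and three hypothetical features per block (relative area, closeness to the page center, link density), each scaled to [0, 1]. The names `learn_importance` and `predict` are illustrative.

```python
def predict(weights, features):
    """Importance function: weighted sum of block features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + weights[-1]


def learn_importance(samples, epochs=2000, lr=0.05):
    """Fit weights from (features, user_importance) pairs.

    Assumption: plain least-squares gradient descent stands in for
    whatever learning method the patent actually uses.
    """
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for features, label in samples:
            err = predict(weights, features) - label
            for i, x in enumerate(features):
                grad[i] += err * x
            grad[n] += err
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad)]
    return weights


# Hypothetical user-labelled blocks: (area, centrality, link density) -> importance
labelled_blocks = [
    ([0.60, 0.9, 0.1], 1.0),  # main article body
    ([0.10, 0.2, 0.9], 0.1),  # navigation bar
    ([0.30, 0.7, 0.3], 0.7),  # related-story panel
    ([0.05, 0.1, 0.8], 0.0),  # footer links
]
weights = learn_importance(labelled_blocks)
content_like = predict(weights, [0.5, 0.8, 0.2])
nav_like = predict(weights, [0.1, 0.1, 0.9])
```

Once fitted, `predict` can rank the blocks of an unseen page; here a large, central, text-heavy block scores above a small, link-dense one.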

    AUTOMATED SOCIAL NETWORKING GRAPH MINING AND VISUALIZATION
    8.
    Invention patent application
    Legal status: In force

    Publication No.: US20110283205A1

    Publication date: 2011-11-17

    Application No.: US12780522

    Filing date: 2010-05-14

    IPC classification: G06F3/048 G06F17/30 G06F15/16

    CPC classification: G06F17/30867

    Abstract: The automated social networking graph mining and visualization technique described herein mines social connections and allows creation of a social networking graph from general (not necessarily social-application specific) Web pages. The technique uses the distances between a person's/entity's name and related people's/entities' names on one or more Web pages to determine connections between people/entities and the strengths of the connections. In one embodiment, the technique lays out these connections, and then clusters them, in a 2-D layout of a social networking graph that represents the Web connection strengths among the related people's or entities' names, by using a force-directed model.

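
The idea of deriving connection strength from the distance between names on a page can be sketched as follows. The abstract does not give the distance measure or the scoring formula, so this sketch assumes character offsets and an inverse-distance score summed over pages; `connection_strengths` is a hypothetical name, and the force-directed layout step is not covered.

```python
import re
from collections import defaultdict
from itertools import combinations


def connection_strengths(pages, names):
    """Score each pair of names by how close their mentions sit on each page.

    Assumption: strength = sum over pages of 1 / (1 + nearest distance),
    where distance is measured in characters between mention offsets.
    """
    strengths = defaultdict(float)
    for text in pages:
        # Character offset of every mention of every name on this page.
        mentions = {
            name: [m.start() for m in re.finditer(re.escape(name), text)]
            for name in names
        }
        for a, b in combinations(sorted(names), 2):
            if mentions[a] and mentions[b]:
                nearest = min(abs(i - j) for i in mentions[a] for j in mentions[b])
                strengths[(a, b)] += 1.0 / (1.0 + nearest)
    return dict(strengths)


pages = [
    "Alice Smith met Bob Jones at the summit. Later, after many unrelated "
    "paragraphs about weather and travel, Carol Lee spoke."
]
strengths = connection_strengths(pages, ["Alice Smith", "Bob Jones", "Carol Lee"])
```

The resulting pair scores could serve as edge weights for a graph layout; names that appear side by side end up with stronger edges than names separated by a lot of text.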

    Data-Centric Search Engine Architecture
    9.
    Invention patent application
    Legal status: Under examination (published)

    Publication No.: US20110137886A1

    Publication date: 2011-06-09

    Application No.: US12632821

    Filing date: 2009-12-08

    IPC classification: G06F17/30

    CPC classification: G06F16/951

    Abstract: Described is a data-centric web search engine technology/architecture, in which document metadata, including offline-extracted metadata, is used as part of a search indexing and ranking pipeline. A web data management component receives crawled documents and extracts document metadata from the documents. An indexing component uses the document metadata to build an index for the documents. A serving component uses the index and the document metadata to serve content, e.g., search results. Also described is the use of query metadata extracted from queries of a query log for use in the pipeline.


    SCORING RELEVANCE OF A DOCUMENT BASED ON IMAGE TEXT
    10.
    Invention patent application
    Legal status: In force

    Publication No.: US20110087660A1

    Publication date: 2011-04-14

    Application No.: US12972259

    Filing date: 2010-12-17

    IPC classification: G06F17/30

    CPC classification: G06F17/30864 G06F17/30265

    Abstract: A method and system for determining relevance of a document having text and images to a text string is provided. A scoring system identifies image text associated with an image of the document. The scoring system calculates an image score indicating relevance of the image text to the text string. The image score may be used in many applications, such as searching, summary generation, document classification, image search, and image classification.

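
An image score of this kind can be illustrated with a simple term-overlap measure. The abstract does not define how image text is identified or how the score is computed, so this sketch assumes the image text is something like a caption or alt text already extracted from the document, and that the score is the fraction of query terms found in it. The names `tokenize` and `image_score` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def image_score(image_text, text_string):
    """Fraction of the text string's terms that occur in the image text.

    Assumption: simple set overlap stands in for the patent's actual
    relevance calculation.
    """
    query_terms = set(tokenize(text_string))
    if not query_terms:
        return 0.0
    image_terms = set(tokenize(image_text))
    return len(query_terms & image_terms) / len(query_terms)


# Hypothetical image texts (e.g. captions) taken from one document.
captions = ["Golden Gate Bridge at sunset", "Company logo", "Bridge photo"]
scores = [image_score(c, "golden gate bridge") for c in captions]
```

A search or classification component could then combine the highest image score with a body-text score; how the two are weighted is left open here.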