41. Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entrophy
    Granted patent
    Status: lapsed

    Publication number: US07320000B2

    Publication date: 2008-01-15

    Application number: US10309612

    Filing date: 2002-12-04

    IPC classification: G06F7/10

    CPC classification: G06F17/30 Y10S707/99937

    Abstract: A system and method for automatically populating an existing concept hierarchy of items with new items, using entropy as a measure of the correctness of a potential classification. User-defined concept hierarchies include, for example, document hierarchies such as directories for the Internet, library catalogues, patent databases and journals, and product hierarchies. These concept hierarchies can be huge and are usually maintained manually. An internet directory may have, for example, millions of Web sites, thousands of editors and hundreds of thousands of different categories. The method for populating a concept hierarchy includes calculating conditional ‘entropy’ values, which represent the randomness of the distribution of classification attributes across the hierarchical set of classes if a new item is added to a specific class of the hierarchy, and then selecting the class that yields the minimum randomness of distribution when the new data item is inserted into it.
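    The selection rule in the abstract can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the hierarchy is flattened to a set of leaf classes, that each item's classification attributes are a bag of words, and that "system entropy" is the per-class Shannon entropy averaged with weights proportional to each class's attribute count. These definitions and the function names are hypothetical; this is not the patented method. Each candidate class receives the new item tentatively, and the class that leaves the system least random is chosen.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of one class's attribute distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)

def system_entropy(classes: Dict[str, Counter]) -> float:
    """Per-class entropies weighted by each class's share of attribute occurrences (assumed definition)."""
    grand_total = sum(sum(c.values()) for c in classes.values())
    if grand_total == 0:
        return 0.0
    return sum(sum(c.values()) / grand_total * entropy(c) for c in classes.values())

def best_class(classes: Dict[str, Counter], item_attributes: Iterable[str]) -> str:
    """Tentatively insert the item into each class; return the class giving the lowest system entropy."""
    item_counts = Counter(item_attributes)
    scores = {}
    for name in classes:
        trial = {k: v.copy() for k, v in classes.items()}  # copy so the real hierarchy is untouched
        trial[name].update(item_counts)
        scores[name] = system_entropy(trial)
    return min(scores, key=scores.get)

classes = {
    "sports": Counter({"ball": 5, "team": 5}),
    "finance": Counter({"stock": 5, "bank": 5}),
}
print(best_class(classes, ["ball", "team"]))  # prints "sports"
```

    Inserting the item into "sports" keeps both classes at 1 bit, whereas inserting it into "finance" spreads that class's attributes more evenly and raises the weighted total, so "sports" wins.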


42. Method and system to bundle message over a network
    Granted patent
    Status: in force

    Publication number: US07299273B2

    Publication date: 2007-11-20

    Application number: US10319966

    Filing date: 2002-12-16

    IPC classification: G06F15/177

    CPC classification: G06F9/5061 G06F2209/505

    Abstract: The invention describes a method and system to optimize network bandwidth and obtain greater efficiency in the transmission of messages/data in a client-server network. The invention proposes clustering client requests and data items so as to optimize network transmission and to reduce the processing cost of sending the data items at the server end and of picking/pruning them at the client end.
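    As a rough illustration of request clustering, the following Python sketch groups clients whose requested item sets overlap, sends each group one bundle (the union of its members' items), and lets each client prune what it did not ask for. The greedy Jaccard-similarity clustering, the 0.5 threshold, and all names are assumptions made for illustration; the abstract does not say which clustering method is used.

```python
from typing import Dict, List, Set, Tuple

Cluster = Tuple[List[str], Set[str]]  # (member clients, bundled data items)

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union of two request sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cluster_requests(requests: Dict[str, Set[str]], threshold: float = 0.5) -> List[Cluster]:
    """Greedy single pass: each client joins the first cluster whose bundle is similar enough."""
    clusters: List[Cluster] = []
    for client, items in sorted(requests.items()):
        for members, bundle in clusters:
            if jaccard(items, bundle) >= threshold:
                members.append(client)
                bundle |= items  # in-place: the bundle now also covers this member's request
                break
        else:
            clusters.append(([client], set(items)))  # copy so the caller's set is not mutated
    return clusters

def prune(bundle: Set[str], wanted: Set[str]) -> Set[str]:
    """Client-side step: discard bundled items this client did not ask for."""
    return bundle & wanted

requests = {"c1": {"a", "b"}, "c2": {"a", "b", "c"}, "c3": {"x", "y"}}
for members, bundle in cluster_requests(requests):
    print(members, sorted(bundle))
```

    Here c1 and c2 share one bundle {a, b, c}, so the server sends it once instead of twice, and c1 prunes the extra item c on arrival.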


43. Methods, apparatus and computer programs for evaluating and using a resilient data representation
    Granted patent
    Status: in force

    Publication number: US07254577B2

    Publication date: 2007-08-07

    Application number: US10880141

    Filing date: 2004-06-29

    IPC classification: G06F17/30

    Abstract: Provided are methods, apparatus and computer programs for evaluating how resilient a representative label, which represents a data element within a data source, is to structural changes in that data source. Also disclosed are applications that use a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source, such as an XML or HTML Web page. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to select the label with the highest resilience score from a set of representative labels. The validated or selected representative label may then be used for data extraction, where it remains usable despite possible future changes to the structure of a Web page, or for template clustering/classification.
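    One way to make a resilience score concrete is to test each candidate label against several versions of the same page. The Python sketch below assumes that representative labels are ElementTree path expressions and that resilience is estimated empirically as the fraction of known page versions on which the label still finds the right value. The patent may estimate resilience differently, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

def resolves(page: str, label: str, expected: str) -> bool:
    """True if the label (an ElementTree path) matches exactly one element whose text is the expected value."""
    hits = ET.fromstring(page).findall(label)
    return len(hits) == 1 and (hits[0].text or "").strip() == expected

def resilience_score(label: str, versions: List[str], expected: str) -> float:
    """Fraction of page versions on which the label still locates the right data element."""
    return sum(resolves(v, label, expected) for v in versions) / len(versions)

def select_label(candidates: List[str], versions: List[str], expected: str,
                 required: float = 1.0) -> Optional[str]:
    """Pick the most resilient candidate, or None if none reaches the required score."""
    scores = {c: resilience_score(c, versions, expected) for c in candidates}
    best = max(candidates, key=scores.__getitem__)
    return best if scores[best] >= required else None

versions = [
    "<html><body><div><span class='price'>42</span></div></body></html>",
    # a later redesign inserts a banner <div> before the original one
    "<html><body><div><p>New banner</p></div><div><span class='price'>42</span></div></body></html>",
]
candidates = ["./body/div[1]/span", ".//span[@class='price']"]
for c in candidates:
    print(c, resilience_score(c, versions, "42"))
print(select_label(candidates, versions, "42"))
```

    The positional path breaks once the banner is inserted, while the attribute-based path survives both versions, so it is the label selected.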


44. System and method for extraction of factoids from textual repositories
    Patent application
    Status: lapsed

    Publication number: US20070162447A1

    Publication date: 2007-07-12

    Application number: US11321177

    Filing date: 2005-12-29

    IPC classification: G06F7/00

    CPC classification: G06F17/30864 G06F17/30705

    Abstract: A method (400) of extracting factoids from text repositories is disclosed, the factoids being associated with a given factoid category. The method (400) starts by training a classifier (230) to recognise factoids relevant to the given factoid category. Documents or document summaries relevant to that category are next collected (410) from the text repositories. Sentences having a predetermined association with the category are then extracted (420) from the documents or document summaries. Those sentences are classified (440), in a noisy environment, using the classifier (230) to extract snippets containing phrases relevant to the category. The extracted snippets are the factoids associated with the given factoid category.
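    The numbered steps can be traced in a toy Python sketch. Several choices are assumptions, not taken from the application: keyword matching stands in for the collection (410) and sentence-extraction (420) steps, a tiny Naive Bayes model trained on hand-labelled sentences stands in for classifier (230), whole sentences are returned as "snippets", and noise handling is not modelled. All names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Two-class multinomial Naive Bayes with add-one smoothing (a stand-in for classifier 230)."""

    def __init__(self, examples: List[Tuple[str, bool]]):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        for text, label in examples:  # training needs at least one example of each label
            self.counts[label].update(tokens(text))
            self.docs[label] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])

    def log_score(self, text: str, label: bool) -> float:
        total = sum(self.counts[label].values())
        prior = math.log(self.docs[label] / sum(self.docs.values()))
        return prior + sum(math.log((self.counts[label][t] + 1) / (total + len(self.vocab)))
                           for t in tokens(text))

    def is_factoid(self, text: str) -> bool:
        return self.log_score(text, True) > self.log_score(text, False)

def extract_factoids(docs: List[str], keywords: List[str], clf: NaiveBayes) -> List[str]:
    kw = set(keywords)
    relevant = [d for d in docs if kw & set(tokens(d))]   # step 410: collect relevant documents
    sentences = [s.strip() for d in relevant
                 for s in re.split(r"(?<=[.!?])\s+", d)
                 if kw & set(tokens(s))]                   # step 420: sentences tied to the category
    return [s for s in sentences if clf.is_factoid(s)]     # step 440: keep sentences classified as factoids

clf = NaiveBayes([
    ("Einstein was born in Ulm.", True),
    ("Curie was born in Warsaw.", True),
    ("She was born to lead the team.", False),
    ("A new idea was born that day.", False),
])
docs = ["Turing was born in London. He liked running. A legend was born that day.",
        "Stock prices rose sharply."]
print(extract_factoids(docs, ["born"], clf))  # prints ['Turing was born in London.']
```

    Both "born" sentences survive the keyword filter, but only the one resembling the positive training examples is kept as a birthplace factoid.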


45. Methods, apparatus and computer programs for evaluating and using a resilient data representation
    Patent application
    Status: in force

    Publication number: US20060026157A1

    Publication date: 2006-02-02

    Application number: US10880141

    Filing date: 2004-06-29

    IPC classification: G06F17/30

    Abstract: Provided are methods, apparatus and computer programs for evaluating how resilient a representative label, which represents a data element within a data source, is to structural changes in that data source. Also disclosed are applications that use a resilient representative label. For example, a representative label may represent a particular data field or other data element within a semi-structured data source, such as an XML or HTML Web page. An estimate of resilience to changes can be used to determine whether a candidate representative label satisfies a required degree of resilience, or to select the label with the highest resilience score from a set of representative labels. The validated or selected representative label may then be used for data extraction, where it remains usable despite possible future changes to the structure of a Web page, or for template clustering/classification.
