Locating and sampling of data in parallel processing systems
    11.
    Granted invention patent
    Status: Expired

    Publication No.: US6049861A

    Publication date: 2000-04-11

    Application No.: US892402

    Filing date: 1997-07-15

    IPC classification: G06F17/30 G06F12/00

    CPC classification: G06F17/30539

    Abstract: A method is disclosed for reproducible sampling of data items of a dataset which is shared across a plurality of nodes of a parallel data processing system. In data mining of large databases, segmentation of the database is often necessary, either to obtain a summary of the database or prior to an operation such as link analysis. A sample of data records is taken to create an initial segmentation model. The records of this sample and the initial model created from them can be critical to the results of the data mining process, and the initial model may not be reproducible unless the same sampling of data records is repeatable. Reproducible sampling is enabled without polling all nodes to locate particular records. Parametric control information with a small number of control parameters is generated which describes the particular partitioning of the dataset. The parametric control information enables the location of a data record to be computed. It may be distributed to each node, so that each node can compute the location of data records. The invention is applicable to other sampling methods.
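    The idea of computing a record's location from a few partitioning parameters can be illustrated with a minimal sketch. It is not the patented method: the contiguous block partitioning, the `PartitionInfo` class, and the seeded `reproducible_sample` helper are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionInfo:
    """Hypothetical control parameters for a contiguous block partitioning."""
    total_records: int
    num_nodes: int

    def locate(self, index: int) -> tuple:
        """Compute (node, local offset) of a global record index without polling nodes."""
        if not 0 <= index < self.total_records:
            raise IndexError(index)
        base, extra = divmod(self.total_records, self.num_nodes)
        # Assumption: the first `extra` nodes hold base + 1 records, the rest hold base.
        boundary = extra * (base + 1)
        if index < boundary:
            return divmod(index, base + 1)
        node_offset, local = divmod(index - boundary, base)
        return extra + node_offset, local


def reproducible_sample(info: PartitionInfo, sample_size: int, seed: int) -> list:
    """Draw the same global indices for the same seed and map each to its location."""
    rng = random.Random(seed)  # a private generator, so the global RNG state is untouched
    indices = sorted(rng.sample(range(info.total_records), sample_size))
    return [(i, *info.locate(i)) for i in indices]


info = PartitionInfo(total_records=10, num_nodes=3)
for index, node, offset in reproducible_sample(info, sample_size=4, seed=42):
    print(f"record {index} -> node {node}, offset {offset}")
```

    Because both the seed and the two partitioning parameters are small, they could be sent to every node, and each node could then derive which sampled records it holds on its own.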


    Parallel data processing system and method of controlling such a system
    12.
    Granted invention patent
    Status: Expired

    Publication No.: US5905904A

    Publication date: 1999-05-18

    Application No.: US906685

    Filing date: 1997-08-05

    CPC classification: G06F9/546

    Abstract: A parallel processing system has a number of processing nodes (S1 ... Sn), each of which is provided with a message handling kernel (13 ... n3) and an associated procedure register (14 ... n4). The procedure registers are loadable by loading means (4) under the control of an application (3). The nodes are able to exchange messages over a message interface (2) and to process messages to determine the message handling procedures to be invoked by the associated node in accordance with the contents of the associated procedure register.
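    As a rough illustration of the arrangement described, a procedure register can be modelled as a per-node dispatch table that an application loads and a kernel consults for each incoming message. The `Node` class, the message layout with `type` and `payload` keys, and the handler names are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict

Handler = Callable[[dict], object]


class Node:
    """One processing node with a hypothetical procedure register (a dispatch table)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.procedure_register: Dict[str, Handler] = {}

    def load(self, message_type: str, handler: Handler) -> None:
        """Loading step: the application installs a procedure for a message type."""
        self.procedure_register[message_type] = handler

    def handle(self, message: dict) -> object:
        """Kernel step: look up the message type and invoke the registered procedure."""
        handler = self.procedure_register.get(message["type"])
        if handler is None:
            raise LookupError(f"{self.name}: no procedure loaded for {message['type']!r}")
        return handler(message)


# The application loads each node's register; registers may differ between nodes.
nodes = [Node("S1"), Node("S2")]
for node in nodes:
    node.load("sum", lambda msg: sum(msg["payload"]))
nodes[1].load("echo", lambda msg: msg["payload"])

print(nodes[0].handle({"type": "sum", "payload": [1, 2, 3]}))
print(nodes[1].handle({"type": "echo", "payload": "hello"}))
```

    Keeping the lookup table separate from the kernel logic means the same kernel code can behave differently on each node, depending only on what was loaded into its register.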


    Generating a taxonomy for documents from tag data
    13.
    Granted invention patent
    Status: In force

    Publication No.: US08346776B2

    Publication date: 2013-01-01

    Application No.: US12781755

    Filing date: 2010-05-17

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30737

    Abstract: A method and system for generating a taxonomy for documents from tag data are provided. The method includes obtaining tag data in the form of tags on documents, with tag weightings for a document, and clustering the tags using the tag weightings for documents, wherein each cluster is an identified subject. The documents are associated with each identified subject, and the subjects are compared to identify relationships between subjects to build a taxonomy of subjects. A tag weighting for a document is the number of times the tag is applied to the document with a user rating of the relevance of the tag to the document. The steps are carried out automatically without user intervention.
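    The clustering and document-association steps can be sketched roughly as follows. This is only one possible reading: the sample `events`, the weighting rule (summing user ratings over each application of a tag), the cosine similarity measure, and the greedy threshold clustering are all assumptions, and the step that compares subjects to build the taxonomy is left out.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical tag events: (document, tag, user rating of relevance in [0, 1]).
events = [
    ("d1", "python", 1.0), ("d1", "python", 0.8), ("d1", "scripting", 0.9),
    ("d2", "python", 0.9), ("d2", "scripting", 0.7),
    ("d3", "gardening", 1.0), ("d3", "plants", 0.8),
]

# Assumption: a tag's weighting for a document is the sum of ratings over its applications.
weights = defaultdict(lambda: defaultdict(float))  # tag -> document -> weight
for doc, tag, rating in events:
    weights[tag][doc] += rating


def cosine(a, b):
    """Cosine similarity between two sparse document-weight vectors."""
    dot = sum(a[d] * b[d] for d in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_tags(weights, threshold=0.5):
    """Greedy clustering: a tag joins the first cluster holding a sufficiently similar tag."""
    clusters = []
    for tag in sorted(weights):
        for cluster in clusters:
            if any(cosine(weights[tag], weights[other]) >= threshold for other in cluster):
                cluster.add(tag)
                break
        else:
            clusters.append({tag})  # no similar cluster found, so start a new subject
    return clusters


def documents_for(cluster, weights):
    """Associate with a subject every document carrying at least one of its tags."""
    return {doc for tag in cluster for doc in weights[tag]}


for subject in cluster_tags(weights):
    print(sorted(subject), "->", sorted(documents_for(subject, weights)))
```

    Each resulting cluster plays the role of an identified subject; relationships between subjects could then be derived by comparing their document sets, which this sketch does not attempt.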


    GENERATING A TAXONOMY FOR DOCUMENTS FROM TAG DATA
    14.
    Invention application
    Status: In force

    Publication No.: US20120191718A1

    Publication date: 2012-07-26

    Application No.: US13436374

    Filing date: 2012-03-30

    IPC classification: G06F17/30

    CPC classification: G06F17/30737

    Abstract: A method and system for generating a taxonomy for documents from tag data are provided. The method includes obtaining tag data in the form of tags on documents, with tag weightings for a document, and clustering the tags using the tag weightings for documents, wherein each cluster is an identified subject. The documents are associated with each identified subject, and the subjects are compared to identify relationships between subjects to build a taxonomy of subjects. A tag weighting for a document is the number of times the tag is applied to the document with a user rating of the relevance of the tag to the document. The steps are carried out automatically without user intervention.
