USER DEFINED DATA PARTITIONING (UDP) - GROUPING OF DATA BASED ON COMPUTATION MODEL
    41.
    Invention application (status: in force)

    Publication No.: US20100192148A1

    Publication date: 2010-07-29

    Application No.: US12358995

    Filing date: 2009-01-23

    Abstract: Methods, systems, and computer program products are provided for generating application-aware data partitioning to support parallel computing. A label for a user defined data partitioning (UDP) key is generated by a labeling process to configure data partitions of original data. The UDP is labeled by the labeling process to include at least one key property excluded from the original data. The data partitions are evenly distributed to co-locate and balance the data partitions and corresponding computations performed by computational servers. A data record of the data partitions is retrieved by performing an all-node parallel search of the computational servers using the UDP key.
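
    The flow in this abstract (label, partition, balance, search) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the record fields `id` and `x`, the band-based `label_record` function, the greedy largest-first placement, and the use of threads to stand in for computational servers are all hypothetical choices made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def label_record(record, band_width=10):
    """Hypothetical labeling process: derive a UDP key that is not stored
    in the record itself (here, a band index computed from field 'x')."""
    return record["x"] // band_width


def build_partitions(records, label_fn):
    """Group records into data partitions keyed by their UDP label."""
    partitions = defaultdict(list)
    for record in records:
        partitions[label_fn(record)].append(record)
    return dict(partitions)


def distribute(partitions, num_servers):
    """Assumed balancing rule: place the largest partitions first, each on
    the currently least-loaded server, so data and work stay co-located."""
    servers = [dict() for _ in range(num_servers)]
    loads = [0] * num_servers
    for key, part in sorted(partitions.items(), key=lambda kv: -len(kv[1])):
        target = loads.index(min(loads))
        servers[target][key] = part
        loads[target] += len(part)
    return servers


def parallel_search(servers, udp_key, record_id):
    """Ask every server at once for a record under the given UDP key."""
    def search_one(server):
        return [r for r in server.get(udp_key, []) if r["id"] == record_id]

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        hits = list(pool.map(search_one, servers))
    return [record for found in hits for record in found]
```

    With twelve sample records `{"id": i, "x": i * 3}` and two servers, this placement gives each server six records, and `parallel_search(servers, 2, 8)` returns the single record with `id` 8.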


    Method for visualizing large volumes of multiple-attribute data without aggregation using a pixel bar chart
    42.
    Granted invention patent (status: in force)

    Publication No.: US07221474B2

    Publication date: 2007-05-22

    Application No.: US09917393

    Filing date: 2001-07-27

    IPC classification: G06K15/00

    CPC classification: G06T11/206

    Abstract: A method for graphically presenting large volumes of data without aggregation using a pixel bar chart. Records having multiple attributes are sorted for constructing a graphically displayable array, wherein the graphically displayable array comprises a plurality of pixels. Each pixel represents one record. The non-aggregation data visualization technique of the present invention provides solutions to meet the need of automatic data preparation for the visual data mining of massive data volumes. The present invention effectively uses screen space to represent each record without cluttering the display, allowing a user to easily discover patterns and correlations. The present invention provides a visual impression by representing the value of a record by a color and representing the number of records by the area of a group. With “drill down” capability, a user can navigate through each record to find detail information. Each record is represented by one pixel, allowing millions of records to be displayed at the same time. Each individual record can be accessed interactively, by allowing direct access to the detail data by picking at single pixels.
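
    As a rough illustration of the layout idea described above, the sketch below builds one bar per group, sorts the records inside each bar, and keeps one cell per record so that a picked pixel leads straight back to its record. The field names `category` and `value`, the fixed `bar_width`, and the grey-level color mapping are assumptions made for this example; the patent does not prescribe them here.

```python
def pixel_bar_chart(records, bar_width=4):
    """Return {category: rows}, where each row holds up to bar_width
    records and every cell (pixel) is exactly one record."""
    bars = {}
    for record in records:
        bars.setdefault(record["category"], []).append(record)
    chart = {}
    for category, members in bars.items():
        # Sorting inside the bar makes value patterns visible as gradients.
        members.sort(key=lambda r: r["value"])
        chart[category] = [
            members[i:i + bar_width] for i in range(0, len(members), bar_width)
        ]
    return chart


def color_of(value, lowest, highest):
    """Assumed color mapping: scale a value to a grey level in 0..255."""
    if highest == lowest:
        return 0
    return round(255 * (value - lowest) / (highest - lowest))


def pick(chart, category, row, col):
    """Drill down: return the record behind a single pixel."""
    return chart[category][row][col]
```

    Because no records are aggregated, the number of cells in a bar equals the number of records in that group, which is what lets bar area convey group size.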


    Document clustering method and system
    43.
    Granted invention patent (status: in force)

    Publication No.: US07181678B2

    Publication date: 2007-02-20

    Application No.: US10767151

    Filing date: 2004-01-29

    IPC classification: G06F17/00

    Abstract: Document clustering method and system utilizing both the log-based clustering method and the content-based clustering method are disclosed. The method includes the steps of generating log-based document clusters and combining vectors from the log-based document clusters with individual document clusters for content-based clustering analysis. The log-based document clusters are generated by accessing the retrieval session log, clustering the retrieval sessions, and combining the documents opened during each of the sessions of session clusters.
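
    One way to picture the two-stage approach in this abstract is the sketch below. It is a simplified stand-in, not the claimed method: sessions are assumed to be `(query_terms, opened_doc_ids)` pairs, sessions are grouped whenever they share a query term, document vectors are plain word counts, and the content-based pass is a greedy cosine-threshold merge. All of these choices, and the names used, are assumptions made for illustration.

```python
import math
from collections import Counter


def cluster_sessions(sessions):
    """Log-based step: sessions sharing a query term fall into one session
    cluster; the documents opened in those sessions are combined."""
    clusters = []
    for terms, docs in sessions:
        for cluster in clusters:
            if cluster["terms"] & terms:
                cluster["terms"].update(terms)
                cluster["docs"].update(docs)
                break
        else:
            clusters.append({"terms": set(terms), "docs": set(docs)})
    return [cluster["docs"] for cluster in clusters]


def vectorize(text):
    """Assumed representation: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_clustering(docs, sessions, threshold=0.5):
    """Combine log-based cluster vectors with the remaining individual
    documents, then run a simple content-based pass over all of them."""
    units, covered = [], set()
    for members in cluster_sessions(sessions):
        vector = Counter()
        for doc_id in members:
            vector.update(vectorize(docs[doc_id]))
        units.append((set(members), vector))
        covered |= members
    for doc_id in docs:
        if doc_id not in covered:
            units.append(({doc_id}, vectorize(docs[doc_id])))

    final = []
    for members, vector in units:
        for cluster in final:
            if cosine(cluster[1], vector) >= threshold:
                cluster[0].update(members)
                cluster[1].update(vector)
                break
        else:
            final.append([set(members), Counter(vector)])
    return [sorted(cluster[0]) for cluster in final]
```

    In this sketch a document that never appears in the session log can still join a log-based cluster when its content is close enough, which is the benefit of combining the two signals.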


    Document clustering method and system
    44.
    Granted invention patent (status: in force)

    Publication No.: US06728932B1

    Publication date: 2004-04-27

    Application No.: US09532539

    Filing date: 2000-03-22

    IPC classification: G06F17/00

    Abstract: Document clustering method and system utilizing both the log-based clustering method and the content-based clustering method are disclosed. The method includes the steps of generating log-based document clusters and combining vectors from the log-based document clusters with individual document clusters for content-based clustering analysis. The log-based document clusters are generated by accessing the retrieval session log, clustering the retrieval sessions, and combining the documents opened during each of the sessions of session clusters.


    Apparatus and method for discovering context groups and document categories by mining usage logs
    45.
    Granted invention patent (status: in force)

    Publication No.: US06502091B1

    Publication date: 2002-12-31

    Application No.: US09511195

    Filing date: 2000-02-23

    IPC classification: G06F17/30

    Abstract: An apparatus is provided for relating user queries and documents. The apparatus includes a client, a server, and a database being mutually coupled to a communications pathway. The client is configured to enable a user to submit user queries to locate documents. The server has a data mining mechanism configured to receive the user queries and generate information retrieval sessions. The database stores data in the form of usage logs generated from the information retrieval sessions. The data mining mechanism includes a clustering algorithm operative to identify context groups and usage categories. The data mining mechanism is operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs. A method is provided for associating user queries and documents in accordance with the apparatus.


    Work flow management system and method
    46.
    Granted invention patent (status: lapsed)

    Publication No.: US5581691A

    Publication date: 1996-12-03

    Application No.: US516729

    Filing date: 1995-08-18

    CPC classification: G06F9/4436 G06F9/466

    Abstract: A work flow description database represents long running work flows as a set of work units, called steps, with information flows therebetween. The description database defines each step's input and output signals, input condition criteria for creating an instance of the step, an application program associated with the step, and criteria for selecting a resource to execute the step. A work flow controller controls the process of executing instances of each defined type of work flow. Execution of a long running work flow begins when a corresponding set of externally generated input event signals are received by the work flow controller. During execution of a work flow, each step of the work flow is instantiated only when a sufficient set of input signals is received to execute that step. At that point an instance of the required type of step is created and then executed by a selected resource. After termination of a step, output signals from the step are converted into input event signals for other steps in the work flow in accordance with data stored in the work flow description database. Each step executes an application program and is treated as an individual transaction insofar as durable storage of its results. Log records are durably stored upon instantiation, execution and termination of each step of a work flow, and output event signals are also logged, thereby durably storing sufficient data to recover a work flow with virtually no loss of the work that was accomplished prior to a system failure.
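
    The controller behavior described above can be sketched as a small event loop. This is a toy model under stated assumptions: the `WORKFLOW` dictionary stands in for the work flow description database, the step and signal names are invented, the input condition is simply "all listed inputs received", and the in-memory `log` list stands in for durable log storage. Resource selection and the application programs themselves are omitted.

```python
from collections import deque

# Hypothetical description database: each step lists its input and output signals.
WORKFLOW = {
    "receive_order": {"inputs": {"order_placed"}, "outputs": {"order_received"}},
    "check_stock": {"inputs": {"order_received"}, "outputs": {"stock_ok"}},
    "bill_customer": {"inputs": {"order_received"}, "outputs": {"payment_ok"}},
    "ship": {"inputs": {"stock_ok", "payment_ok"}, "outputs": {"shipped"}},
}


def run_workflow(description, external_events):
    """Instantiate each step only once its full input set has arrived,
    then feed its output signals back in as input events for other steps."""
    log = []
    received = {name: set() for name in description}
    finished = set()
    queue = deque(external_events)
    while queue:
        signal = queue.popleft()
        log.append(("signal", signal))  # output/input events are logged too
        for name, step in description.items():
            if name in finished or signal not in step["inputs"]:
                continue
            received[name].add(signal)
            if received[name] == step["inputs"]:
                log.append(("instantiate", name))
                log.append(("execute", name))  # the step's program would run here
                log.append(("terminate", name))
                finished.add(name)
                queue.extend(sorted(step["outputs"]))
    return log
```

    Starting this sketch with the single external event `order_placed` executes `receive_order`, `check_stock`, `bill_customer`, and `ship` in that order; `ship` waits until both `stock_ok` and `payment_ok` have arrived. Replaying such a log after a failure is what would allow recovery, although the replay itself is not shown.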


    Processing a data stream
    47.
    Granted invention patent (status: in force)

    Publication No.: US09405801B2

    Publication date: 2016-08-02

    Application No.: US12703574

    Filing date: 2010-02-10

    IPC classification: G06F17/30

    CPC classification: G06F17/30516

    Abstract: Methods, database management systems (“DBMS”) and computer-readable media are provided for processing unbounded stream data using a traditional DBMS. Execution of a query that includes a data stream as a data source may be initiated. Tuples may be processed in accordance with the query as the tuples are received through the data stream until an indication is received that execution of the query should cease.


    Continuous querying of a data stream
    48.
    Granted invention patent (status: in force)

    Publication No.: US09195708B2

    Publication date: 2015-11-24

    Application No.: US13878473

    Filing date: 2010-10-14

    IPC classification: G06F17/30

    CPC classification: G06F17/30424 G06F17/30516

    Abstract: In continuous querying of a data stream, a query including query cycles can be initialized (310) on a query engine to analyze the data stream for desired information. The data stream can be processed (320) as segments, where a size of the segments is based on a user-defined parameter. The query cycles can be synchronized (330) with the segments of the data stream. A first segment can be analyzed (340) by performing the query on the first segment to obtain a first result. A query state of the query can be persisted (350) and the query operation can be rewound to begin a new query cycle. A second segment can be analyzed (360) in the new query cycle by performing the query on the second segment based on the first result.
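
    The cycle-per-segment idea in this abstract can be illustrated with a short sketch. It assumes a count-based segment size (`segment_size` plays the role of the user-defined parameter), uses a per-segment sum as the query, and keeps the persisted query state in a plain dictionary; none of these specifics come from the patent, and a real query engine would persist state inside the engine itself.

```python
from itertools import islice


def segments(stream, segment_size):
    """Cut a (possibly unbounded) stream into segments of a fixed size."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, segment_size))
        if not chunk:
            return
        yield chunk


def continuous_query(stream, segment_size):
    """Run one query cycle per segment, carrying state between cycles."""
    state = {"running_total": 0}  # stand-in for the persisted query state
    for segment in segments(stream, segment_size):
        cycle_result = sum(segment)  # the query applied to this segment
        state["running_total"] += cycle_result  # builds on earlier results
        # The "rewind" is modeled by looping back to start the next cycle.
        yield cycle_result, state["running_total"]
```

    For the stream 1 through 7 with a segment size of 3, the cycles yield `(6, 6)`, `(15, 21)`, and `(7, 28)`: each cycle reports its own segment result together with a total that depends on the previous cycles.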
