Abstract:
Methods, systems, and computer program products are provided for generating application-aware data partitioning to support parallel computing. A label for a user-defined data partitioning (UDP) key is generated by a labeling process to configure data partitions of original data. The UDP key is labeled by the labeling process to include at least one key property excluded from the original data. The data partitions are evenly distributed to co-locate and balance the data partitions and corresponding computations performed by computational servers. A data record of the data partitions is retrieved by performing an all-node parallel search of the computational servers using the UDP key.
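The partitioning flow described above might be sketched as follows. This is only an illustration under stated assumptions: the grid-cell labeling rule, the greedy largest-first balancing, and the names `label_udp_key`, `build_partitions`, `distribute`, and `search_all_nodes` are hypothetical and do not come from the abstract, and plain dictionaries stand in for computational servers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def label_udp_key(record, cell=10):
    """Hypothetical labeling process: derive a grid-cell key.

    The key is computed from the record's coordinates and is not
    itself a field of the original data.
    """
    return (record["x"] // cell, record["y"] // cell)


def build_partitions(records):
    """Group records into partitions by their labeled UDP key."""
    partitions = defaultdict(list)
    for record in records:
        partitions[label_udp_key(record)].append(record)
    return partitions


def distribute(partitions, n_servers):
    """Assign partitions, largest first, to the least-loaded server.

    Greedy balancing is an assumption; the abstract only says the
    partitions are evenly distributed.
    """
    servers = [dict() for _ in range(n_servers)]
    loads = [0] * n_servers
    for key, rows in sorted(partitions.items(), key=lambda kv: -len(kv[1])):
        target = loads.index(min(loads))
        servers[target][key] = rows
        loads[target] += len(rows)
    return servers


def search_all_nodes(servers, key):
    """Probe every server in parallel for the partition with this key."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        hits = list(pool.map(lambda server: server.get(key, []), servers))
    return [record for rows in hits for record in rows]
```

Because each partition lives on exactly one server, the parallel probe returns records from a single node while the caller does not need to know which one.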
Abstract:
A method for graphically presenting large volumes of data without aggregation using a pixel bar chart. Records having multiple attributes are sorted for constructing a graphically displayable array, wherein the graphically displayable array comprises a plurality of pixels. Each pixel represents one record. The non-aggregation data visualization technique of the present invention provides solutions to meet the need for automatic data preparation for the visual data mining of massive data volumes. The present invention effectively uses screen space to represent each record without cluttering the display, allowing a user to easily discover patterns and correlations. The present invention provides a visual impression by representing the value of a record by a color and representing the number of records by the area of a group. With “drill down” capability, a user can navigate through each record to find detailed information. Each record is represented by one pixel, allowing millions of records to be displayed at the same time. Each individual record can be accessed interactively, by allowing direct access to the detail data by picking single pixels.
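A minimal sketch of the layout idea, assuming a fixed bar width and a single sort attribute (both assumptions, as are the names `pixel_bar_chart` and `pick`). Each cell of a bar holds one record, so the bar's area equals its record count, and a renderer would map the sort attribute to a color.

```python
from collections import defaultdict


def pixel_bar_chart(records, bar_attr, color_attr, bar_width=4):
    """Lay records out as bars of pixels, one pixel per record.

    Records are grouped into bars by `bar_attr`, sorted within each bar
    by `color_attr`, and wrapped into rows of `bar_width` pixels.
    """
    bars = defaultdict(list)
    for record in records:
        bars[record[bar_attr]].append(record)

    chart = {}
    for name in sorted(bars):
        ordered = sorted(bars[name], key=lambda r: r[color_attr])
        chart[name] = [
            ordered[i:i + bar_width] for i in range(0, len(ordered), bar_width)
        ]
    return chart


def pick(chart, bar, row, col):
    """Drill down: return the full record behind a single pixel."""
    return chart[bar][row][col]
```

Keeping the whole record behind each pixel is what makes the drill-down a direct lookup rather than a second query.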
Abstract:
A document clustering method and system that use both log-based clustering and content-based clustering are disclosed. The method includes the steps of generating log-based document clusters and combining vectors from the log-based document clusters with individual document clusters for content-based clustering analysis. The log-based document clusters are generated by accessing the retrieval session log, clustering the retrieval sessions, and combining the documents opened during each session of the session clusters.
Abstract:
An apparatus is provided for relating user queries and documents. The apparatus includes a client, a server, and a database being mutually coupled to a communications pathway. The client is configured to enable a user to submit user queries to locate documents. The server has a data mining mechanism configured to receive the user queries and generate information retrieval sessions. The database stores data in the form of usage logs generated from the information retrieval sessions. The data mining mechanism includes a clustering algorithm operative to identify context groups and usage categories. The data mining mechanism is operative to identify query contexts associated with individual queries from the usage logs, partition the queries into context groups having similar contexts, and compute multiple context groups associated with specific query keywords from the usage logs. A method is provided for associating user queries and documents in accordance with the apparatus.
Abstract:
A work flow description database represents long running work flows as a set of work units, called steps, with information flows therebetween. The description database defines each step's input and output signals, input condition criteria for creating an instance of the step, an application program associated with the step, and criteria for selecting a resource to execute the step. A work flow controller controls the process of executing instances of each defined type of work flow. Execution of a long running work flow begins when a corresponding set of externally generated input event signals are received by the work flow controller. During execution of a work flow, each step of the work flow is instantiated only when a sufficient set of input signals is received to execute that step. At that point an instance of the required type of step is created and then executed by a selected resource. After termination of a step, output signals from the step are converted into input event signals for other steps in the work flow in accordance with data stored in the work flow description database. Each step executes an application program and is treated as an individual transaction insofar as durable storage of its results. Log records are durably stored upon instantiation, execution and termination of each step of a work flow, and output event signals are also logged, thereby durably storing sufficient data to recover a work flow with virtually no loss of the work that was accomplished prior to a system failure.
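The instantiation rule described above (a step is created only once a sufficient set of input signals has arrived, and its outputs become input events for other steps) can be sketched roughly as follows. The `WorkflowController` class, the dictionary-based step definitions, and the in-memory `log` list are hypothetical stand-ins: a real system would write log records to durable storage and would also select a resource to execute each step, which this sketch omits.

```python
from collections import defaultdict


class WorkflowController:
    """Toy controller: runs a step once all of its input signals arrive."""

    def __init__(self, steps):
        # steps: name -> {"needs": set of signal names,
        #                 "run": callable(inputs dict) -> outputs dict}
        self.steps = steps
        self.pending = defaultdict(dict)  # step name -> signals received so far
        self.log = []  # in-memory stand-in for a durable log

    def signal(self, name, value):
        """Deliver an event signal to every step that lists it as an input."""
        for step_name, step in self.steps.items():
            if name in step["needs"]:
                self.pending[step_name][name] = value
                if set(self.pending[step_name]) == step["needs"]:
                    self._run(step_name)

    def _run(self, step_name):
        """Instantiate and execute one step, logging before and after."""
        inputs = self.pending.pop(step_name)
        self.log.append(("instantiated", step_name))
        outputs = self.steps[step_name]["run"](inputs)
        self.log.append(("terminated", step_name, outputs))
        # Output signals become input event signals for other steps.
        for out_name, out_value in outputs.items():
            self.signal(out_name, out_value)
```

For example, a "ship" step that needs both a credit signal and a stock signal stays uninstantiated until the two upstream steps have each terminated and emitted their outputs.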
Abstract:
Methods, database management systems (“DBMS”) and computer-readable media are provided for processing unbounded stream data using a traditional DBMS. Execution of a query that includes a data stream as a data source may be initiated. Tuples may be processed in accordance with the query as the tuples are received through the data stream until an indication is received that execution of the query should cease.
Abstract:
In continuous querying of a data stream, a query including query cycles can be initialized (310) on a query engine to analyze the data stream for desired information. The data stream can be processed (320) as segments, where a size of the segments is based on a user-defined parameter. The query cycles can be synchronized (330) with the segments of the data stream. A first segment can be analyzed (340) by performing the query on the first segment to obtain a first result. A query state of the query can be persisted (350) and the query operation can be rewound to begin a new query cycle. A second segment can be analyzed (360) in the new query cycle by performing the query on the second segment based on the first result.
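A rough sketch of the cycle-based pattern, assuming segments are cut by element count (the abstract only says the size is based on a user-defined parameter) and that the persisted query state is a small Python value handed from one cycle to the next. The names `segments`, `run_continuous_query`, and `running_average` are illustrative only.

```python
from itertools import islice


def segments(stream, size):
    """Cut a possibly unbounded stream into segments of `size` elements."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_continuous_query(stream, segment_size, query, state=None):
    """Run one query cycle per segment, carrying state between cycles.

    `query(segment, state)` returns `(result, new_state)`. Reusing the
    same query function each cycle plays the role of rewinding the query.
    """
    for segment in segments(stream, segment_size):
        result, state = query(segment, state)
        yield result


def running_average(segment, state):
    """Example query: average of everything seen so far."""
    count, total = state or (0, 0.0)
    count += len(segment)
    total += sum(segment)
    return total / count, (count, total)
```

With a segment size of 2 over the stream 1 through 6, the second cycle's result depends on the state left by the first, which mirrors analyzing the second segment based on the first result.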
Abstract:
A system may include an extraction engine to extract candidate phrases from a content stream, and an analysis engine to assign the candidate phrases visual cues and display the visual cues to an operator.
Abstract:
An event occurring in a particular geographic region is identified based on disseminated information containing public commentary in the particular geographic region. Attributes that are related to the event are identified, and sentiment words relating to the identified event are extracted from the disseminated information, where the extracted sentiment words are in a local language of the particular geographic region. A sentiment trend visualization is generated that depicts a trend of sentiments of at least a particular one of the identified attributes, wherein the sentiments are based on the sentiment words for at least the particular attribute.