Abstract:
Disclosed herein are an apparatus and method for managing a data stream distributed parallel processing service. The apparatus includes a service management unit, a Quality of Service (QoS) monitoring unit, and a scheduling unit. The service management unit registers a plurality of tasks constituting the data stream distributed parallel processing service. The QoS monitoring unit gathers information about the load of the plurality of tasks and information about the load of a plurality of nodes constituting a cluster which provides the data stream distributed parallel processing service. The scheduling unit arranges the plurality of tasks by distributing them among the plurality of nodes based on the information about the load of the tasks and the information about the load of the nodes.
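For illustration only, the load-based placement described in this abstract can be approximated by a greedy scheduler that assigns each task to the node currently reporting the lowest load. The sketch below is a minimal, self-contained Python example under that assumption; the names Task, Node, and schedule, and the greedy heuristic itself, are introduced here for illustration and are not the claimed scheduling method.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    name: str
    load: float          # measured load of the task (e.g., CPU share)

@dataclass
class Node:
    name: str
    load: float = 0.0    # current load reported for the node
    tasks: List[str] = field(default_factory=list)

def schedule(tasks: List[Task], nodes: List[Node]) -> None:
    """Greedily place each task on the node with the lowest current load."""
    # Place the heaviest tasks first so large tasks are spread out early.
    for task in sorted(tasks, key=lambda t: t.load, reverse=True):
        target = min(nodes, key=lambda n: n.load)
        target.tasks.append(task.name)
        target.load += task.load

if __name__ == "__main__":
    tasks = [Task("parse", 0.4), Task("filter", 0.2), Task("join", 0.7), Task("sink", 0.1)]
    nodes = [Node("node-1"), Node("node-2")]
    schedule(tasks, nodes)
    for node in nodes:
        print(node.name, node.tasks, round(node.load, 2))
```

In this toy run the heaviest tasks are placed first, so the two nodes end up with roughly balanced total load.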
Abstract:
Provided are a system and method for processing continuous integrated queries over both data streams and stored data using a user-defined shared trigger. The system includes a data stream manager for managing a data stream input from an external source; a continuous integrated query manager for managing the continuous integrated queries input from an external application; a trigger manager for managing the user-defined shared trigger input from the external application and registering the user-defined shared trigger in an external database; a trigger result manager for forming and managing a trigger result set from an execution result of the user-defined shared trigger registered in the external database; and a continuous integrated query performer for processing the continuous integrated queries by referring to the transmitted data stream and the trigger result set.
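One way to picture the cooperation between the trigger result manager and the query performer is the toy sketch below. It is a minimal illustration, not the patented mechanism: TriggerResultManager, on_trigger_fired, and run_continuous_query are hypothetical names, and the "query" is reduced to a key-based join of stream tuples against the latest trigger results.

```python
from typing import Dict, Iterable, Tuple

class TriggerResultManager:
    """Holds the latest results produced by a user-defined shared trigger."""
    def __init__(self) -> None:
        self._results: Dict[str, float] = {}

    def on_trigger_fired(self, key: str, value: float) -> None:
        # Called whenever the trigger on the stored data executes.
        self._results[key] = value

    def lookup(self, key: str):
        return self._results.get(key)

def run_continuous_query(stream: Iterable[Tuple[str, float]],
                         trigger_results: TriggerResultManager):
    """Join each stream tuple with the matching trigger result, if any."""
    for key, stream_value in stream:
        stored_value = trigger_results.lookup(key)
        if stored_value is not None:
            yield key, stream_value, stored_value

if __name__ == "__main__":
    manager = TriggerResultManager()
    manager.on_trigger_fired("sensor-1", 20.5)   # result of a trigger over stored data
    stream = [("sensor-1", 21.3), ("sensor-2", 18.0)]
    for row in run_continuous_query(stream, manager):
        print(row)
```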
Abstract:
Disclosed herein is a system for processing large-capacity data in a distributed parallel processing manner based on MapReduce using a plurality of computing nodes. The distributed parallel processing system is configured to provide an incremental MapReduce-based distributed parallel processing function both for large-capacity stream data that continues to be collected even while the distributed parallel processing is being performed, and for large-capacity stored data that has been previously collected.
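The incremental behavior can be illustrated with a toy word-count sketch: previously collected data is processed once, and each newly arriving batch of stream data is mapped and reduced on its own and then merged into the existing result instead of reprocessing everything. The function names below (map_reduce, merge) are assumptions made for illustration and do not correspond to any particular MapReduce implementation.

```python
from collections import Counter
from typing import Dict, Iterable

def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    """Map each record to (word, 1) pairs and reduce by summing the counts."""
    counts: Counter = Counter()
    for record in records:
        for word in record.split():
            counts[word] += 1
    return dict(counts)

def merge(existing: Dict[str, int], delta: Dict[str, int]) -> Dict[str, int]:
    """Incrementally fold the result of a new batch into the existing result."""
    merged = dict(existing)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged

if __name__ == "__main__":
    stored = ["data stream processing", "stream data"]
    result = map_reduce(stored)                      # one pass over previously stored data
    for batch in (["incremental stream"], ["stream processing"]):
        result = merge(result, map_reduce(batch))    # incremental update per arriving batch
    print(result)
```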