摘要:
An extensibility framework that allows a user to write user-defined modules that include user-defined operators (UDO) and user-defined aggregators (UDA) in a non-temporal fashion without the need to worry about temporal attributes of events (or event types). The temporal domain is managed on behalf of the user, and allows the user to write operators and aggregates in the temporal data streaming domain as well as to port existing libraries of non-temporal UDOs/UDAs to the temporal data streaming domain. Temporal attributes and event types are managed for non-temporal UDOs/UDAs by the extensibility framework on behalf of the UDO/UDA writer. Windows can be employed to bridge the gap between the non-temporal domains and temporal domains. Support for complex event processing (CEP) is provided in UDOs/UDAs for base classes related to a CEP operator, CEP aggregate, CEP time sensitive operator, and CEP time sensitive aggregate.
摘要:
A streaming operator assignment system and method for determining a streaming operator assignment that minimizes overload in a data processing system. Embodiments of the streaming operator assignment system include an optimization goals definition module, which defines optimization goals in terms of fundamental quantities that system administrators and application writers want to control, such as minimizing the worst case latency over all periods of time, or minimizing how much the system is backlogged with work. Embodiments of the streaming operator assignment system also include an optimization goals solution module that optimizes and solves a selected optimization goal. A specialized optimization technique is used to find the best operator (or load) assignment using the optimization goals to measure of the value of the assignment. This technique minimizes an optimization goal by iterating over all possible operators assignments over all possible nodes to find the operator assignment that minimizes the desired optimization goal.
摘要:
An extensibility framework that allows a user to write user-defined modules that include user-defined operators (UDO) and user-defined aggregators (UDA) in a non-temporal fashion without the need to worry about temporal attributes of events (or event types). The temporal domain is managed on behalf of the user, and allows the user to write operators and aggregates in the temporal data streaming domain as well as to port existing libraries of non-temporal UDOs/UDAs to the temporal data streaming domain. Temporal attributes and event types are managed for non-temporal UDOs/UDAs by the extensibility framework on behalf of the UDO/UDA writer. Windows can be employed to bridge the gap between the non-temporal domains and temporal domains. Support for complex event processing (CEP) is provided in UDOs/UDAs for base classes related to a CEP operator, CEP aggregate, CEP time sensitive operator, and CEP time sensitive aggregate.
摘要:
Prior to searching a multidimensional feature space populated with data objects, each dimension in the feature space is divided into a number of intervals. When a query is received, a single interval that is overlapped by the query is selected from each dimension. A reduced set of data objects is then selected that includes only those data objects that overlap the selected intervals. This reduced set of data objects, rather than the entire set of data objects in the feature space, is then used to determine matches for the query.
摘要:
A logical merge module is described herein for producing an output stream which is logically compatible with two or more physically divergent input streams. Representative applications of the logical merge module are also set forth herein.
摘要:
The described implementations relate to recursive streaming queries. One technique processes a recursive streaming query through a query graph. The technique also detects when output produced by executing the query graph advances to a specific point.
摘要:
A checkpoint marker can be received at a first operator. The first operator can process the checkpoint marker by sending the checkpoint marker to a second operator and sending state checkpoint information representing a state of the first operator to a checkpoint writer. The checkpoint information can be used to rehydrate the state of one or more operators. For example, after a system failure, system shutdown, etc., checkpoint information can be received from a reader unit at a checkpoint information input queue of the first operator. A state of the first operator can be rehydrated using the checkpoint information. Processing of information in a data input queue of the first operator can be suspended while the checkpoint information is used to rehydrate the state of the first operator. Other operators in a system with the first operator (e.g., the second operator) may be checkpointed and rehydrated in the same manner as the first operator.
摘要:
The described implementations relate to recursive streaming queries. One technique processes a recursive streaming query through a query graph. The technique also detects when output produced by executing the query graph advances to a specific point.
摘要:
Parameterized queries are optimized by a transformational optimizer. The optimizer produces a dynamic plan that embeds multiple plan options that may be selected to execute a particular query. Parameter distribution improves query execution efficiency and performance by exploring a sample parameter space representative of the parameter values actually used. The dynamic plans can be simplified while maintaining an acceptable level of optimality by reducing the number of plan options. The reduction is achieved by eliminating switch unions to alternatives that are close in cost. Both approaches of parameter space exploration and dynamic plan generation are deeply integrated into the query optimizer.
摘要:
A new approach for handling stream imperfections based on speculative execution involves the retraction of incorrect events facilitated using operators to remove speculatively produced incorrect output. Additionally, parameters are disclosed that define a spectrum of consistency levels. A first parameter, maximum blocking time, exposes a tradeoff between a degree of speculation and latency. A second parameter, the maximum time data is remembered before being purged from the system, exposes a tradeoff between state size and correctness. Varying these two parameters produces a spectrum of consistency levels (e.g., strong, middle, weak) which address the specific tradeoffs built into other systems. Retraction is accomplished using operators that include Select, AlterLifetime, Join, Sum, Align, and Finalize.