摘要:
A “Real-Time-Ready Analyzer” combines a data stream management system (DSMS) with a map-reduce (M-R) framework to construct a streaming map-reduce framework that is suitable for real-time Behavioral Targeting (BT) (or other temporal queries). The Real-Time-Ready Analyzer allows users to write “dual-intent” temporal analysis queries for BT. These queries are succinct and easy to express, scale well on large-scale offline data, and can also work over real-time data. Further, the Real-Time-Ready Analyzer uses the aforementioned streaming map-reduce framework to provide dual-intent algorithms for end-to-end BT phases. Experiments using real data from an advertisement system show that the Real-Time-Ready Analyzer is very efficient and incurs orders-of-magnitude lower development effort than conventional systems.
摘要:
A “Real-Time-Ready Analyzer” combines a data stream management system (DSMS) with a map-reduce (M-R) framework to construct a streaming map-reduce framework that is suitable for real-time Behavioral Targeting (BT) (or other temporal queries). The Real-Time-Ready Analyzer allows users to write “dual-intent” temporal analysis queries for BT. These queries are succinct and easy to express, scale well on large-scale offline data, and can also work over real-time data. Further, the Real-Time-Ready Analyzer uses the aforementioned streaming map-reduce framework to provide dual-intent algorithms for end-to-end BT phases. Experiments using real data from an advertisement system show that the Real-Time-Ready Analyzer is very efficient and incurs orders-of-magnitude lower development effort than conventional systems.
摘要:
A “Query Optimizer” provides a cost estimation metric referred to as “Maximum Accumulated Overload” (MAO). MAO is approximately equivalent to maximum system latency in a data stream management system (DSMS). Consequently, MAO is directly relevant for use in optimizing latencies in real-time streaming applications running multiple continuous queries (CQs) over high data-rate event sources. In various embodiments, the Query Optimizer computes MAO given knowledge of original operator statistics, including “operator selectivity” and “cycles/event” in combination with an expected event arrival workload. Beyond use in query optimization to minimize worst-case latency, MAO is useful for addressing problems including admission control, system provisioning, user latency reporting, operator placements (in a multi-node environment), etc. In addition, MAO, as a surrogate for worst-case latency, is generally applicable beyond streaming systems, to any queue-based workflow system with control over the scheduling strategy.
摘要:
Architecture introduces a new pattern operator referred to as called an augmented transition network (ATN), which is a streaming adaptation of non-reentrant, fixed-state ATNs for dynamic patterns. Additional user-defined information is associated with automaton states and is accessible to transitions during execution. ATNs are created that directly model complex pattern continuous queries with arbitrary cycles in a transition graph. The architecture can express the desire to ignore some events during pattern detection, and can also detect the absence of data as part of a pattern. The architecture facilitates efficient support for negation, ignorable events, and state cleanup based on predicate punctuations.
摘要:
Described herein are technologies pertaining to migrating state information of operators in a first continuous query plan to a second continuous query plan in an asynchronous manner, such that the first continuous query plan need not cease executing during the migrating of the state information. State information pertaining to stateful operators, such as join operators, is migrated from the first continuous query plan to the second continuous query plan by way of a transformation plan. State matching is utilized to generate the transformation plan.
摘要:
A logical merge module is described herein for producing an output stream which is logically compatible with two or more physically divergent input streams. Representative applications of the logical merge module are also set forth herein.
摘要:
A checkpoint marker can be received at a first operator. The first operator can process the checkpoint marker by sending the checkpoint marker to a second operator and sending state checkpoint information representing a state of the first operator to a checkpoint writer. The checkpoint information can be used to rehydrate the state of one or more operators. For example, after a system failure, system shutdown, etc., checkpoint information can be received from a reader unit at a checkpoint information input queue of the first operator. A state of the first operator can be rehydrated using the checkpoint information. Processing of information in a data input queue of the first operator can be suspended while the checkpoint information is used to rehydrate the state of the first operator. Other operators in a system with the first operator (e.g., the second operator) may be checkpointed and rehydrated in the same manner as the first operator.
摘要:
The description relates to cloud-edge topologies. Some aspects relate to cloud-edge applications and resource usage in various cloud-edge topologies. Another aspect of the present cloud-edge topologies can relate to the specification of cloud-edge applications using a temporal language. A further aspect can involve an architecture that runs data stream management systems (DSMSs) engines on the cloud and cloud-edge computers to run query parts.
摘要:
A logical merge module is described herein for producing an output stream which is logically compatible with two or more physically divergent input streams. Representative applications of the logical merge module are also set forth herein.
摘要:
A checkpoint marker can be received at a first operator. The first operator can process the checkpoint marker by sending the checkpoint marker to a second operator and sending state checkpoint information representing a state of the first operator to a checkpoint writer. The checkpoint information can be used to rehydrate the state of one or more operators. For example, after a system failure, system shutdown, etc., checkpoint information can be received from a reader unit at a checkpoint information input queue of the first operator. A state of the first operator can be rehydrated using the checkpoint information. Processing of information in a data input queue of the first operator can be suspended while the checkpoint information is used to rehydrate the state of the first operator. Other operators in a system with the first operator (e.g., the second operator) may be checkpointed and rehydrated in the same manner as the first operator.