Abstract:
Methods, systems, and products characterize consistency of data in a stream warehouse. A warehouse table is derived from a continuously received stream of data. The warehouse table is stored in memory as a plurality of temporal partitions, with each temporal partition storing data within a contiguous range of time. A level of consistency is assigned to each temporal partition in the warehouse table.
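The partition layout described above can be illustrated with a minimal sketch. The two consistency levels, the `watermark` parameter, and all class and function names are assumptions made for illustration; the abstract does not enumerate the levels or say how they are assigned.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consistency(Enum):
    # Hypothetical levels; the abstract does not list the actual ones.
    OPEN = 1    # more data for this time range may still arrive
    CLOSED = 2  # no further data is expected for this time range


@dataclass
class TemporalPartition:
    start: int  # inclusive start of the contiguous time range
    end: int    # exclusive end of the contiguous time range
    rows: list = field(default_factory=list)
    level: Consistency = Consistency.OPEN


def assign_levels(partitions, watermark):
    """Assign a level to each partition from an assumed stream watermark."""
    for p in partitions:
        # Assumption: a partition is closed once the stream has moved past it.
        p.level = Consistency.CLOSED if p.end <= watermark else Consistency.OPEN
    return partitions


parts = [TemporalPartition(0, 60), TemporalPartition(60, 120)]
assign_levels(parts, watermark=90)
```

With a watermark of 90, the first partition is marked closed and the second stays open.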
Abstract:
A system may include a processor, a user input, and memory comprising a graph and executable instructions. The executable instructions may cause the processor to effectuate operations. The operations include receiving, via the user input, a query comprising a class generalization and pathway variables. The operations include identifying a query class based on at least the class generalization and determining an anchor set based on at least one of the pathway variables. The operations also include translating the pathway variables into a pathway algebraic expression based on the anchor set and the query class and executing the pathway algebraic expression on the graph to return a pathway set.
Abstract:
A device, method and computer-readable medium for generating unique identification for records in a data streaming processing system are disclosed. A method may collect an identification of a source of a data record, a timestamp of the data record, and a count of the number of records the data source has added to the stream with that timestamp. The method may then generate a unique identification from these three values, apply the unique identification to the data record, and transmit the data record with the unique identification to a downstream operator within the data stream processing system.
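As a rough sketch of the identification scheme, the class below builds an identifier from the source, the timestamp, and a per-timestamp count. The class name, the tuple encoding, and the choice to start the count at zero are assumptions; the abstract does not specify them.

```python
from collections import defaultdict


class RecordTagger:
    """Tags records from one source with (source, timestamp, count) identifiers."""

    def __init__(self, source_id):
        self.source_id = source_id
        # Number of records this source has already added per timestamp.
        self._counts = defaultdict(int)

    def tag(self, timestamp, payload):
        # Assumption: the count is taken before the record is added (0-based).
        count = self._counts[timestamp]
        self._counts[timestamp] += 1
        uid = (self.source_id, timestamp, count)
        # The tagged record is what would be sent to the downstream operator.
        return {"uid": uid, "payload": payload}


tagger = RecordTagger("src-A")
first = tagger.tag(100, "x")
second = tagger.tag(100, "y")
third = tagger.tag(101, "z")
```

Two records sharing a timestamp differ in the count component, so the identifiers stay unique within a source.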
Abstract:
A device, method and computer-readable medium for recovering a replica in an operator in a data streaming processing system are disclosed. A method may obtain a checkpoint in an input data stream, determine a maximum timestamp at the checkpoint in the input data stream, and calculate a completeness point for an output data stream that is greater than the maximum timestamp. To generate a new replica that replaces a failed replica, the method may then process data records from the checkpoint onwards whose timestamps are greater than or equal to the calculated completeness point.
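The recovery steps can be sketched as follows. The function name, the list-based stream, and the choice of `max_ts + 1` as the completeness point are assumptions; the abstract only requires the completeness point to be greater than the maximum timestamp.

```python
def recover_replica(records, checkpoint_index, operator):
    """Rebuild output from a checkpoint.

    records: list of (timestamp, value) pairs in input order.
    checkpoint_index: position of the checkpoint in the input stream.
    operator: function applied to each processed value.
    """
    # Maximum timestamp seen up to the checkpoint.
    # Assumption: non-negative integer timestamps, so -1 means "nothing seen".
    max_ts = max((ts for ts, _ in records[:checkpoint_index]), default=-1)
    # Assumption: the smallest integer greater than the maximum timestamp.
    completeness_point = max_ts + 1
    output = []
    for ts, value in records[checkpoint_index:]:
        # Only records at or beyond the completeness point are processed.
        if ts >= completeness_point:
            output.append((ts, operator(value)))
    return completeness_point, output


stream = [(1, "a"), (3, "b"), (2, "c"), (4, "d"), (5, "e")]
point, rebuilt = recover_replica(stream, checkpoint_index=2, operator=str.upper)
```

Here the late record with timestamp 2 arrives after the checkpoint but falls below the completeness point, so the new replica skips it.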
Abstract:
Concepts and technologies are disclosed herein for managing a distributed database. A data management application can obtain a query. The data management application can analyze the query to determine a number of data structures relevant to the query. The data management application also can analyze data stores storing the data structures and move or assign data structures to other data stores within a distributed database. The movement of the data structures within the distributed database can be based upon greedy algorithms for moving data and/or executing queries.
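One possible greedy placement rule is sketched below: move the data structures a query needs onto the store that already holds the most relevant data. This rule, the function name, and the size-based cost are assumptions for illustration; the abstract does not say which greedy algorithms are used.

```python
def greedy_colocate(placement, sizes, query_tables):
    """Pick a target store for a query and list the moves needed.

    placement: dict mapping data structure name -> current data store.
    sizes: dict mapping data structure name -> size (arbitrary units).
    query_tables: data structures relevant to the query.
    """
    held = {}
    for t in query_tables:
        store = placement[t]
        held[store] = held.get(store, 0) + sizes[t]
    # Greedy choice: the store already holding the most relevant data.
    # Iterating in sorted order makes ties resolve to the smallest store name.
    target = max(sorted(held), key=held.get)
    moves = [(t, placement[t], target) for t in query_tables if placement[t] != target]
    return target, moves


placement = {"a": "s1", "b": "s2", "c": "s2"}
sizes = {"a": 10, "b": 4, "c": 5}
target, moves = greedy_colocate(placement, sizes, ["a", "b", "c"])
```

In this example store `s1` already holds 10 units against 9 on `s2`, so the two smaller structures are moved to `s1`.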
Abstract:
Concepts and technologies are disclosed herein for generating and using temporal metadata partitions. Metadata can be stored in temporal metadata partitions based upon a time range included in the metadata. Furthermore, metadata can be stored in multiple temporal metadata partitions to which the metadata is relevant. As such, metadata can be stored in a manner that allows event data to be understood in the context of temporally accurate and/or relevant metadata. Functionality for executing queries of event data and providing results in view of metadata, as well as the merging of multiple temporal metadata partitions, also is disclosed.
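A minimal sketch of storing one metadata record in every temporal partition it overlaps, and of looking up the metadata valid at an event time, might look like this. The fixed partition width, the dictionary layout, and the function names are all assumptions.

```python
PARTITION_SECONDS = 3600  # hypothetical partition width


def partition_ids(start, end, width=PARTITION_SECONDS):
    """IDs of every partition overlapped by the half-open range [start, end)."""
    return range(start // width, (end - 1) // width + 1)


def store_metadata(partitions, record):
    # The record is copied into each partition its time range is relevant to.
    for pid in partition_ids(record["start"], record["end"]):
        partitions.setdefault(pid, []).append(record)


def metadata_for_event(partitions, event_ts, width=PARTITION_SECONDS):
    """Metadata records that were valid at the event's timestamp."""
    candidates = partitions.get(event_ts // width, [])
    return [m for m in candidates if m["start"] <= event_ts < m["end"]]


partitions = {}
route = {"start": 3000, "end": 8000, "name": "route-v1"}
store_metadata(partitions, route)
```

A record valid from 3000 to 8000 lands in partitions 0, 1, and 2, so an event at time 4000 finds it by reading partition 1 only.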
Abstract:
A method and system for providing query aware partitioning are disclosed. For example, the method receives a query plan comprising a plurality of queries, and classifies each one of the plurality of queries. The method computes an optimal partition set for each one of the plurality of queries, and reconciles the optimal partition set of each one of the plurality of queries with at least one subset of queries of the plurality of queries. The method selects at least one reconciled optimal partition set to be used by each query of the plurality of queries, and stores the selected at least one reconciled optimal partition set in a computer readable medium.
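The reconciliation step can be illustrated with a small sketch. It assumes that each query's partition set is a set of attribute names and that reconciling means intersecting those sets; both are assumptions, as are the function and attribute names, since the abstract does not define the operation.

```python
def reconcile(partition_sets):
    """Intersect per-query partition sets.

    An empty result means the queries share no common partitioning attribute.
    """
    sets = [frozenset(s) for s in partition_sets]
    return frozenset.intersection(*sets) if sets else frozenset()


# Hypothetical per-query partition sets, e.g. taken from group-by attributes.
per_query = [{"srcIP", "destIP"}, {"srcIP"}, {"srcIP", "port"}]
shared = reconcile(per_query)
```

Under these assumptions, partitioning the stream on `srcIP` would suit all three queries at once.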
Abstract:
A method includes parsing a regular pathway expression into fragments including an anchored fragment and at least one other fragment. A number of the fragments is based on at least a length limitation of the regular pathway expression. The method includes generating an operator directed acyclic graph (DAG) including non-operator nodes, operator nodes, and a root based on at least the anchored fragment. The method includes removing, from the operator DAG, at least one of the non-operator nodes and connecting a first operator node to a second operator node of the operator nodes. The first operator node includes an edge into the at least one removed non-operator node, and the second operator node includes an edge from the at least one removed non-operator node. The method includes executing the operator DAG on a graph database to return a pathway set comprising at least one pathway that satisfies the regular pathway expression.
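The node-removal step can be sketched on a DAG stored as an edge list. The edge-list representation, the function name, and the node labels are assumptions chosen for illustration.

```python
def splice_out(edges, node):
    """Remove `node` and connect each of its predecessors to each successor.

    edges: list of (from_node, to_node) pairs describing the DAG.
    """
    preds = [u for u, v in edges if v == node]  # nodes with an edge into `node`
    succs = [v for u, v in edges if u == node]  # nodes with an edge from `node`
    kept = [(u, v) for u, v in edges if node not in (u, v)]
    for u in preds:
        for v in succs:
            if (u, v) not in kept:
                kept.append((u, v))
    return kept


# Hypothetical DAG: two operator nodes joined through a non-operator node.
dag = [("scan", "tmp"), ("tmp", "extend"), ("extend", "root")]
simplified = splice_out(dag, "tmp")
```

After the call, the first operator node (`scan`) feeds the second (`extend`) directly and the intermediate node no longer appears in the DAG.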