Abstract:
Techniques for obtaining information about data entity instances managed by a data processing system using at least one data store. The techniques include obtaining a query comprising a first portion comprising information for identifying instances of a first data entity stored in at least one data store; and a second portion indicating at least one attribute of the first data entity; generating, from the query, a plurality of executable queries including a first set of one or more executable queries and a second set of one or more executable queries, the generating comprising: generating, using the first portion, the first set of executable queries for identifying instances of the first data entity, and generating, using the second portion, the second set of executable queries for obtaining attribute values for instances of the first data entity; and executing the plurality of executable queries to obtain results for the query.
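As an illustration only, the split of one query into an identification set and an attribute set might be sketched as follows. The `EntityQuery` class, the `generate_queries` and `execute` helpers, the SQL form of the executable queries, and the `dataset` table are all assumptions made for this sketch; the abstract does not specify a query language or schema.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class EntityQuery:
    entity: str        # name of the first data entity (hypothetical table name)
    criteria: dict     # first portion: identifies instances
    attributes: list   # second portion: attributes whose values are wanted


def generate_queries(query):
    """Generate two sets of executable queries from one input query."""
    where = " AND ".join(f"{col} = ?" for col in query.criteria) or "1 = 1"
    # First set: identify instances of the entity.
    first_set = [(f"SELECT id FROM {query.entity} WHERE {where}",
                  tuple(query.criteria.values()))]
    # Second set: one query per requested attribute.
    second_set = [f"SELECT {attr} FROM {query.entity} WHERE id = ?"
                  for attr in query.attributes]
    return first_set, second_set


def execute(conn, query):
    """Run both sets and combine their results per instance."""
    first_set, second_set = generate_queries(query)
    ids = []
    for sql, params in first_set:
        ids.extend(row[0] for row in conn.execute(sql, params))
    results = {}
    for instance_id in ids:
        values = {}
        for attr, sql in zip(query.attributes, second_set):
            values[attr] = conn.execute(sql, (instance_id,)).fetchone()[0]
        results[instance_id] = values
    return results


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, "
                 "owner TEXT, name TEXT, row_count INTEGER)")
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?, ?)",
                     [(1, "ana", "sales", 100), (2, "bo", "hr", 20),
                      (3, "ana", "ops", 7)])
    query = EntityQuery("dataset", {"owner": "ana"}, ["name", "row_count"])
    return execute(conn, query)
```

Keeping the two sets separate means the identification queries and the attribute queries could, in principle, be directed at different data stores.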
Abstract:
Techniques for storing data entities by a data processing system are described herein. The data processing system may store a plurality of data entity instances generated using a plurality of data entities. The plurality of data entity instances may include a first data entity instance generated using a first data entity and a second data entity instance generated using a second data entity. The first data entity instance may include a first attribute that is configured to inherit its value from a second attribute of the second data entity instance. The data processing system may provide the inherited value of the second attribute of the second data entity instance as the value of the first attribute of the first data entity instance.
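A minimal sketch of attribute inheritance between instances, assuming an in-memory representation. The `EntityInstance` and `Inherited` names, and the `steward` attribute used in the demo, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inherited:
    """Marker: take the value from an attribute of another instance."""
    source: "EntityInstance"
    attribute: str


@dataclass
class EntityInstance:
    entity: str                                   # the data entity it was generated from
    attributes: dict = field(default_factory=dict)

    def get(self, name):
        value = self.attributes[name]
        if isinstance(value, Inherited):
            # Resolve at read time so later changes to the source are visible.
            return value.source.get(value.attribute)
        return value


def demo():
    table = EntityInstance("Table", {"steward": "ana"})
    column = EntityInstance("Column", {"steward": Inherited(table, "steward")})
    before = column.get("steward")
    table.attributes["steward"] = "bo"
    after = column.get("steward")
    return before, after
```

Resolving the inherited value on read, rather than copying it on write, is one possible design choice; it keeps the first instance in step with the second without any propagation step.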
Abstract:
Techniques for generating a dataflow graph include generating a first dataflow graph with a plurality of first nodes representing first computer operations in processing data, with at least one of the first computer operations being a declarative operation that specifies one or more characteristics of one or more results of processing of data, and transforming the first dataflow graph into a second dataflow graph for processing data in accordance with the first computer operations, the second dataflow graph including a plurality of second nodes representing second computer operations, with at least one of the second nodes representing one or more imperative operations that implement the logic specified by the declarative operation, where the one or more imperative operations are unrepresented by the first nodes in the first dataflow graph.
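The transformation from a declarative node to imperative nodes could look roughly like the sketch below. It assumes a linear graph stored as a list, and the `sorted_unique` operation and the `LOWERINGS` table are hypothetical; none of these details come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str
    declarative: bool = False


# Hypothetical table: which imperative operations implement a declarative one.
LOWERINGS = {"sorted_unique": ["sort", "dedup_adjacent"]}

# Executable behavior for the imperative operations only.
IMPERATIVE_OPS = {
    "read": lambda rows: list(rows),
    "sort": lambda rows: sorted(rows),
    "dedup_adjacent": lambda rows: [r for i, r in enumerate(rows)
                                    if i == 0 or r != rows[i - 1]],
    "write": lambda rows: rows,
}


def transform(first_graph):
    """Build the second graph, replacing each declarative node."""
    second_graph = []
    for node in first_graph:
        if node.declarative:
            second_graph.extend(Node(op) for op in LOWERINGS[node.op])
        else:
            second_graph.append(node)
    return second_graph


def run(graph, rows):
    """Execute a graph that contains only imperative nodes."""
    for node in graph:
        rows = IMPERATIVE_OPS[node.op](rows)
    return rows


FIRST_GRAPH = [Node("read"), Node("sorted_unique", declarative=True), Node("write")]
SECOND_GRAPH = transform(FIRST_GRAPH)
```

Here `sort` and `dedup_adjacent` appear only in the second graph, mirroring the statement that the imperative operations are unrepresented by the first nodes.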
Abstract:
A method implemented by a data processing system including: executing a dataflow graph that includes a plurality of components and links, with a given component of the plurality including an input port, an audit port and an output port; processing, by the dataflow graph with the components and the links, one or more data records representing a transaction, wherein at least one of the components saves a state specifying one or more input records that are processed by the at least one of the components; when an error occurs during processing of one or more input records by the given component, restoring a state of the at least one of the components to the saved state; and based on the restored state, recovering at least some audit data for the given component of the dataflow graph.
Abstract:
At least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one computer hardware processor, cause the at least one computer hardware processor to perform: obtaining an automatically generated initial dataflow graph, the initial dataflow graph comprising a first plurality of nodes representing a first plurality of data processing operations and a first plurality of links representing flows of data among nodes in the first plurality of nodes; and generating an updated dataflow graph by iteratively applying dataflow graph optimization rules to update the initial dataflow graph, the updated dataflow graph comprising a second plurality of nodes representing a second plurality of data processing operations and a second plurality of links representing flows of data among nodes in the second plurality of nodes.
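Iterative rule application can be sketched as a loop that runs until no rule applies. The sketch assumes a linear graph in which links are implied by list order, and the two rules shown (`drop_identity`, `remove_duplicate_sort`) are hypothetical examples rather than rules named in the abstract.

```python
def drop_identity(nodes):
    """Remove a node that passes data through unchanged."""
    for i, node in enumerate(nodes):
        if node[0] == "identity":
            return nodes[:i] + nodes[i + 1:]
    return None  # rule does not apply


def remove_duplicate_sort(nodes):
    """Remove a sort that immediately follows an identical sort."""
    for i in range(len(nodes) - 1):
        if nodes[i][0] == "sort" and nodes[i] == nodes[i + 1]:
            return nodes[:i] + nodes[i + 1:]
    return None  # rule does not apply


RULES = [drop_identity, remove_duplicate_sort]


def optimize(initial_graph, max_passes=100):
    """Apply rules repeatedly until the graph stops changing."""
    nodes = list(initial_graph)
    for _ in range(max_passes):
        for rule in RULES:
            updated = rule(nodes)
            if updated is not None:
                nodes = updated
                break          # restart from the first rule
        else:
            return nodes       # no rule applied: fixed point reached
    return nodes


INITIAL = [("read", "in"), ("sort", "k"), ("identity",),
           ("sort", "k"), ("write", "out")]
UPDATED = optimize(INITIAL)
```

In this example the duplicate sort only becomes adjacent after the identity node is removed, which is why a single pass over the rules would not be enough.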
Abstract:
A method for processing state update requests in a distributed data processing system includes processing a first set of state update requests associated with a first time interval, including maintaining a count of issued state update requests for the first set of state update requests, maintaining a count of state updates performed for the first set of state update requests, and updating a state consistency indicator to indicate that state updates associated with all state update requests of the first set of state update requests have been performed in response to determining that the count of state updates performed for the first set of state update requests equals the count of issued state update requests for the first set of state update requests.
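The two counts and the consistency indicator might be kept as in the following single-process sketch. The `IntervalTracker` class and its `closed` flag are assumptions: the flag is added here so that the counts are compared only once no further requests can be issued for the interval, a detail the abstract does not state.

```python
class IntervalTracker:
    """Tracks state update requests for one time interval."""

    def __init__(self, interval):
        self.interval = interval
        self.issued = 0          # count of issued state update requests
        self.performed = 0       # count of state updates performed
        self.closed = False      # assumption: interval no longer accepts requests
        self.consistent = False  # state consistency indicator

    def issue(self):
        if self.closed:
            raise RuntimeError("interval is closed to new requests")
        self.issued += 1

    def close(self):
        self.closed = True
        self._refresh()

    def record_performed(self):
        self.performed += 1
        self._refresh()

    def _refresh(self):
        # Set the indicator once every issued request has been performed.
        if self.closed and self.performed == self.issued:
            self.consistent = True
```

In a distributed setting the counts would be aggregated across nodes; that aggregation is outside the scope of this sketch.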
Abstract:
A method for processing state update requests in a distributed data processing system with a number of processing nodes includes maintaining a number of counters including a working counter indicating a current time interval, a replication counter indicating a time interval for which all requests associated with that time interval are replicated at multiple processing nodes of the number of processing nodes, and a persistence counter indicating a time interval for which all requests associated with that time interval are stored in persistent storage. The counters are used to manage processing of the state update requests.
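One way to picture the three counters is the sketch below. It assumes integer interval numbers, counters that only move forward, and that an interval can be marked replicated or persisted only after the working counter has moved past it; the `IntervalCounters` class and its `durability` method are hypothetical.

```python
class IntervalCounters:
    def __init__(self):
        self.working = 1       # current time interval
        self.replication = 0   # latest interval fully replicated
        self.persistence = 0   # latest interval fully persisted

    def advance_working(self):
        self.working += 1
        return self.working

    def mark_replicated(self, interval):
        if interval >= self.working:
            raise ValueError("interval is still accepting requests")
        self.replication = max(self.replication, interval)

    def mark_persisted(self, interval):
        if interval >= self.working:
            raise ValueError("interval is still accepting requests")
        self.persistence = max(self.persistence, interval)

    def durability(self, interval):
        """Classify a request by the interval it was tagged with."""
        if interval <= self.persistence:
            return "persisted"
        if interval <= self.replication:
            return "replicated"
        return "in progress"
```

A request tagged with an interval number can then be classified by comparing that number against the counters, for example to decide whether a response can be released.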
Abstract:
A method is described for processing keyed data items that are each associated with a value of a key, the keyed data items being from a plurality of distinct data streams, the processing including collecting the keyed data items, determining, based on contents of at least one of the keyed data items, satisfaction of one or more specified conditions for execution of one or more actions and causing execution of at least one of the one or more actions responsive to the determining.
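The collect, evaluate, and act steps can be sketched as follows. The `KeyedCollector` class, the stream names, and the example condition and action are assumptions made for illustration; the sketch also fires each rule at most once per key, which the abstract does not require.

```python
from collections import defaultdict


class KeyedCollector:
    def __init__(self, rules):
        self.rules = rules                      # (name, condition, action) triples
        self.items_by_key = defaultdict(list)   # collected items per key value
        self.fired = set()                      # (key, rule name) pairs already run
        self.log = []                           # results of executed actions

    def collect(self, stream, key, data):
        items = self.items_by_key[key]
        items.append({"stream": stream, "data": data})
        for name, condition, action in self.rules:
            # Conditions are evaluated on the contents of the collected items.
            if (key, name) not in self.fired and condition(items):
                self.fired.add((key, name))
                self.log.append(action(key, items))


def failed_payment_with_cart(items):
    streams = {item["stream"] for item in items}
    failed = any(item["data"].get("status") == "failed" for item in items)
    return failed and "cart" in streams


def offer_help(key, items):
    return f"offer help to {key}"


def demo():
    collector = KeyedCollector([("help", failed_payment_with_cart, offer_help)])
    collector.collect("cart", "cust-1", {"sku": "A1"})
    collector.collect("payments", "cust-2", {"status": "failed"})
    collector.collect("payments", "cust-1", {"status": "failed"})
    collector.collect("payments", "cust-1", {"status": "failed"})
    return collector.log
```

Because items from different streams are grouped by key, the condition can combine evidence that no single stream carries on its own.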
Abstract:
A method performed by a data processing system for processing data, the method including: intermittently receiving data from one or more data streams, the received data including data records; detecting two or more particular data records in the received data records, where the detected two or more particular data records each include a particular identifier; for that particular identifier, creating a collection of data records; for at least one particular data record included in the collection of data records, searching data records for a historical aggregation of data and computing combined data; modifying a data record by inserting the combined data into a field of the data record and by inserting data from at least one of the data records in the collection into another field of the data record; and, based on applying one or more rules, writing to memory one or more instructions for initiation of one or more actions.
Abstract:
A data processing system configured to store a plurality of data entities in volatile memories of multiple different computing devices. The data processing system comprises a first computing device having a first volatile memory configured to store a first data entity; and a second computing device having a second volatile memory configured to store a copy of the first data entity. The first computing device is configured to perform: receiving an indication to update the first data entity; after receiving the indication, updating the first data entity in the first volatile memory, and providing to the second computing device an indication to update the copy of the first data entity; and providing an indication that the first data entity has been updated, after receiving information from the second computing device indicating that the copy of the first data entity has been updated in the second volatile memory.
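The ordering of the update steps can be sketched with two in-process objects standing in for the two computing devices. The `Primary` and `Replica` classes are hypothetical, dictionaries stand in for volatile memory, and a direct method call stands in for whatever network messaging a real system would use.

```python
class Replica:
    """Second computing device: holds a copy of the entity in memory."""

    def __init__(self):
        self.memory = {}

    def apply_update(self, entity_id, value):
        self.memory[entity_id] = value
        return True   # information that the copy has been updated


class Primary:
    """First computing device: owns the entity and coordinates updates."""

    def __init__(self, replica):
        self.memory = {}
        self.replica = replica

    def update(self, entity_id, value):
        # 1. Update the entity in the first volatile memory.
        self.memory[entity_id] = value
        # 2. Ask the second device to update its copy.
        acknowledged = self.replica.apply_update(entity_id, value)
        # 3. Report success only after the copy is confirmed.
        if not acknowledged:
            raise RuntimeError("replica did not confirm the update")
        return "updated"


def demo():
    replica = Replica()
    primary = Primary(replica)
    status = primary.update("entity-1", {"name": "sales"})
    return status, primary.memory == replica.memory
```

Withholding the success indication until the replica confirms means that a caller who sees "updated" can rely on the entity existing in both memories.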