Abstract:
Techniques are provided for dependency-aware transaction batching for data replication. A plurality of change records corresponding to a plurality of transactions is read. Inter-transaction dependency data is generated, the inter-transaction dependency data including at least one inter-transaction dependency relationship between a plurality of pending transactions. Each inter-transaction dependency relationship indicates that a first transaction is dependent on a second transaction. A batch transaction is generated based on the inter-transaction dependency data. The batch transaction includes at least one pending transaction of the plurality of pending transactions. The batch transaction is assigned to an apply process of a plurality of apply processes configured to apply batch transactions in parallel.
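The batching idea can be sketched under a few stated assumptions. Transactions arrive in commit order. A transaction depends on the most recent earlier transaction that touched any of the same row keys. A batch may contain a transaction only if each of its dependencies is already applied or appears earlier in the same batch. The names `Txn`, `build_dependencies`, `next_batch`, and `schedule` are hypothetical and illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    txn_id: int
    keys: frozenset  # hypothetical: row keys touched by this txn's change records


def build_dependencies(txns):
    """Map each txn_id to the earlier txn_ids it depends on.

    Assumption: txns arrive in commit order, and a txn depends on the most
    recent earlier txn that touched any of the same row keys.
    """
    last_writer, deps = {}, {}
    for t in txns:
        deps[t.txn_id] = {last_writer[k] for k in t.keys if k in last_writer}
        for k in t.keys:
            last_writer[k] = t.txn_id
    return deps


def next_batch(pending, deps, done, max_size):
    """Greedily form one batch from pending txns, in commit order.

    A txn may join if every dependency is already applied (`done`) or sits
    earlier in this same batch; one apply process runs the batch serially,
    so intra-batch order preserves those dependencies.
    """
    batch, in_batch = [], set()
    for t in pending:
        if deps[t.txn_id] <= (done | in_batch):
            batch.append(t)
            in_batch.add(t.txn_id)
            if len(batch) == max_size:
                break
    return batch


def schedule(txns, num_procs, max_size):
    """Return rounds of batches; the batches in one round can run in parallel."""
    deps = build_dependencies(txns)
    pending, done, rounds = list(txns), set(), []
    while pending:
        round_batches = []
        for _ in range(num_procs):  # at most one batch per apply process
            batch = next_batch(pending, deps, done, max_size)
            if not batch:
                break
            ids = {t.txn_id for t in batch}
            pending = [t for t in pending if t.txn_id not in ids]
            round_batches.append([t.txn_id for t in batch])
        # simulate every apply process finishing its batch before the next round
        for ids in round_batches:
            done.update(ids)
        rounds.append(round_batches)
    return rounds
```

Consider transactions 1 and 3, which both touch row `a`. With `max_size=2` they land in different rounds. With `max_size=4` all four transactions fit in one batch, and the batch's serial order keeps 3 after 1. Because a batch never depends on another batch formed in the same round, the batches in one round can go to separate apply processes.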
Abstract:
Techniques are described for generating period profiles. According to an embodiment, a set of time-series data is received, where the set of time-series data includes data spanning a plurality of time windows having a seasonal period. Based at least in part on the set of time-series data, a first set of sub-periods of the seasonal period is associated with a particular class of seasonal pattern. A profile for the seasonal period, identifying which sub-periods of the seasonal period are associated with the particular class of seasonal pattern, is generated and stored in volatile or non-volatile storage. Based on the profile, a visualization is generated for at least one sub-period of the first set of sub-periods of the seasonal period, indicating that the at least one sub-period is part of the particular class of seasonal pattern.
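A minimal sketch of building a profile and a text visualization follows. It assumes a single "seasonal high" class. A sub-period (for example, one hour of a weekly period of 168 hours) belongs to that class when its average across windows exceeds the mean of all sub-period averages by more than `k` standard deviations. This rule and the names `period_profile` and `render_profile` are hypothetical.

```python
from statistics import mean, pstdev


def period_profile(series, period, k=1.0):
    """Return the set of sub-period indices assigned to a 'seasonal high' class.

    Assumption: sub-period i belongs to the class when its average across all
    complete windows exceeds the mean of those averages by more than k
    population standard deviations.
    """
    n_windows = len(series) // period
    if n_windows < 2:
        raise ValueError("need at least two complete seasonal windows")
    averages = [
        mean(series[w * period + i] for w in range(n_windows)) for i in range(period)
    ]
    threshold = mean(averages) + k * pstdev(averages)
    return {i for i, avg in enumerate(averages) if avg > threshold}


def render_profile(profile, period):
    """Text visualization: '#' marks a sub-period in the class, '.' one that is not."""
    return "".join("#" if i in profile else "." for i in range(period))
```

Take a period of 4 repeated three times as `[1, 1, 10, 1]`. The profile is `{2}` and renders as `..#.`. The returned set is the profile that would be stored. A real system could serialize it alongside the seasonal period.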
Abstract:
Techniques are described for performing cluster analysis on a set of data points using tri-point arbitration. In one embodiment, a first cluster that includes a set of data points is generated within volatile and/or non-volatile storage of a computing device. A set of tri-point arbitration similarity values is computed, where each similarity value in the set corresponds to a respective data point pair and is computed based, at least in part, on a distance between the respective data point pair and a set of one or more arbiter data points. The first cluster is partitioned within volatile and/or non-volatile storage of the computing device into a set of two or more clusters. A determination is made, based at least in part on the set of similarity values, which form a tri-point arbitration similarity matrix, whether to continue partitioning the set of data points.
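The similarity computation can be sketched with one commonly stated form of the tri-point arbitration similarity, averaged over arbiters. For an arbiter a, the per-arbiter value is (min(d(xi, a), d(xj, a)) - d(xi, xj)) / max(d(xi, xj), min(d(xi, a), d(xj, a))). Several choices here are assumptions: every other data point serves as an arbiter, and the stopping rule in `should_continue` (keep splitting while some pair in the cluster has negative similarity) is hypothetical. The partitioning step itself is not shown.

```python
import math
from itertools import combinations


def tpa_similarity(xi, xj, arbiters):
    """Average tri-point arbitration similarity of the pair (xi, xj).

    Per arbiter a: (min(d(xi,a), d(xj,a)) - d(xi,xj)) / max(d(xi,xj), min(...)).
    Positive means the pair is closer to each other than to the arbiter.
    Requires at least one arbiter.
    """
    d_ij = math.dist(xi, xj)
    total = 0.0
    for a in arbiters:
        m = min(math.dist(xi, a), math.dist(xj, a))
        denom = max(d_ij, m)
        total += 0.0 if denom == 0 else (m - d_ij) / denom
    return total / len(arbiters)


def similarity_matrix(points):
    """Pairwise similarities, using every other point in `points` as an arbiter."""
    n = len(points)
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        arbiters = [p for k, p in enumerate(points) if k not in (i, j)]
        S[i][j] = S[j][i] = tpa_similarity(points[i], points[j], arbiters)
    return S


def should_continue(S, cluster):
    """Hypothetical rule: keep partitioning while any pair in the cluster is dissimilar."""
    return any(S[i][j] < 0 for i, j in combinations(cluster, 2))
```

With two tight groups such as `(0, 0), (0, 1)` and `(10, 0), (10, 1)`, cross-group pairs score negative and within-group pairs score positive. The full set is therefore split, and each group is left intact.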
Abstract:
Techniques are described for generating seasonal forecasts. According to an embodiment, a set of time-series data is associated with one or more classes, which may include a first class that represents a dense pattern that repeats over multiple instances of a season in the set of time-series data and a second class that represents another pattern that repeats over multiple instances of the season in the set of time-series data. A particular class of data is associated with at least two sub-classes of data, where a first sub-class represents high data points from the first class and a second sub-class represents another set of data points from the first class. A trend rate is determined for a particular sub-class. Based at least in part on the trend rate, a forecast is generated.
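A minimal sketch of sub-class trending follows. It assumes that a class's points are given as `(season_instance, value)` pairs and that the "high" sub-class is the set of points above the class median. It also assumes the trend rate is the least-squares slope of value against season instance. The split rule and the names `split_subclasses`, `trend_rate`, and `forecast` are hypothetical.

```python
from statistics import mean, median


def split_subclasses(points):
    """Split (season_instance, value) points of one class into 'high' and 'other'.

    Assumption: a point is 'high' when its value is above the class median.
    """
    cut = median([v for _, v in points])
    high = [(s, v) for s, v in points if v > cut]
    other = [(s, v) for s, v in points if v <= cut]
    return high, other


def trend_rate(points):
    """Least-squares slope of value versus season instance (change per season)."""
    xs = [s for s, _ in points]
    ys = [v for _, v in points]
    mx, my = mean(xs), mean(ys)
    den = sum((x - mx) ** 2 for x in xs)
    if den == 0:
        return 0.0  # all points in one season instance: no trend measurable
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den


def forecast(points, season_instance):
    """Project a sub-class to a future season instance along its trend line."""
    slope = trend_rate(points)
    intercept = mean(v for _, v in points) - slope * mean(s for s, _ in points)
    return intercept + slope * season_instance
```

Suppose the peaks of a class grow by 2 per season while its other points stay flat at 10. The high sub-class then trends at 2.0 and forecasts 26.0 for the next season, and the other sub-class stays at 10.0. A single trend fitted to the whole class would blur these two rates together.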
Abstract:
The disclosed embodiments provide a system that detects anomalous events. During operation, the system obtains machine-generated time-series performance data collected during execution of a software program in a computer system. Next, the system removes a subset of the machine-generated time-series performance data within an interval around one or more known anomalous events of the software program to generate filtered time-series performance data. The system uses the filtered time-series performance data to build a statistical model of normal behavior in the software program and obtains a number of unique patterns learned by the statistical model. When the number of unique patterns satisfies a complexity threshold, the system applies the statistical model to subsequent machine-generated time-series performance data from the software program to identify an anomaly in an activity of the software program and stores an indication of the anomaly for the software program upon identifying the anomaly.
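The pipeline can be sketched with a deliberately simple stand-in for the statistical model. Here, "normal behavior" is the set of length-`n` sequences of discretized values seen in filtered training data. Other stand-ins are also assumptions: the complexity threshold is read as a maximum pattern count, and an anomaly is any unseen sequence. Every name below is hypothetical.

```python
def filter_segments(samples, known_times, radius):
    """Drop (time, value) samples within `radius` of any known anomalous event.

    Returns the surviving values as contiguous segments, so no pattern is
    learned across a removed interval.
    """
    segments, current = [], []
    for t, v in samples:
        if any(abs(t - k) <= radius for k in known_times):
            if current:
                segments.append(current)
                current = []
        else:
            current.append(v)
    if current:
        segments.append(current)
    return segments


def symbolize(values, width):
    """Discretize values into integer bins of the given width."""
    return [int(v // width) for v in values]


def learn_patterns(segments, width, n):
    """Model of normal behavior: the set of length-n symbol sequences seen."""
    patterns = set()
    for seg in segments:
        sym = symbolize(seg, width)
        patterns.update(tuple(sym[i:i + n]) for i in range(len(sym) - n + 1))
    return patterns


def detect(samples, known_times, radius, new_values, width=4, n=2, max_patterns=10):
    """Return start indices of unseen patterns in new_values, or None when the
    model has more unique patterns than the (assumed) complexity threshold."""
    model = learn_patterns(filter_segments(samples, known_times, radius), width, n)
    if len(model) > max_patterns:
        return None
    sym = symbolize(new_values, width)
    return [i for i in range(len(sym) - n + 1) if tuple(sym[i:i + n]) not in model]
```

Removing the interval around the known spike keeps that spike out of the learned patterns. Without that step, a recurrence of the spike in later data would be treated as normal.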
Abstract:
Techniques are provided for eager replication of uncommitted transactions. A first plurality of change records is received, corresponding to database changes applied to a source database in a first transaction. First transaction dependency data is computed based on the first transaction. At least a portion of the first plurality of change records is applied to a target database before processing a first commit record indicating that the first transaction has been committed on the source database. Target dependency data is updated after processing the first commit record to reflect completion of the first transaction, the target dependency data including dependency data for a plurality of transactions applied or scheduled to be applied on the target database.
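A toy sketch of the target side follows, assuming that a dict stands in for the target database. Change records are applied as soon as they arrive. Dependency data records which uncommitted transactions touched the same key, and processing a commit record marks the transaction complete. The class `EagerApplier` is hypothetical. It omits the open target-side transaction a real applier would hold, and it does not handle rollback.

```python
class EagerApplier:
    """Toy target-side applier; a dict stands in for the target database."""

    def __init__(self):
        self.target = {}        # key -> value on the target
        self.in_flight = {}     # txn_id -> txn_ids it must wait for before committing
        self.completed = set()  # txns whose commit record has been processed
        self.last_writer = {}   # key -> txn_id of the latest writer

    def on_change(self, txn_id, key, value):
        """Apply a change record immediately, before its commit record arrives."""
        deps = self.in_flight.setdefault(txn_id, set())
        prev = self.last_writer.get(key)
        if prev is not None and prev != txn_id and prev not in self.completed:
            deps.add(prev)  # this txn touched a key an uncommitted txn also touched
        self.last_writer[key] = txn_id
        self.target[key] = value

    def on_commit(self, txn_id):
        """Process a commit record: check dependencies, then mark the txn complete."""
        deps = self.in_flight.get(txn_id, set())
        if not deps <= self.completed:
            raise RuntimeError(f"txn {txn_id} committed before {deps - self.completed}")
        self.in_flight.pop(txn_id, None)
        self.completed.add(txn_id)
```

If transaction 2 writes a key that uncommitted transaction 1 already wrote, transaction 2 cannot complete until transaction 1's commit record has been processed.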
Abstract:
Techniques are provided for automatic parallelism tuning. At least one batch of change records is assigned to one or more apply processes in a set of active apply processes. A first throughput value is periodically determined based on a number of processed change records in a first time interval. An increment adjustment is periodically performed, including adding an additional apply process, determining a second throughput value, and removing the additional apply process from the set of active apply processes if the second throughput value is not greater than a previous first throughput value by at least an increment threshold. A decrement adjustment is periodically performed, including removing an apply process, determining a third throughput value, and replacing the removed apply process in the set of active apply processes if the third throughput value is not greater than the previous first throughput value by at least a decrement threshold.
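One increment-and-decrement cycle can be sketched as follows, assuming a hypothetical callback `measure(k)` that returns throughput with `k` active apply processes. The thresholds are taken as absolute differences, which is an assumption. A negative decrement threshold lets the tuner drop a process that costs at most that much throughput. The abstract runs the two adjustments periodically, possibly on separate schedules. Combining them into one cycle is a simplification.

```python
def tune_once(n, measure, inc_threshold, dec_threshold, min_procs=1, max_procs=64):
    """One increment-then-decrement cycle; returns the new number of apply processes.

    `measure(k)` is a hypothetical callback returning throughput (change records
    per interval) with k active apply processes.
    """
    baseline = measure(n)  # first throughput value
    if n < max_procs:
        second = measure(n + 1)  # with one additional apply process
        if second - baseline >= inc_threshold:
            return n + 1  # keep the added process
    if n > min_procs:
        third = measure(n - 1)  # with one apply process removed
        if third - baseline >= dec_threshold:
            return n - 1  # keep the removal
    return n  # the added process is removed and the removed one is replaced


def tune(n, measure, cycles, **kwargs):
    """Run several tuning cycles starting from n apply processes."""
    for _ in range(cycles):
        n = tune_once(n, measure, **kwargs)
    return n
```

Suppose throughput saturates at four processes. Starting from one process the tuner climbs to four, and starting from eight it sheds idle processes down to four.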
Abstract:
Techniques are provided for client and server integration for scalable replication. A replication client transmits change records to a database server over a stream. The database server determines at least one batch comprising change records for at least one transaction. The database server generates dependency data for at least one change record in a batch based on at least one constraint identifier for at least one column. The database server determines an ordered grouping of the change records based on the operation type and the dependency data of each change record, wherein change records that share an operation type are grouped together unless the dependency data requires a division. The database server generates a reordered transaction comprising a plurality of reordered operations based on the ordered grouping of the change records of the batch.
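The ordered grouping can be sketched as follows, assuming that each change record lists the `(constraint_id, value)` pairs it touches; for an update, these are both its old and new values. A record depends on the latest earlier record that shares any pair and is placed one level above it. Within a level, records can be grouped by operation type. `Change`, `reorder`, and the order in `OP_ORDER` are hypothetical. Foreign-key dependencies are not modeled.

```python
from dataclasses import dataclass
from itertools import groupby

# Hypothetical order of operation types among mutually independent records.
OP_ORDER = {"DELETE": 0, "UPDATE": 1, "INSERT": 2}


@dataclass(frozen=True)
class Change:
    seq: int          # position in the original batch
    op: str           # "INSERT", "UPDATE" or "DELETE"
    keys: frozenset   # (constraint_id, value) pairs this record touches


def reorder(batch):
    """Group a batch by operation type, splitting groups where dependencies demand.

    A record's level is one more than the highest level among the latest
    earlier records sharing any of its (constraint_id, value) pairs. Records on
    the same level share no constraint values, so they may be reordered freely.
    """
    last, level = {}, {}
    for c in batch:
        level[c.seq] = max((level[last[k]] + 1 for k in c.keys if k in last), default=0)
        for k in c.keys:
            last[k] = c.seq
    ordered = sorted(batch, key=lambda c: (level[c.seq], OP_ORDER[c.op], c.seq))
    return [
        (op, [c.seq for c in grp])
        for (_, op), grp in groupby(ordered, key=lambda c: (level[c.seq], c.op))
    ]
```

In a batch where record 5 re-inserts a key that record 4 deleted after record 1 inserted it, the independent inserts 1 and 3 share one group. Record 5, however, stays in a separate group after the delete.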
Abstract:
Transient duplicate key violations of unique key constraints are handled during row updates. Row changes are buffered until a point is reached at which those changes are safe to execute. Row changes are effectively reordered to avoid constraint violations during execution of updates. In response to receiving a unique key constraint violation from a server after an attempted update, a client locally stores a record containing information regarding the failed update. Later, when another update to the same column of the same table completes without an error, the client uses the information in this record to instruct the server to retry the failed update, which had attempted to change the value of a row to a value that was present in the uniqueness-constrained column at the time of the failure but is no longer present due to the successful update.
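The client-side retry can be sketched against a toy server that enforces uniqueness on a single column. Because there is only one column, "the same column of the same table" always holds. `DuplicateKeyError`, `UniqueColumn`, and `Client` are hypothetical. The buffering step described above is not shown.

```python
class DuplicateKeyError(Exception):
    """Stand-in for a server's unique-constraint violation error."""


class UniqueColumn:
    """Toy server: one uniqueness-constrained column, row_id -> value."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def update(self, row_id, value):
        if any(v == value and r != row_id for r, v in self.rows.items()):
            raise DuplicateKeyError(value)
        self.rows[row_id] = value


class Client:
    def __init__(self, server):
        self.server = server
        self.failed = []  # local records of (row_id, value) updates that hit a violation

    def update(self, row_id, value):
        try:
            self.server.update(row_id, value)
        except DuplicateKeyError:
            self.failed.append((row_id, value))  # remember it; the conflict may be transient
            return
        self.retry_failed()  # the success may have freed a value a failed update needed

    def retry_failed(self):
        """Retry stored failures until a full pass makes no progress."""
        progress = True
        while progress and self.failed:
            progress = False
            for rec in list(self.failed):
                try:
                    self.server.update(*rec)
                except DuplicateKeyError:
                    continue
                self.failed.remove(rec)
                progress = True
```

Consider updating row 1 to `"b"` while row 2 still holds `"b"`. The update fails and is recorded. Once row 2 successfully changes to `"c"`, the recorded update is retried and succeeds. Repeated passes also resolve chains of such conflicts.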