Abstract:
A corpus of information describing queries used to access a transactional data store may be used to identify analytical relationships that are not explicitly defined in a schema or supplied by a user. Join relationships may be identified based on field coincidence in elements of queries in the corpus. Join relationships may be indicative of dimensions and attributes of a dimension. Hierarchy levels for a dimension may be identified based on factors including data type, reference in an aggregating clause, and reference in a grouping clause.
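The idea of inferring join relationships from field coincidence can be sketched as follows. This is a minimal illustration, not the method described in the abstract: the query corpus, the table and column names, and the `join_candidates` helper are all hypothetical, and the regular expressions only handle simple queries in which every table has an alias.

```python
import re
from collections import Counter

# Hypothetical query corpus; table and column names are invented for illustration.
QUERIES = [
    "SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT c.region, SUM(o.total) FROM orders o JOIN customers c "
    "ON o.customer_id = c.id GROUP BY c.region",
    "SELECT p.name FROM order_items i JOIN products p ON i.product_id = p.id",
]

# Assumes every table is followed by an alias ("FROM orders o").
ALIAS_RE = re.compile(r"\b(?:FROM|JOIN)\s+(\w+)\s+(\w+)", re.IGNORECASE)
# Matches equality predicates between two qualified columns ("o.x = c.y").
EQUALITY_RE = re.compile(r"(\w+)\.(\w+)\s*=\s*(\w+)\.(\w+)")


def join_candidates(queries):
    """Count how often each pair of (table, column) fields is equated."""
    counts = Counter()
    for sql in queries:
        # Map alias -> table name for this query.
        aliases = {alias: table for table, alias in ALIAS_RE.findall(sql)}
        for a1, c1, a2, c2 in EQUALITY_RE.findall(sql):
            left = (aliases.get(a1, a1), c1)
            right = (aliases.get(a2, a2), c2)
            # Sort so that (A, B) and (B, A) count as the same relationship.
            counts[tuple(sorted((left, right)))] += 1
    return counts


for pair, seen in join_candidates(QUERIES).most_common():
    print(pair, seen)
```

Pairs that recur across many queries are the stronger candidates for a join relationship. A production version would need a real SQL parser rather than regular expressions.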
Abstract:
An online analytical processing system may comprise an n-dimensional cube partitioned into slices, in which each slice may represent data points at the intersections of fixed and variable dimensions. Computation of data points within a slice may be deferred. A dependency graph may be constructed initially and then used in a subsequent computation. Calculation of data points may be prioritized based on information indicative of the likelihood that the data points will be accessed.
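A rough sketch of deferred, prioritized computation over a dependency graph is given below. The data point names, the dependency map, and the access likelihoods are assumptions made up for the example. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) stands in for whatever graph structure an actual system would use.

```python
from graphlib import TopologicalSorter

# Hypothetical data points, each mapped to the points it is computed from.
DEPENDS_ON = {
    "sales_by_day": set(),
    "sales_by_month": {"sales_by_day"},
    "sales_by_year": {"sales_by_month"},
    "sales_by_region": {"sales_by_day"},
}

# Assumed estimates of how likely each data point is to be accessed.
ACCESS_LIKELIHOOD = {
    "sales_by_day": 0.2,
    "sales_by_month": 0.9,
    "sales_by_year": 0.1,
    "sales_by_region": 0.6,
}


def computation_order(depends_on, likelihood):
    """Order data points so dependencies come first and, among the points
    that are ready, the one most likely to be accessed is computed first."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    ready, order = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=lambda name: likelihood.get(name, 0.0))
        node = ready.pop()  # highest likelihood among the ready points
        order.append(node)
        sorter.done(node)   # unlocks points that depend on `node`
    return order


print(computation_order(DEPENDS_ON, ACCESS_LIKELIHOOD))
```

Here the graph is built once up front, and the actual calculation of each point can happen later, in the order returned.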
Abstract:
An online analytical processing system may comprise an n-dimensional cube structured using slice-based partitioning in which each slice comprises data points corresponding to a set of dimension values fixed across the slice and a set of dimension values allowed to vary. Slices may be partitioned and replicated across computing nodes. Views of the n-dimensional cube may be partially materialized by determining dependencies between slices. A central data dictionary may maintain information about slices and slice dependencies. Dimensions may be added by adding a new slice without requiring immediate recomputation of existing data points.
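One way to picture slices, a data dictionary, and the addition of a dimension is sketched below. The `Slice` and `DataDictionary` classes, their field names, and the sample data are hypothetical. The sketch leaves out replication across nodes entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    fixed: dict      # dimension -> value held constant across the slice
    variable: tuple  # dimensions allowed to vary within the slice
    points: dict = field(default_factory=dict)  # coordinates -> value
    computed: bool = False


class DataDictionary:
    """Hypothetical central registry of slices and their dependencies."""

    def __init__(self):
        self.slices = {}
        self.depends_on = {}

    def add_slice(self, name, slice_, depends_on=()):
        self.slices[name] = slice_
        self.depends_on[name] = set(depends_on)

    def pending(self):
        """Names of slices whose data points have not been computed yet."""
        return sorted(n for n, s in self.slices.items() if not s.computed)


catalog = DataDictionary()
catalog.add_slice(
    "eu_sales",
    Slice(fixed={"region": "EU"}, variable=("product", "month"),
          points={("widget", "2024-01"): 120.0}, computed=True),
)
# A roll-up view that is registered but not yet materialized.
catalog.add_slice(
    "eu_totals_by_product",
    Slice(fixed={"region": "EU"}, variable=("product",)),
    depends_on=["eu_sales"],
)
# Adding a "channel" dimension is just a new slice; existing slices are untouched.
catalog.add_slice(
    "eu_sales_by_channel",
    Slice(fixed={"region": "EU"}, variable=("product", "month", "channel")),
)
print(catalog.pending())
```

The recorded dependencies are what would let a system decide which slices to materialize and in what order.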
Abstract:
An analytics module may be embedded into an application developed, published, or used by an entity other than the owner of the data under analysis. An access token may be submitted by the analytics module to a provider of hosted services. The access token may correspond to an n-dimensional cube containing data at a level of granularity permitted to the application. The access token may incorporate additional policies controlling access to the corresponding n-dimensional cube.
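As an illustration only, an access token that names a cube, a permitted granularity, and extra policies could be a signed set of claims. The abstract does not specify a token format. The claim names, the shared secret, and the `issue_token` and `verify_token` helpers below are all assumptions, and the HMAC-signed payload is a common general-purpose technique swapped in for the example.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # hypothetical key held by the hosted-service provider


def issue_token(claims, secret=SECRET):
    """Serialize the claims and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(
        json.dumps(claims, sort_keys=True).encode("utf-8")
    ).decode("ascii")
    signature = hmac.new(secret, payload.encode("ascii"), hashlib.sha256).hexdigest()
    return payload + "." + signature


def verify_token(token, secret=SECRET):
    """Return the claims if the signature is valid, otherwise None."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(secret, payload.encode("ascii"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))


# Hypothetical claims: which cube, at what granularity, under which policies.
token = issue_token({
    "cube": "sales_cube",
    "granularity": "month",
    "policies": {"read_only": True},
})
print(verify_token(token))
```

The provider would check the claims against each request, for example refusing a query at day-level granularity when the token only permits monthly data.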
Abstract:
A platform for data analytics may be provided in a hosted environment on a multi-tenant system. The platform provider may also provide transactional processing services. Data obtained from processing the transactional services may be stored in an n-dimensional cube with which analytics may be performed. A dimension and hierarchy model may be identified based on correlations between hierarchy dimensions and levels in a dataset, or in schema and queries related to the dataset. Correlations may be further based on data received from a data stream. Priority for calculating a hierarchy may be based on data received from a data stream.
Abstract:
A hosted analytics system may be integrated with transactional data systems and additional data sources such as real-time systems and log files. A data processing pipeline may transform data on arrival for incorporation into an n-dimensional cube. Correlation between patterns of events in transactional data may be identified. Upon arrival, new data may be transformed and incorporated into the n-dimensional cube. Similarity between the new data and a previously identified correlation may be determined and flagged.
Abstract:
A data analysis system determines a set of characteristics of a data set that is provided by a user. In various embodiments, individual characteristics may be statistical measures, analytical insights, data trends, or relationships with other data sets. The data analysis system selects a subset of the characteristics to be presented to the user. In an embodiment, the data analysis system determines a level of importance for each characteristic based at least in part on metadata associated with the data set and, in some embodiments, on user preferences provided by the user. In an embodiment, the metadata includes descriptive names, data types, and data characteristics of the data set and of data elements within the data set.
Abstract:
A probabilistic counting structure such as a HyperLogLog may be formed during a table scan for each of a selected set of columns. The columns may be selected based on an initial estimate of relatedness, which may be based on data types of the respective columns. An estimated cardinality of an intersection or union of columns may be formed based on an intersection of the probabilistic data structures. A join path may be determined based on the estimated cardinality of an intersection or union of the columns.
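The cardinality estimates mentioned above can be illustrated with a small, simplified HyperLogLog. This is a sketch under stated assumptions, not the implementation the abstract refers to: the class, the choice of SHA-1 as the hash, and the precision `p=14` are all choices made for the example. The union is estimated by merging registers, and the intersection is derived by inclusion-exclusion, which is one common approach and may differ from the one intended in the abstract.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")         # 64-bit hash
        bits = 64 - self.p
        idx = h >> bits                                # first p bits pick a register
        rest = h & ((1 << bits) - 1)                   # remaining bits
        rank = bits - rest.bit_length() + 1            # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Sketch of the union: register-wise maximum."""
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)          # valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw


def estimate_overlap(a, b):
    """Return (union, intersection) estimates for two column sketches."""
    union = a.merge(b).cardinality()
    intersection = max(0.0, a.cardinality() + b.cardinality() - union)
    return union, intersection


# Hypothetical columns: two key ranges that share 10,000 distinct values.
col_a, col_b = HyperLogLog(), HyperLogLog()
for v in range(20000):
    col_a.add(v)
for v in range(10000, 30000):
    col_b.add(v)
print(estimate_overlap(col_a, col_b))
```

A large estimated intersection relative to the smaller column suggests the two columns are plausible join keys. Note that inclusion-exclusion amplifies the relative error when the true intersection is small.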
Abstract:
An online analytical processing system may comprise an n-dimensional cube structured using slice-based partitioning in which each slice comprises one or more hierarchies of data points. A region of a hierarchy may be classified according to computational demands associated with the region. A scaling or replication mechanism may be applied to the region based on the computational demands associated with that region.