Abstract:
A computer-implemented method comprising: receiving a graphical selection of a subset of data points from a set of data points, each data point representing at least one record of a dimensionally-modeled fact collection; and exporting information associated with the selected subset of data points.
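As a rough illustration of the selection-and-export step, the sketch below makes several assumptions. Each plotted data point carries screen coordinates plus the list of fact records it represents. The graphical selection is a rubber-band rectangle. The export target is CSV text. `DataPoint`, `select_in_rectangle`, and `export_records` are hypothetical names; the abstract does not specify a selection gesture or an export format.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    # Hypothetical plotted point: screen position plus the fact records it stands for.
    x: float
    y: float
    records: list = field(default_factory=list)


def select_in_rectangle(points, x0, y0, x1, y1):
    """Return the points inside a rubber-band rectangle (one possible graphical selection)."""
    lo_x, hi_x = min(x0, x1), max(x0, x1)
    lo_y, hi_y = min(y0, y1), max(y0, y1)
    return [p for p in points if lo_x <= p.x <= hi_x and lo_y <= p.y <= hi_y]


def export_records(selected, columns):
    """Write every record behind the selected points as CSV text (CSV is an assumed format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for point in selected:
        for record in point.records:
            writer.writerow(record)
    return buf.getvalue()


points = [
    DataPoint(1, 1, [{"user": "a", "clicks": 3}]),
    DataPoint(5, 5, [{"user": "b", "clicks": 7}, {"user": "c", "clicks": 2}]),
]
selected = select_in_rectangle(points, 6, 6, 4, 4)
print(export_records(selected, ["user", "clicks"]))
```

Because one data point may stand for several records, the export walks the records behind each point rather than the points themselves.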
Abstract:
A technique for operating a user interface that enables a user to graphically manipulate records of a dimensionally-modeled fact collection comprises: receiving a graphical selection of a subset of a set of data points, each data point representing at least one record of the dimensionally-modeled fact collection; receiving a graphical manipulation of the selected subset of data points; defining at least one data group using the selected subset of data points and based on the graphical manipulation, wherein each data group comprises between 0 and n records represented by the selected subset of data points, n being the total number of data points in the set of data points; and graphically representing the at least one data group. Alternatively, the technique comprises: performing an operation on at least one data group defined as described above; and graphically representing a result of the operation.
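The following is a minimal sketch of defining data groups from a selection. It assumes that groups are held as sets of record ids and that the user's gesture has already been decoded into one of three hypothetical actions (`"create"`, `"add"`, `"remove"`). A set intersection stands in for the "operation on at least one data group", and a bar of `#` characters stands in for the graphical representation. None of these choices comes from the source.

```python
def apply_manipulation(groups, selected, action, target):
    """Return a new {group name: set of record ids} after applying one gesture.

    groups: existing data groups; selected: record ids behind the selected points;
    action: hypothetical decoded gesture, one of "create", "add", "remove".
    """
    updated = {name: set(ids) for name, ids in groups.items()}
    if action == "create":
        updated[target] = set(selected)
    elif action == "add":
        updated.setdefault(target, set()).update(selected)
    elif action == "remove":
        updated.setdefault(target, set()).difference_update(selected)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return updated


def render(groups):
    """Crude stand-in for a chart: one bar per group, one '#' per record."""
    return "\n".join(
        f"{name}: {'#' * len(ids)} ({len(ids)})" for name, ids in sorted(groups.items())
    )


groups = apply_manipulation({}, {1, 2, 3}, "create", "A")
groups = apply_manipulation(groups, {3, 4}, "create", "B")
groups = apply_manipulation(groups, {1}, "remove", "A")
overlap = {"A&B": groups["A"] & groups["B"]}  # an example operation on two groups
print(render(groups))
print(render(overlap))
```

Removing every selected record from a group leaves it empty, which fits the "between 0 and n records" bound in the abstract.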
Abstract:
A method is provided for visually representing a plurality of clusters, which comprise a plurality of entities, with respect to a plurality of entity attributes. The plurality of entities is segmented into the plurality of clusters such that each entity belongs to at least one cluster. Entity data regarding the plurality of entities is processed to obtain a plurality of characteristics of each cluster with respect to each entity attribute. A visual display of the clusters with respect to the entity attributes is then generated such that, for each cluster and each entity attribute, a portion of the display simultaneously represents at least two of the characteristics for that cluster with respect to that attribute.
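The per-cluster, per-attribute computation can be sketched as follows. The sketch assumes numeric attributes and picks mean and population standard deviation as the two characteristics shown together in each display cell. It renders them as `mean±sd` text rather than, say, colour and glyph size. Entities may belong to several clusters, matching the "at least one cluster" wording. The function names are hypothetical.

```python
import statistics
from collections import defaultdict


def cluster_characteristics(entities, memberships, attributes):
    """Return {cluster: {attribute: (mean, population stdev)}}.

    entities: list of dicts mapping attribute name -> number.
    memberships: for each entity, an iterable of the cluster labels it belongs to.
    """
    members = defaultdict(list)
    for entity, labels in zip(entities, memberships):
        for label in labels:
            members[label].append(entity)
    return {
        label: {
            attr: (statistics.fmean([e[attr] for e in group]),
                   statistics.pstdev([e[attr] for e in group]))
            for attr in attributes
        }
        for label, group in members.items()
    }


def render_grid(chars, attributes):
    """One row per cluster, one cell per attribute; each cell shows two characteristics."""
    lines = ["cluster | " + " | ".join(attributes)]
    for label in sorted(chars):
        cells = [f"{chars[label][a][0]:.1f}±{chars[label][a][1]:.1f}" for a in attributes]
        lines.append(f"{label} | " + " | ".join(cells))
    return "\n".join(lines)


entities = [{"age": 20, "spend": 10}, {"age": 30, "spend": 30}, {"age": 50, "spend": 5}]
memberships = [{"c1"}, {"c1"}, {"c2"}]
chars = cluster_characteristics(entities, memberships, ["age", "spend"])
print(render_grid(chars, ["age", "spend"]))
```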
Abstract:
An unsupervised classification approach is improved by imposing some order on the treatment of the records and their attributes, which would otherwise be treated as random variables. A method is provided to identify the attributes most associated with the “good” records within each of a plurality of groups of records in a data set. Using a supervised scoring method, the records of the data set are processed to indicate their measure of “goodness”. The records can be processed in various ways to introduce a bias during unsupervised clustering.
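One simple way to picture "attributes most associated with the good records within each group" is sketched below. The sketch assumes that the supervised goodness scores are already available from some external model. A record counts as good when its score reaches a hypothetical `threshold`. Attributes are ranked by the absolute gap between the mean of good and non-good records in the same group. This mean-gap measure is an illustrative stand-in, not the method claimed in the source.

```python
from statistics import fmean


def top_good_attributes(records, groups, scores, attributes, threshold=0.5, k=2):
    """Per group, return up to k attributes whose means differ most between
    good records (score >= threshold) and the remaining records of that group."""
    result = {}
    for g in sorted(set(groups)):
        members = [i for i, label in enumerate(groups) if label == g]
        good = [i for i in members if scores[i] >= threshold]
        rest = [i for i in members if scores[i] < threshold]
        if not good or not rest:
            result[g] = []  # no contrast to measure inside this group
            continue
        gaps = {
            a: abs(fmean([records[i][a] for i in good]) - fmean([records[i][a] for i in rest]))
            for a in attributes
        }
        result[g] = sorted(gaps, key=gaps.get, reverse=True)[:k]
    return result


records = [{"a": 1, "b": 10}, {"a": 9, "b": 11}, {"a": 2, "b": 0}, {"a": 3, "b": 20}]
groups = ["g", "g", "h", "h"]
scores = [0.1, 0.9, 0.2, 0.8]
print(top_good_attributes(records, groups, scores, ["a", "b"]))
```

The source leaves open how such rankings would bias the clustering itself. Turning them into per-attribute weights is only one possible reading.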
Abstract:
A portion of the data records of a full input data set is imported into the memory of a computer system for processing by an executing application. The full input data set includes data records of a dimensionally-modeled fact collection. The amount of data from the full input data set to import is determined based on the amount of available memory of the computer system. Sampling characteristics for sampling the full input data set are determined based on the amount of data that can be imported and on characteristics of both the full input data set and the application involved. The full input data set is then sampled, and a portion of the records is imported into the memory of the computer system for processing. The sampling characteristics are chosen such that the analysis the executing application produces from the imported sample is representative, with a calculable statistical relevance, of the analysis that could otherwise be carried out on the full input data set.
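The following is a back-of-the-envelope sketch of the sizing step. It rests on several assumptions:

* a fixed average record size;
* a hypothetical `usable_fraction` of free memory reserved for the sample;
* simple random sampling without replacement;
* the worst-case margin of error of an estimated proportion (with finite population correction) as the "calculable statistical relevance".

The source does not name a sampling scheme or a relevance measure, so these are placeholders.

```python
import math
import random


def records_that_fit(available_bytes, bytes_per_record, total_records, usable_fraction=0.5):
    """Number of records to import, leaving (1 - usable_fraction) of memory as headroom."""
    fit = int(available_bytes * usable_fraction // bytes_per_record)
    return max(0, min(fit, total_records))


def margin_of_error(n, population, z=1.96, p=0.5):
    """Worst-case (p = 0.5) ~95% margin of error for a proportion estimated from a
    simple random sample of n records, with finite population correction."""
    if n <= 0:
        return math.inf
    if n >= population:
        return 0.0  # the whole data set fits, so nothing is estimated
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


def sample_records(record_ids, n, seed=None):
    """Simple random sample of n record ids without replacement."""
    return random.Random(seed).sample(record_ids, n)


total = 1_000_000
n = records_that_fit(available_bytes=64 * 1024 * 1024, bytes_per_record=200, total_records=total)
print(n, round(margin_of_error(n, total), 4))
ids = sample_records(range(total), n, seed=42)
```

A richer scheme (for example, stratifying on a dimension the application groups by) would change only `sample_records` and the error formula. The memory-driven sizing would stay the same.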
Abstract:
A computer-implemented method comprising: aggregating a plurality of records in accordance with an aggregation specification, wherein the records are part of a dimensionally-modeled fact collection; graphically representing the records in un-aggregated form; graphically representing the records in aggregated form; and causing the graphical representation of the records to be switched between the aggregated and un-aggregated forms based on a user indication.
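The aggregate/un-aggregate switch can be sketched by precomputing both forms and flipping a flag on each user indication. The aggregation specification is assumed here to be a list of group-by dimensions, one measure, and a reducing function. `aggregate` and `ToggleView` are hypothetical names, and `rows()` returns what a chart would plot rather than drawing anything.

```python
from collections import defaultdict


def aggregate(records, group_by, measure, func=sum):
    """Group records by the given dimensions and reduce one measure with func."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[tuple(rec[d] for d in group_by)].append(rec[measure])
    rows = []
    for key, values in buckets.items():
        row = dict(zip(group_by, key))
        row[measure] = func(values)
        rows.append(row)
    return rows


class ToggleView:
    """Holds both forms and switches which one is displayed on each user indication."""

    def __init__(self, records, group_by, measure, func=sum):
        self.raw = list(records)
        self.agg = aggregate(self.raw, group_by, measure, func)
        self.aggregated = False

    def toggle(self):
        # Stand-in for the user indication (e.g. a button click).
        self.aggregated = not self.aggregated

    def rows(self):
        return self.agg if self.aggregated else self.raw


sales = [{"region": "east", "sales": 5}, {"region": "east", "sales": 7}, {"region": "west", "sales": 3}]
view = ToggleView(sales, ["region"], "sales")
print(view.rows())
view.toggle()
print(view.rows())
```

Precomputing the aggregated form makes each switch instant. For very large inputs, one could instead compute it lazily on the first toggle.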
Abstract:
Records representing items in a dimensionally-modeled fact collection are assigned to bins. A count-based portion of a user interface receives a user's bin assignment specification for the records in terms of user-specified record counts. Actual counts for bin assignment are determined by constraining records that have the same data value at a specified dimension to fall within the same bin. A user-observable indication of the determined actual counts is provided. The user interface may also include a value-based portion, which may be operated to receive a user's bin assignment specification in terms of at least one user-specified value at that dimension. In that case, determining the actual counts includes reconciling the bin assignment specification received in the count-based portion with the one received in the value-based portion.
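The following sketch of the count-to-actual-count step follows one assumed policy. Records are sorted by their value at the chosen dimension, and any bin boundary that would split a run of equal values is pushed forward to the end of that run. A value-based helper shows how user-specified cut values translate into counts. Cuts placed at values never split ties, so the two portions agree when they describe the same boundaries. How conflicting specifications are reconciled is not sketched, and both function names are hypothetical.

```python
from bisect import bisect_right


def actual_bin_counts(values, requested_counts):
    """Turn requested per-bin counts into actual counts that never split equal values.

    values: the records' values at the chosen dimension.
    requested_counts: desired records per bin; the last bin takes whatever remains.
    """
    ordered = sorted(values)
    n = len(ordered)
    counts, start = [], 0
    for want in requested_counts[:-1]:
        end = min(start + want, n)
        # Push the boundary forward while the next record ties with the last one in this bin.
        while 0 < end < n and ordered[end] == ordered[end - 1]:
            end += 1
        counts.append(end - start)
        start = end
    counts.append(n - start)
    return counts


def counts_from_cuts(values, cuts):
    """Value-based bins: each cut c closes a bin containing values <= c."""
    ordered = sorted(values)
    edges = [0] + [bisect_right(ordered, c) for c in sorted(cuts)] + [len(ordered)]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


values = [5, 2, 4, 1, 2, 3, 4, 2]
print(actual_bin_counts(values, [2, 3, 3]))  # user asked for 2/3/3
print(counts_from_cuts(values, [2, 4]))      # user cut at values 2 and 4
```

Printing the actual counts next to the requested ones serves as the user-observable indication in this sketch.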