Abstract:
An improved system and method for removing a storage server in a distributed column chunk data store is provided. A distributed column chunk data store may be provided by multiple storage servers operably coupled to a network. A storage server may include a database engine for partitioning a data table into column chunks for distribution across multiple storage servers, a storage shared memory for storing the column chunks during processing of semantic operations performed on the column chunks, and a storage services manager for striping column chunks of a partitioned data table across multiple storage servers. Any data table may be flexibly partitioned into column chunks using one or more columns with various partitioning methods. Storage servers may then be removed and column chunks may be redistributed among the remaining storage servers in the column chunk data store.
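The partition, stripe, and remove steps described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method the abstract claims: the function names, hash partitioning on a single key column, round-robin striping, and the least-loaded rule used when a server is removed.

```python
import zlib
from collections import defaultdict


def partition_into_column_chunks(rows, key, num_partitions):
    """Hash-partition rows on `key`, then split each partition by column.

    Returns {(partition, column): [values, ...]}. Hash partitioning is an
    assumed method; the abstract only says "various partitioning methods".
    """
    chunks = defaultdict(list)
    for row in rows:
        part = zlib.crc32(str(row[key]).encode()) % num_partitions
        for column, value in row.items():
            chunks[(part, column)].append(value)
    return dict(chunks)


def stripe(chunk_ids, servers):
    """Assign column chunks to storage servers round-robin (assumed policy)."""
    return {cid: servers[i % len(servers)]
            for i, cid in enumerate(sorted(chunk_ids))}


def remove_server(placement, removed, servers):
    """Move chunks held by `removed` onto the least-loaded remaining servers.

    Chunks on other servers stay where they are.
    """
    remaining = [s for s in servers if s != removed]
    if not remaining:
        raise ValueError("cannot remove the last storage server")
    load = {s: 0 for s in remaining}
    for server in placement.values():
        if server in load:
            load[server] += 1
    new_placement = dict(placement)
    for cid in sorted(placement):
        if placement[cid] == removed:
            target = min(remaining, key=lambda s: load[s])
            new_placement[cid] = target
            load[target] += 1
    return new_placement
```

A caller would partition a table once, stripe the resulting chunk identifiers over the server list, and call `remove_server` when a server leaves; only the chunks that lived on the removed server change location.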
Abstract:
An improved system and method for importing update data in a distributed column chunk data store is provided. A distributed column chunk data store may be provided by multiple storage servers operably coupled to a network. A storage server may include a database engine for partitioning a data table into column chunks for distribution across multiple storage servers, a storage shared memory for storing the column chunks during processing of semantic operations performed on the column chunks, and a storage services manager for striping column chunks of a partitioned data table across multiple storage servers. Any data table may be flexibly partitioned into column chunks using one or more columns with various partitioning methods. Update data may then be incrementally imported as separate column chunks that may later be merged with the column chunks of the partitioned data table.
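The idea of importing update data as separate column chunks and merging them later can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `store` dictionary, the function names, and the choice to merge by appending in arrival order are all hypothetical, and the abstract does not specify how a merge is performed.

```python
def import_update(store, chunk_id, values):
    """Record update data as a separate delta chunk; the base chunk is untouched."""
    entry = store.setdefault(chunk_id, {"base": [], "deltas": []})
    entry["deltas"].append(list(values))


def read_chunk(store, chunk_id):
    """Return a read view: the base chunk followed by any unmerged deltas."""
    entry = store[chunk_id]
    view = list(entry["base"])
    for delta in entry["deltas"]:
        view.extend(delta)
    return view


def merge_updates(store, chunk_id):
    """Fold pending delta chunks into the base chunk, in arrival order."""
    entry = store[chunk_id]
    for delta in entry["deltas"]:
        entry["base"].extend(delta)
    entry["deltas"] = []
    return entry["base"]
```

Keeping deltas separate means an import never rewrites the existing chunk; readers see the same values before and after the merge, and the merge can be deferred to a convenient time.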
Abstract:
A facility for defining a distinguished segment of individuals within a population of individuals is described. The facility displays a prompt for user input specifying a natural-language characterization of a segment membership criterion for identifying individuals who are members of the distinguished segment. The facility then receives, in response to the displayed prompt, user input specifying such a natural-language characterization.
Abstract:
An improved system and method for query processing in a distributed column chunk data store is provided. A distributed column chunk data store may be provided by multiple storage servers operably coupled to a network. A storage server may include a database engine for partitioning a data table into column chunks for distribution across multiple storage servers, a storage shared memory for storing the column chunks during processing of semantic operations performed on the column chunks, and a storage services manager for striping column chunks of a partitioned data table across multiple storage servers. Query processing may be performed by storage servers or by query processing servers operably coupled by a network to storage servers in the column chunk data store. To do so, a hierarchy of servers may be dynamically determined to process execution steps of a query transformed for distributed processing.
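One way to picture a hierarchy of servers processing the steps of a distributed query is a tree-shaped aggregation: each storage server computes a partial result over its local column chunks, and each higher level combines a few results from the level below. The sketch below assumes a SUM query, a fixed fan-out, and hypothetical function names; the abstract does not say how the hierarchy is chosen.

```python
def local_partials(chunks_by_server):
    """Leaf step: each server sums the values of its own column chunk."""
    return [sum(values) for _, values in sorted(chunks_by_server.items())]


def tree_sum(partials, fan_out):
    """Combine partial sums level by level, `fan_out` results at a time.

    Returns (total, number_of_combining_levels).
    """
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    level = list(partials)
    if not level:
        return 0, 0
    depth = 0
    while len(level) > 1:
        level = [sum(level[i:i + fan_out])
                 for i in range(0, len(level), fan_out)]
        depth += 1
    return level[0], depth
```

With five servers and a fan-out of 2, the partial sums pass through three combining levels before a single result remains; a larger fan-out gives a shallower hierarchy at the cost of more work per combining server.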
Abstract:
A facility for identifying groups of items that co-occur in more than a threshold number of instances is described. Each such group of items has a size reflecting the number of items in the group. The facility uses a data structure comprising, for each of a plurality of group sizes, a single map identifying groups of that group size that co-occur in more than a threshold number of instances.
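A data structure with one map per group size can be sketched in Python as a dictionary keyed by size, where each value maps a group (a sorted tuple of items) to the number of instances in which it co-occurs. The level-wise counting and the subset-based pruning below are assumptions borrowed from the well-known Apriori approach, not details given in the abstract; `frequent_groups` and its parameters are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def frequent_groups(instances, threshold, max_size):
    """Return {size: {group: count}} for groups that co-occur in more than
    `threshold` instances. One map is kept per group size."""
    maps = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for instance in instances:
            items = sorted(set(instance))
            for group in combinations(items, size):
                # Pruning: skip a group if any (size-1)-subset is not frequent.
                if size > 1 and any(sub not in maps[size - 1]
                                    for sub in combinations(group, size - 1)):
                    continue
                counts[group] += 1
        maps[size] = {g: c for g, c in counts.items() if c > threshold}
        if not maps[size]:
            break  # no larger group can be frequent
    return maps
```

For example, with the five instances `abc`, `ab`, `ac`, `bc`, `abc` and a threshold of 2, every pair co-occurs three times and is kept in the size-2 map, while the triple co-occurs only twice and the size-3 map stays empty.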
Abstract:
A method, system and computer-readable medium for analyzing interaction or usage data, such as for customers, is described. The interaction or usage data may be stored in log files and supplemented with data from other sources. Various data parsing information may be defined and used as part of the analysis, such as by using customer-specific information to identify various occurrences of interest. For example, when analyzing a customer's web site interaction data, a parser component can use data defining customer-specific types of web site events of interest. Such high-level types of occurrences can be specified in a variety of ways, such as by using a combination of a logical web site, one or more URIs corresponding to web pages, and/or one or more query strings. The data parsing information may also specify a mapping of actual web sites to one or more logical sites.
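The combination of a logical web site, a URI, and query strings can be illustrated with a small rule-based classifier. The host names, event names, rule layout, and the `classify` function are all hypothetical; the sketch only shows how a logged URL might be matched against customer-specific definitions after its actual site is mapped to a logical site.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping of actual web sites to logical sites.
SITE_MAP = {"www.example.com": "store", "shop.example.com": "store"}

# Hypothetical customer-specific event definitions:
# logical site + URI path + required query-string values.
EVENT_RULES = [
    {"event": "add_to_cart", "site": "store", "path": "/cart",
     "query": {"action": "add"}},
    {"event": "checkout", "site": "store", "path": "/checkout", "query": {}},
]


def classify(url):
    """Return the event type for a logged URL, or None if no rule matches."""
    parts = urlsplit(url)
    site = SITE_MAP.get(parts.hostname)
    query = parse_qs(parts.query)
    for rule in EVENT_RULES:
        if rule["site"] != site or rule["path"] != parts.path:
            continue
        if all(value in query.get(name, [])
               for name, value in rule["query"].items()):
            return rule["event"]
    return None
```

Because both host names map to the same logical site, a single rule covers requests to either one, which is the practical benefit of the site mapping.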
Abstract:
A method and system for importing data into a data store in accordance with metadata is described. The import system provides metadata that specifies how the import data for various types of import sources is to be imported into the data store. The import sources may be categorized according to the type of data provided by the import sources. When the import system receives the import data from the import source, it identifies the type of import source and retrieves the metadata defined for that type of import source. The import system then imports the received import data into the data store in accordance with the retrieved metadata.
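The flow described above (identify the source type, retrieve its metadata, import accordingly) can be sketched in a few lines. The source types, table names, field mappings, and the `import_data` function are assumptions chosen for illustration; the abstract does not define what the metadata contains.

```python
# Hypothetical metadata: for each import source type, the target table and a
# mapping of source field -> (target column, conversion function).
IMPORT_METADATA = {
    "web_log": {"table": "page_views",
                "fields": {"ts": ("timestamp", int), "uri": ("page", str)}},
    "crm": {"table": "customers",
            "fields": {"cust_id": ("customer_id", int), "name": ("name", str)}},
}


def import_data(data_store, source_type, records):
    """Import `records` into `data_store` using the metadata for `source_type`.

    Returns the number of rows imported.
    """
    meta = IMPORT_METADATA.get(source_type)
    if meta is None:
        raise KeyError("no import metadata for source type %r" % source_type)
    table = data_store.setdefault(meta["table"], [])
    imported = 0
    for record in records:
        table.append({target: convert(record[source])
                      for source, (target, convert) in meta["fields"].items()})
        imported += 1
    return imported
```

Supporting a new kind of import source then means adding a metadata entry rather than writing new import code.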
Abstract:
A method, system and computer-readable medium for analyzing interaction or usage data, such as for customers, is described. The interaction or usage data may be stored in log files and supplemented with data from other sources. Various data parsing information may be defined and used as part of the analysis, such as by using customer-specific information to identify various occurrences of interest. For example, when analyzing a customer's web site interaction data, a parser component can use data defining customer-specific categories of web pages. Such high-level types of occurrences can be specified in a variety of ways, such as by using a combination of a logical web site, one or more URIs corresponding to web pages, and/or one or more query strings. The data parsing information may also specify a mapping of actual web sites to one or more logical sites.