摘要:
A computer implemented method, apparatus, and computer usable program code for creating normalized data from markup language data. User defined parameters are received for retrieving event data, wherein the parameters define a type of event and a subset of attributes for the type of event. In response to receiving the parameters, a process is configured using the type of event and the subset of attributes for the type of event to form a configured process. A set of records is processed using the configured process, wherein the configured process places data corresponding to each attribute in the subset of attributes for the type of event from the set of records into a table to form the normalized data.
摘要:
A computer implemented method, apparatus, and computer usable program code for processing event data. In response to receiving an event, a size of the event data for the event is compared to a threshold size to form a comparison. The information about an event and event data is stored in a first entry in a main table in a database if the comparison indicates that the size of the event data is one that can be stored in the main table. The information about the event is placed in the first entry in the main table if the size is greater than the threshold size. The event data is stored in a second entry in an overflow table if the size is greater than the threshold size, wherein the entry includes a pointer to the first entry. The main table and overflow table form a live set and hold the current live data.
摘要:
A computer implemented method, apparatus, and computer usable program code for creating normalized data from markup language data. User defined parameters are received for retrieving event data, wherein the parameters define a type of event and a subset of attributes for the type of event. In response to receiving the parameters, a process is configured using the type of event and the subset of attributes for the type of event to form a configured process. A set of records is processed using the configured process, wherein the configured process places data corresponding to each attribute in the subset of attributes for the type of event from the set of records into a table to form the normalized data.
摘要:
A computer implemented method, apparatus, and computer usable program code for processing event data. In response to receiving an event, a size of the event data for the event is compared to a threshold size to form a comparison. The information about an event and event data is stored in a first entry in a main table in a database if the comparison indicates that the size of the event data is one that can be stored in the main table. The information about the event is placed in the first entry in the main table if the size is greater than the threshold size. The event data is stored in a second entry in an overflow table if the size is greater than the threshold size, wherein the entry includes a pointer to the first entry. The main table and overflow table form a live set and hold the current live data.
摘要:
A method for shard assignment in a large-scale data processing job is provided. Datasets are divided into a plurality of shards and the shards are indexed and aggregated into one or more groups. A worker process is initially assigned an indexed shard from a group. The initial assignment can assigned based on a simple algorithm. The worker's subsequent shard assignment is based on the index of the initially assigned shard.