摘要:
A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
摘要:
A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
摘要:
A method for identifying matching transactions between two log files where each transaction includes one or more statements. Each log file record records the execution of a statement and includes a transaction identifier. Each record in turn in one log file is compared to an advancing window of records in the other log file. A first table contains associations of statements to transactions and transactions to statements for records in the window. If a match is found between a record in the one file and a record in the window, information associating partial transactions in the one file to potential transactions of the records in the window is added to a second table. If an end-of-transaction record is read from the one file, a best match is found between the ended transaction and the potential transactions based on information in the first and second tables.
摘要:
A system for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code configured to perform at least one of data preparation and data analysis.
摘要:
Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
摘要:
Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
摘要:
A method for receiving a declarative specification including a plurality of stages. Each stage specifies an atomic operation, a data input to the atomic operation, and a data output from the atomic operation. The data input is characterized by a data type. Links between at least two of the stages are generated to create a data integration workflow. The data integration workflow is compiled to generate computer code for execution on a parallel processing platform. The computer code configured to perform at least one of data preparation and data analysis.
摘要:
Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
摘要:
Methods and arrangements for processing hierarchical data in a map-reduce framework. Hierarchical data is accepted, and a map-reduce job is performed on the hierarchical data. This performing of a map-reduce job includes determining a cost of partitioning the data, determining a cost of redefining the job and thereupon selectively performing at least one step taken from the group consisting of: partitioning the data and redefining the job.
摘要:
Arrangements and methods for employing empirical evidence to estimate the performance of applications with very few data samples, in complex environments such as dynamic SDP environments, using one or more effective data-plotting models.