Abstract:
Provided are techniques for compilation of hierarchical data processing. A data flow diagram including one or more operators, wherein each operator includes at least one of an incoming arc and an outgoing arc, is received. For each operator, for each incoming arc, it is validated that an arc input formal schema is compatible with a schema rooted in a context node in an arc input actual schema, and, for each outgoing arc, an arc output formal schema is computed based on operator logic and operator inputs and an arc output actual schema is computed from the arc input actual schema by replacing the context node of the arc input actual schema with the arc output formal schema.
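Below is a minimal sketch of the per-operator schema check and schema derivation described in this abstract, assuming schemas are modeled as nested dicts (a leaf is an empty dict). The names subtree_at, is_compatible, replace_context_node, compile_operator, and the compute_output_formal_schema callback are illustrative, not taken from the source.

```python
def subtree_at(schema, context_path):
    """Return the schema rooted at the context node named by context_path."""
    node = schema
    for step in context_path:
        node = node[step]
    return node

def is_compatible(formal, actual):
    """A formal schema is compatible if every field it requires exists in actual."""
    return all(name in actual and is_compatible(child, actual[name])
               for name, child in formal.items())

def replace_context_node(actual, context_path, new_subtree):
    """Copy the actual schema with the context node replaced by new_subtree."""
    if not context_path:
        return new_subtree
    head, *rest = context_path
    copied = dict(actual)
    copied[head] = replace_context_node(actual[head], rest, new_subtree)
    return copied

def compile_operator(op, incoming, outgoing):
    """Validate incoming arcs, then derive schemas for outgoing arcs."""
    for arc in incoming:
        rooted = subtree_at(arc["actual"], arc["context_path"])
        if not is_compatible(arc["formal"], rooted):
            raise TypeError(f"arc into {op['name']} has an incompatible schema")
    # Output formal schema is a function of the operator's logic and its inputs.
    out_formal = op["compute_output_formal_schema"](incoming)
    for arc in outgoing:
        in_arc = incoming[0]   # simplifying assumption: at least one input arc
        arc["formal"] = out_formal
        arc["actual"] = replace_context_node(
            in_arc["actual"], in_arc["context_path"], out_formal)

# Example: a hypothetical projection operator keeps only the "id" field.
actual = {"order": {"id": {}, "total": {}}, "meta": {}}
op = {"name": "Project",
      "compute_output_formal_schema": lambda arcs: {"id": {}}}
incoming = [{"formal": {"id": {}}, "actual": actual, "context_path": ["order"]}]
outgoing = [{}]
compile_operator(op, incoming, outgoing)
print(outgoing[0]["actual"])   # {'order': {'id': {}}, 'meta': {}}
```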
Abstract:
A system and method for implementing a unified model for integration systems is presented. A user provides inputs to an integrated language engine for placing operator components and arc components onto a dataflow diagram. Operator components include data ports for expressing data flow, and also include meta-ports for expressing control flow. Arc components connect operator components together for data and control information to flow between the operator components. The dataflow diagram is a directed acyclic graph that expresses an application without including artificial boundaries during the application design process. Once the integrated language engine generates the dataflow diagram, the integrated language engine compiles the dataflow diagram into generated application code.
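The sketch below illustrates, under simplifying assumptions, the structures this abstract describes: operators carrying data ports and meta-ports, arcs connecting operators, and a compile step that walks the directed acyclic graph in dependency order. The Operator, Arc, and compile_diagram names and the emitted pseudo-instructions are illustrative only.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Operator:
    name: str
    data_ports: list = field(default_factory=list)   # express data flow
    meta_ports: list = field(default_factory=list)   # express control flow

@dataclass
class Arc:
    source: Operator
    target: Operator

def compile_diagram(operators, arcs):
    """Walk the DAG in dependency order, emitting one pseudo-instruction per operator."""
    deps = {op.name: set() for op in operators}
    for arc in arcs:
        deps[arc.target.name].add(arc.source.name)
    return [f"invoke {name}" for name in TopologicalSorter(deps).static_order()]

read = Operator("ReadOrders", data_ports=["out"])
enrich = Operator("EnrichOrders", data_ports=["in", "out"], meta_ports=["onError"])
print(compile_diagram([read, enrich], [Arc(read, enrich)]))
# ['invoke ReadOrders', 'invoke EnrichOrders']
```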
Abstract:
Provided are techniques for pipeline optimization based on polymorphic schema knowledge. A hierarchical document to be processed by a pipeline of transformations is received. It is determined whether a next downstream transformation accesses content of each schema node in an associated input schema, wherein the input schema is a polymorphic schema. In response to determining that the next downstream transformation is not accessing the content of each schema node in the associated input schema, data items corresponding to each schema node that the next downstream transformation is not accessing are collected into a single compressed event. The collected items are passed to the next downstream transformation as the single compressed event.
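A minimal sketch of the compressed-event idea follows, assuming each data item is a (schema node, value) event: items for schema nodes the next downstream transformation does not access are batched into a single opaque compressed event, so the transformation never handles them individually. The compress_for function, the "__compressed__" marker, and the node paths are illustrative.

```python
def compress_for(events, accessed_nodes):
    """Group consecutive events for unaccessed schema nodes into one compressed event."""
    out, skipped = [], []
    for node, value in events:
        if node in accessed_nodes:
            if skipped:
                out.append(("__compressed__", skipped))
                skipped = []
            out.append((node, value))
        else:
            skipped.append((node, value))
    if skipped:
        out.append(("__compressed__", skipped))
    return out

events = [("order/id", 7), ("order/notes", "n/a"), ("order/audit", "x"),
          ("order/total", 19.99)]
print(compress_for(events, accessed_nodes={"order/id", "order/total"}))
# [('order/id', 7),
#  ('__compressed__', [('order/notes', 'n/a'), ('order/audit', 'x')]),
#  ('order/total', 19.99)]
```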
Abstract:
Provided are techniques for processing data items. A limit on the number of dequeue operations allowed in a current step of processing for a queue-like data structure is set, wherein the number of allowed dequeue operations limits at least one of an amount of CPU resources and an amount of memory resources to be used by an operator. The operator to perform processing is selected and the operator is activated by passing control to the operator, which then dequeues data constrained by the limit that was set. In response to receiving control back from the operator, the data structure size is examined to determine whether the operator made forward progress in that the operator enqueued or dequeued at least one data item.
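The following is a minimal sketch of the scheduling step this abstract describes: an operator is activated with a bounded dequeue budget, and after control returns the queue sizes are compared to decide whether it made forward progress (enqueued or dequeued at least one item). The run_step function and the example operator are illustrative assumptions.

```python
from collections import deque

def run_step(operator, in_queue, out_queue, dequeue_limit):
    """Activate one operator with a bounded dequeue budget; report forward progress."""
    before = (len(in_queue), len(out_queue))
    dequeued = 0
    while in_queue and dequeued < dequeue_limit:
        item = in_queue.popleft()           # dequeue, constrained by the limit
        out_queue.append(operator(item))    # operator produces downstream data
        dequeued += 1
    after = (len(in_queue), len(out_queue))
    return before != after                  # True if the operator made progress

in_q, out_q = deque(range(10)), deque()
progressed = run_step(lambda x: x * 2, in_q, out_q, dequeue_limit=4)
print(progressed, list(out_q))   # True [0, 2, 4, 6]
```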
Abstract:
Provided are techniques for optimizing the processing of hierarchical data. A linear processing graph is received, wherein the linear processing graph includes a plurality of operators, wherein each operator in the plurality is connected to at least one other operator by an arc, wherein hierarchical data flows on the arcs, wherein the operators read and replace identified subregions within the hierarchical data flowing into the operators on the arcs, and wherein the operators do not modify the hierarchical data outside of these identified subregions. For each operator in the linear processing graph, a minimal set of dependent upstream operators on which that operator depends is found by examining how the identified subregions are created in the linear processing graph: a set of operators on which that operator depends is obtained by analyzing dependencies carried by a set of vector nodes of the hierarchical data in an input schema of the operator and, for each of the vector nodes, by analyzing an associated set of scalar nodes, wherein finding the minimal set of operators includes taking into consideration data preservation characteristics and structural-order preservation characteristics of the plurality of operators. The linear processing graph is rewritten to create a new graph that expresses dependencies based on the minimal set of dependent upstream operators for each operator.
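Below is a simplified sketch of the dependency analysis, assuming each operator in the linear chain declares which subregions of the hierarchical document it reads and which it replaces; an operator's minimal upstream dependency set is then the set of nearest upstream operators that last wrote a region it reads. The vector/scalar node analysis and the data and structural-order preservation characteristics are omitted, and all names are illustrative.

```python
def minimal_dependencies(chain):
    """chain: list of dicts with 'name', 'reads', 'writes' (sets of node paths)."""
    last_writer = {}   # node path -> name of the operator that last replaced it
    graph = {}         # operator name -> minimal set of upstream dependencies
    for op in chain:
        graph[op["name"]] = {last_writer[node]
                             for node in op["reads"] if node in last_writer}
        for node in op["writes"]:
            last_writer[node] = op["name"]
    return graph

chain = [
    {"name": "parse",  "reads": set(),        "writes": {"order", "customer"}},
    {"name": "price",  "reads": {"order"},    "writes": {"order/total"}},
    {"name": "mask",   "reads": {"customer"}, "writes": {"customer"}},
    {"name": "report", "reads": {"order/total", "customer"}, "writes": set()},
]
print(minimal_dependencies(chain))
# {'parse': set(), 'price': {'parse'}, 'mask': {'parse'},
#  'report': {'price', 'mask'}}
```

The rewritten graph would then connect each operator only to its minimal dependency set rather than to its single linear predecessor.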
Abstract:
Provided are techniques for increasing transaction processing throughput. A transaction item with a message identifier and a session identifier is obtained. The transaction item is added to the earliest aggregated transaction in a list of aggregated transactions in which no other transaction item has the same session identifier. A first aggregated transaction in the list of aggregated transactions that has met execution criteria is executed. In response to determining that the aggregated transaction is not committing, the aggregated transaction is broken up into multiple smaller aggregated transactions and a target size of each aggregated transaction is adjusted based on measurements of system throughput.
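A minimal sketch of the aggregation rule follows: a transaction item is appended to the earliest aggregated transaction that does not already hold an item with the same session identifier, or a new aggregate is started otherwise. The execution criteria, break-up on failure, and target-size adjustment are omitted, and the add_item name and tuple layout are illustrative.

```python
def add_item(aggregates, item):
    """aggregates: list of lists of items; item: (message_id, session_id)."""
    _, session_id = item
    for agg in aggregates:
        # Eligible only if no existing item shares the session identifier.
        if all(existing_session != session_id for _, existing_session in agg):
            agg.append(item)
            return
    aggregates.append([item])   # no eligible aggregate: start a new one

aggs = []
for item in [(1, "A"), (2, "B"), (3, "A"), (4, "A")]:
    add_item(aggs, item)
print(aggs)   # [[(1, 'A'), (2, 'B')], [(3, 'A')], [(4, 'A')]]
```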
Abstract:
Provided are techniques for determining a frequency distribution for a set of records. A count table of frequency distributions is built in memory for each field in the set of records, wherein each record of each count table includes a field identifier, a field value, and a count of a number of times the field value occurs in the set of records, and wherein the field identifier concatenated with the field value comprises a composite key value. It is determined that at least one count table of frequency distributions is approaching a maximum amount of memory allocated to that count table. The records of the at least one count table that is approaching the maximum amount of memory are sent for sorting and additional counting, wherein the records include composite key values.
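The sketch below illustrates the count-table idea under simplifying assumptions: one in-memory frequency table per field, keyed by a composite of the field identifier and field value, with rows spilled downstream for sorting and further counting when a table nears its budget. Using a simple row limit as the memory budget is an assumption, and the build_count_tables and spill names are illustrative.

```python
from collections import Counter

def build_count_tables(records, max_rows_per_table, spill):
    """records: list of dicts; spill(field_id, rows) handles an overflowing table."""
    tables = {}                                    # field id -> Counter of values
    for record in records:
        for field_id, value in record.items():
            table = tables.setdefault(field_id, Counter())
            table[(field_id, value)] += 1          # composite key: field + value
            if len(table) >= max_rows_per_table:   # table approaching its budget
                spill(field_id, list(table.items()))
                table.clear()
    return tables

records = [{"city": "NYC", "plan": "gold"},
           {"city": "LA",  "plan": "gold"},
           {"city": "NYC", "plan": "free"}]
tables = build_count_tables(records, max_rows_per_table=100,
                            spill=lambda f, rows: print("spill", f, rows))
print(tables["city"][("city", "NYC")])   # 2
```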