Abstract:
Disclosed herein are techniques for selecting execution environments. Each operation in a sequence of operations is implemented using a selected execution environment. Each operation is converted into code executable in the selected execution environment. If some operations in the sequence were implemented in different execution environments, execution of the operations is coordinated.
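The idea can be illustrated with a minimal Python sketch built on several assumptions. Each operation carries hypothetical per-environment cost estimates, and the cheapest environment is selected. "Conversion" is a placeholder string, and coordination is reduced to recording a hand-off wherever consecutive operations run in different environments. The environment names (`sql`, `mapreduce`) and the cost-based selection rule are illustrative only; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    costs: dict  # hypothetical: environment name -> estimated cost


def select_environment(op):
    """Assumed selection rule: pick the environment with the lowest estimated cost."""
    return min(op.costs, key=op.costs.get)


def to_code(op, env):
    """Placeholder for converting an operation into code for its environment."""
    return f"[{env}] {op.name}"


def compile_sequence(ops):
    """Group consecutive operations that share an environment into segments and
    list the hand-offs (coordination points) between differing environments."""
    segments = []
    for op in ops:
        env = select_environment(op)
        if segments and segments[-1][0] == env:
            segments[-1][1].append(to_code(op, env))
        else:
            segments.append((env, [to_code(op, env)]))
    handoffs = [(a[0], b[0]) for a, b in zip(segments, segments[1:])]
    return segments, handoffs
```

If every operation lands in the same environment, `handoffs` is empty and no coordination is needed.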
Abstract:
A method for quality-objective-based ETL pipeline optimization is provided. An improvement objective is obtained from user input into a computing system. The improvement objective represents a priority optimization desired by a user for improved ETL flows for an application designed to run in memory of the computing system. An ETL flow is created in the memory of the computing system. The ETL flow is restructured for flow optimization with a processor of the computing system. The flow restructuring is based on the improvement objective. Flow restructuring can include applying a flow rewriting optimization or an algebraic rewriting optimization. The optimized ETL flow is stored as executable code on a computer readable storage medium.
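The following is one possible algebraic rewrite, sketched under stated assumptions: selection pushdown, applied only when the objective is `"performance"`. The flow representation (dicts with `kind`, `uses`, `produces`), the objective names, and the assumption that every step is row-wise (one output row per input row) are all hypothetical. The abstract does not say which rewrites map to which objectives.

```python
def push_filters_down(flow):
    """Move every filter as early as possible, past steps that produce none of
    the columns it reads (selection pushdown). Assumes row-wise steps."""
    flow = list(flow)
    for i in range(len(flow)):
        if flow[i]["kind"] != "filter":
            continue
        j = i
        # Swap the filter backwards while the preceding step is not the source
        # and does not create or modify any column the filter depends on.
        while (j > 0 and flow[j - 1]["kind"] != "source"
               and not (flow[j - 1]["produces"] & flow[j]["uses"])):
            flow[j - 1], flow[j] = flow[j], flow[j - 1]
            j -= 1
    return flow


def restructure(flow, objective):
    """Hypothetical objective-to-rewrite mapping; other objectives leave the flow as-is."""
    if objective == "performance":
        return push_filters_down(flow)
    return list(flow)
```

Filtering before an expensive lookup reduces the number of rows the lookup must process, which is why this rewrite is keyed to a performance objective here.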
Abstract:
A method of producing a representation of the progress of a process being performed on a database may be embodied in a data processing system. The method may include obtaining, for each of a plurality of subprocesses included in the database process, an estimated rate of using a system resource during execution of the subprocess and an estimated volume of data to be processed. For at least one of the plurality of subprocesses, the actual rate of using the system resource and the actual volume of data processed during its execution may be determined. An output signal may be generated that represents the estimated and actual rates and the estimated and actual volumes of data for the at least one subprocess.
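A minimal sketch of the "output signal" as a plain dictionary follows. It pairs estimated and actual values for one subprocess and derives two ratios. The field names, the resource unit, and the volume-weighted overall fraction are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SubprocessProgress:
    name: str
    est_rate: float    # estimated resource use per second (unit is an assumption)
    est_volume: float  # estimated rows to process
    act_rate: float    # measured resource use per second
    act_volume: float  # rows processed so far


def progress_signal(p):
    """Report estimated vs. actual rate and volume, plus derived ratios."""
    return {
        "name": p.name,
        "rate": (p.est_rate, p.act_rate),
        "volume": (p.est_volume, p.act_volume),
        "rate_ratio": p.act_rate / p.est_rate if p.est_rate else None,
        "fraction_done": min(p.act_volume / p.est_volume, 1.0) if p.est_volume else None,
    }


def overall_fraction(subprocesses):
    """Assumed aggregate: progress weighted by each subprocess's estimated volume."""
    total = sum(p.est_volume for p in subprocesses)
    done = sum(min(p.act_volume, p.est_volume) for p in subprocesses)
    return done / total if total else None
```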
Abstract:
A computer cluster with objectives-based resource sharing. The cluster includes cloud nodes each with one or more resources, a terminal, data storage, and an allocation node to monitor cloud node resources, provide information descriptive of the cloud node resources to a customer through the terminal, receive a reservation for cloud node resources from the customer, store the reservation in the data storage, determine assignments of the cloud node resources for the reservation and any other pending reservations according to one or more objectives, and allocate the cloud node resources to customers according to the resource assignments.
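The assignment step can be sketched in Python under loose assumptions. Resources are reduced to free CPU cores per node. Each objective is modeled as a sort key over pending reservations, and placement uses best fit (the smallest node that still has room). Both objectives shown (`by_priority`, `first_come`) and the best-fit rule are hypothetical; the abstract leaves the objectives unspecified.

```python
def assign(nodes, reservations, objective):
    """nodes: node name -> free cores; reservations: dicts with id, cores,
    priority, submitted. Returns reservation id -> node for those that fit."""
    free = dict(nodes)
    assignments = {}
    for r in sorted(reservations, key=objective):
        # Best fit: try nodes from least to most free capacity.
        for node in sorted(free, key=free.get):
            if free[node] >= r["cores"]:
                free[node] -= r["cores"]
                assignments[r["id"]] = node
                break
    return assignments


def by_priority(r):
    """Hypothetical objective: highest priority first, ties by submission time."""
    return (-r["priority"], r["submitted"])


def first_come(r):
    """Hypothetical objective: strict submission order."""
    return r["submitted"]
```

Changing only the objective function changes which reservations are served when capacity is short.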
Abstract:
A system, method, and non-transitory computer readable medium are provided to access a graph comprising a plurality of nodes and at least one edge. Each node is associated with at least one database operation. Computer code corresponding to the graph is constructed in accordance with a nesting level, which represents the degree of temporary storage to be allocated for intermediate output produced by the at least one database operation.
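The Python sketch below shows one way a nesting level could trade nested expressions against temporary storage, assuming the graph is a simple linear chain. Up to `nesting_level` consecutive operations are composed into a single nested call. A temporary is materialized only at group boundaries, so a higher level means fewer temporaries. The chain restriction, the generated syntax, and the `tmpN` naming are hypothetical.

```python
def build_code(chain, source, nesting_level):
    """chain: operation names of a linear graph in execution order (assumption).
    Emits one assignment per group of `nesting_level` nested operations."""
    if nesting_level < 1:
        raise ValueError("nesting_level must be >= 1")
    lines = []
    current = source
    for start in range(0, len(chain), nesting_level):
        expr = current
        for op in chain[start:start + nesting_level]:
            expr = f"{op}({expr})"  # nest the next operation around the previous one
        tmp = f"tmp{len(lines) + 1}"  # temporary storage for this group's output
        lines.append(f"{tmp} = {expr}")
        current = tmp
    return "\n".join(lines)
```

With `nesting_level=1` every intermediate result gets its own temporary. With a level equal to the chain length, the whole pipeline becomes one nested expression.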
Abstract:
A computer implemented method and apparatus calculate a freshness cost for each of a plurality of information integration flow graphs and select one of the plurality of information integration flow graphs based upon the calculated freshness cost.
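The abstract does not define the freshness cost. The following Python sketch assumes a simple model: the latency of the slowest source-to-sink path through a flow graph, given per-node latencies. The graph with the lowest such cost is selected. The cost model and the example flows are hypothetical.

```python
def freshness_cost(graph, latency):
    """graph: node -> list of successor nodes (a DAG); latency: node -> seconds.
    Assumed model: cost is the total latency of the slowest path."""
    memo = {}

    def longest_from(node):
        if node not in memo:
            memo[node] = latency[node] + max(
                (longest_from(s) for s in graph.get(node, [])), default=0)
        return memo[node]

    return max(longest_from(n) for n in graph)


def select_freshest(flows):
    """flows: name -> (graph, latency). Returns the name with the lowest cost."""
    return min(flows, key=lambda name: freshness_cost(*flows[name]))
```

For example, splitting a 5-second transform into parallel 3- and 4-second branches lowers the cost from 8 to 7, so the parallel graph would be selected.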
Abstract:
A computer implemented method and apparatus display an information integration flow graph, receive user input selecting a modification to apply to the displayed information integration flow graph and modify the information integration flow graph based on the selected modification to form a modified information integration flow graph, wherein the modified information integration flow graph is displayed.
Abstract:
A system and method are disclosed for determining intervals of a space filling curve in a query box. The method includes providing a range-query box contained within a data set, wherein the data set has a plurality of elements in N dimensions. A space filling curve is applied to the data set so that it passes through each of the elements in the N dimensions. The space filling curve is also applied to the range-query box contained within the data set. An entry point of the space filling curve into the query box is determined. A first endpoint box is formed to cover an hquad of the space filling curve at the entry point that includes P×P elements, with P initially set to one. The value of P is then increased to expand the endpoint box around the next larger hquad of the space filling curve, until the endpoint box reaches its maximum size without extending outside the range-query box. The interval of the space filling curve in the endpoint box can then be determined.
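The following Python sketch is a 2D illustration (N = 2) that assumes a Hilbert curve on an n×n grid, with n a power of two. An hquad is taken to be an aligned P×P block; such a block covers a contiguous, P²-aligned range of curve indices. `hilbert_index` is the standard xy-to-index conversion. The box layout `(x0, x1, y0, y1)` with inclusive bounds and the brute-force entry-point search are simplifications for illustration.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/reflect so the sub-curve has the standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def entry_points(n, box):
    """Cells of the query box whose curve predecessor lies outside the box."""
    x0, x1, y0, y1 = box
    index_to_cell = {hilbert_index(n, x, y): (x, y)
                     for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}
    return [(d, cell) for d, cell in sorted(index_to_cell.items())
            if d - 1 not in index_to_cell]


def endpoint_interval(n, entry, box):
    """Start with a 1x1 endpoint box at the entry cell and double P while the
    next larger aligned hquad stays inside the query box; return its interval."""
    x, y = entry
    x0, x1, y0, y1 = box
    p = 1
    while p * 2 <= n:
        q = p * 2
        bx, by = x - x % q, y - y % q  # lower-left corner of the next larger hquad
        if bx < x0 or by < y0 or bx + q - 1 > x1 or by + q - 1 > y1:
            break
        p = q
    d = hilbert_index(n, x, y)
    start = d - d % (p * p)  # an aligned P x P hquad spans a P*P-aligned index range
    return start, start + p * p - 1
```

For the query box x ∈ [1, 3], y ∈ [0, 1] on a 4×4 grid, the curve enters at cells (1, 0) and (3, 1). The first entry can only cover the interval [1, 1], while the second expands to a 2×2 hquad covering [12, 15].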
Abstract:
Computer-based methods, computer-readable storage media and computer systems are provided for optimizing integration flow plans. An initial integration flow plan, one or more objectives and/or an objective function related to the one or more objectives may be received as input. A computing cost of the initial integration flow plan may be compared with the objective function. Using one or more heuristics, a set of close-to-optimal integration flow plans may be identified from all possible integration flow plans that are functionally equivalent to the initial integration flow plan. A close-to-optimal integration flow plan with a lowest computing cost may be selected from the set as a replacement for the initial integration flow plan.
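The following Python sketch illustrates the general approach under narrow assumptions. A plan is an ordered list of commuting filter-like operations `(name, cost_per_row, selectivity)`, so every reordering is functionally equivalent. The objective function is a target cost, and the heuristic is hill climbing over adjacent swaps. Every improving plan visited is kept as a close-to-optimal candidate, and the cheapest candidate is selected. The cost model and the swap heuristic are assumptions, not the method claimed.

```python
def plan_cost(plan, rows):
    """Assumed cost model: each step pays cost_per_row for every row it receives."""
    total = 0.0
    for _, cost_per_row, selectivity in plan:
        total += rows * cost_per_row
        rows *= selectivity
    return total


def optimize(initial, rows, target_cost):
    """Keep the initial plan if it already meets the target; otherwise
    hill-climb with adjacent swaps and return the cheapest candidate seen."""
    if plan_cost(initial, rows) <= target_cost:
        return list(initial)
    current = list(initial)
    candidates = [tuple(current)]
    improved = True
    while improved:
        improved = False
        for i in range(len(current) - 1):
            neighbor = current[:i] + [current[i + 1], current[i]] + current[i + 2:]
            if plan_cost(neighbor, rows) < plan_cost(current, rows):
                current = neighbor
                candidates.append(tuple(current))
                improved = True
    return list(min(candidates, key=lambda p: plan_cost(p, rows)))
```

Running the highly selective step first shrinks the input to later steps. With 1000 input rows, ordering `B` (selectivity 0.1) before `A` (selectivity 0.9) lowers the modeled cost from 1900 to 1100.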
Abstract:
A search method includes the step of creating a list of candidate probe words. For each candidate probe word, the number of item descriptions that contain the candidate probe word is counted. A number q of probe words are chosen whose word counts most equally divide the remaining item descriptions into q+1 subgroups. The q probe words are presented for selection. Based on the selection, the list of probe words is pruned to eliminate items that were not selected. The counting, choosing, presenting, and pruning steps are repeated until a final list of items remains.
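The loop can be sketched in Python under two simplifying assumptions. First, the q probe words are approximated as the words whose counts lie closest to N/(q+1), where N is the number of remaining items, with ties broken alphabetically. Second, a `select` callback stands in for the user: it returns one of the probe words, or `None` for "none of these" (the (q+1)-th subgroup). The item representation and the callback are hypothetical.

```python
from collections import Counter


def choose_probe_words(items, q):
    """items: item id -> set of words. Pick q words whose item counts are
    closest to len(items) / (q + 1) (an approximation of an equal split)."""
    counts = Counter(word for words in items.values() for word in words)
    target = len(items) / (q + 1)
    # A word present in every item cannot split the group, so skip it.
    candidates = [w for w, c in counts.items() if c < len(items)]
    return sorted(candidates, key=lambda w: (abs(counts[w] - target), w))[:q]


def search(items, q, select):
    """Repeat count/choose/present/prune until at most one item remains."""
    remaining = dict(items)
    while len(remaining) > 1:
        probes = choose_probe_words(remaining, q)
        if not probes:
            break
        choice = select(probes)
        if choice is None:  # keep only items containing none of the probes
            remaining = {k: v for k, v in remaining.items() if not v & set(probes)}
        else:  # keep only items containing the selected probe word
            remaining = {k: v for k, v in remaining.items() if choice in v}
    return sorted(remaining)
```

Every round removes at least one item, because each probe word occurs in at least one remaining item and in fewer than all of them, so the loop always terminates.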