Abstract:
A scheduler receives a job graph comprising computational vertices that are designed to be executed on multiple distributed computer systems. The scheduler queries a graph manager to determine which computational vertices of the job graph are ready for execution in a local execution environment. The scheduler queries a cluster manager to determine the organizational topology of the distributed computer systems so that the determined topology can be simulated in the local execution environment. The scheduler queries a data manager to determine data storage locations for each of the computational vertices indicated as ready for execution. The scheduler then indicates to a vertex spawner that an instance of each such computational vertex is to be spawned in the local execution environment, based on the organizational topology and the indicated data storage locations, and indicates to the local execution environment that the spawned vertices are to be executed.
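The abstract describes control flow only. Below is a minimal sketch of that flow, assuming hypothetical GraphManager, ClusterManager, DataManager, and VertexSpawner interfaces; Java is used here (and in the sketches that follow) purely for illustration, since the abstracts are language-agnostic.

```java
import java.util.List;
import java.util.Map;

// Hypothetical collaborator interfaces; the names are illustrative, not from the patent.
interface GraphManager { List<String> readyVertices(String jobGraphId); }
interface ClusterManager { Map<String, List<String>> topology(); }   // e.g., rack -> machines
interface DataManager { String storageLocation(String vertexId); }
interface VertexSpawner { Runnable spawn(String vertexId, String machine, String dataLocation); }

class LocalScheduler {
    private final GraphManager graph;
    private final ClusterManager cluster;
    private final DataManager data;
    private final VertexSpawner spawner;

    LocalScheduler(GraphManager g, ClusterManager c, DataManager d, VertexSpawner s) {
        graph = g; cluster = c; data = d; spawner = s;
    }

    // Simulate the cluster topology locally: every ready vertex is spawned
    // in-process but "placed on" a machine taken from the queried topology.
    void runLocally(String jobGraphId) {
        List<String> machines = cluster.topology().values().stream()
                                       .flatMap(List::stream).toList();
        int next = 0;
        for (String vertex : graph.readyVertices(jobGraphId)) {
            String machine = machines.get(next++ % machines.size());
            String location = data.storageLocation(vertex);
            spawner.spawn(vertex, machine, location).run();   // execute in the local environment
        }
    }

    public static void main(String[] args) {
        LocalScheduler s = new LocalScheduler(
            jobId -> List.of("v1", "v2", "v3"),
            () -> Map.of("rack0", List.of("m0", "m1")),
            vertex -> "/data/" + vertex,
            (vertex, machine, loc) -> () ->
                System.out.println(vertex + " on " + machine + " reading " + loc));
        s.runLocally("job-42");
    }
}
```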
Abstract:
Embodiments are directed to implementing custom operators in a query for a parallel query engine and to generating a partitioned representation of a sequence of query operators in a parallel query engine. A computer system receives a portion of partitioned input data at a parallel query engine, where the parallel query engine is configured to process data queries in parallel and where the queries include a sequence of built-in operators. The computer system incorporates a custom operator into the sequence of built-in operators for a query and accesses the sequence of operators to determine how the partitioned input data is to be processed. The custom operator is accessed in the same manner as the built-in operators. The computer system then processes the sequence of operators, including both the built-in operators and at least one custom operator, according to the determination of how the data is to be processed.
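One way to read "the custom operator is accessed in the same manner as the built-in operators" is to model every operator as a stream-to-stream transformation. A minimal sketch, using Java parallel streams as a stand-in for the parallel query engine; the apply helper and the square operator are hypothetical:

```java
import java.util.function.Function;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class CustomOperatorSketch {
    // The engine accesses any operator the same way: as a stream-to-stream function.
    static <T, R> Stream<R> apply(Stream<T> in, Function<Stream<T>, Stream<R>> op) {
        return op.apply(in);
    }

    public static void main(String[] args) {
        // Hypothetical custom operator: square every element of the partition.
        Function<Stream<Integer>, Stream<Integer>> square = s -> s.map(x -> x * x);

        long sum = apply(
                IntStream.rangeClosed(1, 100).boxed().parallel()
                         .filter(x -> x % 3 == 0),    // built-in operator
                square)                               // custom operator, accessed identically
            .mapToLong(Integer::longValue)            // built-in operator
            .sum();
        System.out.println(sum);
    }
}
```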
Abstract:
The present invention extends to methods, systems, and computer program products for partitioning streaming data. Embodiments of the invention can be used to hash partition a stream of data, thus avoiding unnecessary memory usage (e.g., the memory associated with buffering). Hash partitioning can be used to split an input sequence (e.g., a data stream) into multiple partitions that can be processed independently. Other embodiments of the invention can be used to hash repartition a plurality of streams of data. Hash repartitioning converts one set of partitions into another set of partitions having the hash-partitioned property. Partitioning and repartitioning can be done in a streaming manner at runtime by exchanging values between worker threads responsible for different partitions.
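A minimal sketch of hash partitioning a stream into independently processed partitions; the blocking-queue exchange and the partition count are illustrative assumptions, not the patented mechanism:

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.stream.IntStream;

public class HashPartitionSketch {
    public static void main(String[] args) throws InterruptedException {
        int partitions = 4;
        List<BlockingQueue<String>> queues = IntStream.range(0, partitions)
                .<BlockingQueue<String>>mapToObj(i -> new LinkedBlockingQueue<>())
                .toList();

        // Hash partitioning: equal values always land in the same partition,
        // so each partition can be processed independently, with no buffering
        // of the whole input.
        String[] input = {"apple", "pear", "apple", "plum", "pear"};
        for (String value : input) {
            int p = Math.floorMod(value.hashCode(), partitions);
            queues.get(p).put(value);
        }

        // One worker thread per partition drains its own queue.
        ExecutorService workers = Executors.newFixedThreadPool(partitions);
        for (int p = 0; p < partitions; p++) {
            final int id = p;
            workers.submit(() -> queues.get(id)
                .forEach(v -> System.out.println("partition " + id + ": " + v)));
        }
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Repartitioning follows the same pattern, except that the producers are the worker threads of the existing partitions rather than a single input loop.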
Abstract:
The present invention extends to methods, systems, and computer program products for indicating parallel operations with user-visible events. Event markers can be used to indicate an abstracted outer layer of execution as well as to expose internal specifics of parallel processing systems, including systems that provide data parallelism. Event markers can show a variety of execution characteristics, including higher-level markers that indicate the beginning and end of an execution program (e.g., a query). Inside the execution program (query), individual fork/join operations can be indicated with sub-levels of markers that expose their operations. Additional decisions made by an execution engine, for example, when elements initially yield, when queries overlap or nest, when a query is cancelled, when a query bails to sequential operation, or when premature merging or repartitioning is needed, can also be exposed.
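A minimal sketch of user-visible event markers; the marker names and nesting ("query:begin", "fork/join:element-yield") are hypothetical, since the abstract does not specify a marker schema:

```java
import java.util.stream.IntStream;

public class EventMarkerSketch {
    static void marker(String event) {
        System.out.printf("[%d] %s%n", System.nanoTime(), event);   // user-visible event
    }

    public static void main(String[] args) {
        marker("query:begin");                                      // higher-level marker
        long sum = IntStream.range(0, 1_000).parallel()
                .peek(i -> { if (i == 0) marker("fork/join:element-yield"); }) // sub-level marker
                .asLongStream()
                .sum();
        marker("query:end result=" + sum);                          // higher-level marker
    }
}
```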
Abstract:
A method of resizing a concurrently accessed hash table is disclosed. The method includes acquiring the locks in the hash table. The hash table, in a first state, is dynamically reconfigured in size into a second state. Additionally, the number of locks is dynamically adjusted based on comparing the size of the hash table in the second state to the size of the hash table in the first state.
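A minimal sketch of the resize protocol, assuming a striped-lock design: every lock is acquired, the bucket array is reconfigured from the first state into the second, and the number of locks is adjusted by comparing the new table size against the lock count. Rehashing of entries and the hand-off between lock generations are elided:

```java
import java.util.concurrent.locks.ReentrantLock;

public class ResizableStripedTable {
    private volatile Object[] buckets = new Object[8];                   // first state
    private volatile ReentrantLock[] locks = {new ReentrantLock(), new ReentrantLock()};

    void resize(int newSize) {
        ReentrantLock[] held = locks;
        for (ReentrantLock l : held) l.lock();                           // acquire the locks in the table
        try {
            buckets = new Object[newSize];                               // reconfigure into the second state
            // (a real table would rehash the old entries here)
            if (newSize / held.length > 8) {                             // compare new size to lock count
                ReentrantLock[] grown = new ReentrantLock[held.length * 2];
                for (int i = 0; i < grown.length; i++) grown[i] = new ReentrantLock();
                locks = grown;                                           // dynamically adjust the number of locks
            }
        } finally {
            for (ReentrantLock l : held) l.unlock();                     // release the acquired locks
        }
    }

    public static void main(String[] args) {
        ResizableStripedTable t = new ResizableStripedTable();
        t.resize(64);
        System.out.println("locks after resize: " + t.locks.length);     // 4, grown from 2
    }
}
```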
Abstract:
The present invention extends to methods, systems, and computer program products for automatically optimizing memory accesses by kernel functions executing on parallel accelerator processors. A function is accessed. The function is configured to operate over a multi-dimensional matrix of memory cells through invocation as a plurality of threads on a parallel accelerator processor. A layout of the memory cells of the multi-dimensional matrix and a mapping of memory cells to global memory at the parallel accelerator processor are identified. The function is analyzed to identify how each of the threads accesses the global memory to operate on corresponding memory cells when invoked from the kernel function. Based on the analysis, the function is altered to utilize a more efficient memory access scheme when performing accesses to the global memory. The more efficient memory access scheme increases coalesced memory access by the threads when invoked over the multi-dimensional matrix.
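A conceptual sketch of the two access schemes, using Java parallel streams as a stand-in for GPU kernel threads. CPU threads do not run in lockstep, so only the address arithmetic is illustrated, not the hardware behavior; on a real accelerator, the coalesced pattern is the one where consecutive thread ids touch consecutive global-memory addresses in the same step:

```java
import java.util.stream.IntStream;

public class CoalescingSketch {
    public static void main(String[] args) {
        int rows = 1024, cols = 1024;
        float[] matrix = new float[rows * cols];   // row-major mapping of the matrix to "global memory"

        // Coalesced: at step r, "thread" t touches matrix[r * cols + t], so
        // threads t and t + 1 access adjacent addresses in the same step.
        IntStream.range(0, cols).parallel().forEach(t -> {
            for (int r = 0; r < rows; r++) matrix[r * cols + t] += 1f;
        });

        // Uncoalesced: at step c, "thread" t touches matrix[t * cols + c],
        // which is a full row (cols elements) away from its neighbor's access.
        IntStream.range(0, rows).parallel().forEach(t -> {
            for (int c = 0; c < cols; c++) matrix[t * cols + c] += 1f;
        });
        System.out.println(matrix[0]);   // 2.0: each cell was touched once by each pattern
    }
}
```

The rewrite the abstract describes would transform accesses of the second shape into accesses of the first shape for the identified layout.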
Abstract:
Partitioning query execution work of a sequence including a plurality of elements. A method includes a worker core requesting work from a work queue. In response, the worker core receives a task from the work queue. The task is a replicable sequence-processing task including two distinct steps: placing a replica of the task on the work queue and processing the sequence. The worker core processes the task by creating a replica of the task, placing the replica on the work queue, and beginning to process the sequence. These acts are repeated for one or more additional worker cores, where receiving a task from the work queue means receiving a replica placed on the work queue by an earlier performance of these acts on a different worker core.
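A minimal sketch of a replicable sequence-processing task, assuming a shared atomic cursor over the sequence and a single blocking work queue (both illustrative choices): each worker that picks the task up first places a replica back on the queue so idle workers can join, then claims and processes elements of the sequence.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ReplicableTaskSketch {
    public static void main(String[] args) throws InterruptedException {
        int[] sequence = new int[1_000];
        AtomicInteger cursor = new AtomicInteger();            // shared position in the sequence
        BlockingQueue<Runnable> workQueue = new LinkedBlockingQueue<>();

        Runnable task = new Runnable() {
            public void run() {
                // Step 1: replicate onto the work queue (only while work remains).
                if (cursor.get() < sequence.length) workQueue.offer(this);
                // Step 2: process the sequence, claiming one element at a time.
                int i;
                while ((i = cursor.getAndIncrement()) < sequence.length) {
                    sequence[i] = i * i;                       // hypothetical per-element work
                }
            }
        };
        workQueue.offer(task);

        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int w = 0; w < 4; w++) {
            workers.submit(() -> {
                try {
                    Runnable t;                                // worker core requests work
                    while ((t = workQueue.poll(100, TimeUnit.MILLISECONDS)) != null) t.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("last element: " + sequence[sequence.length - 1]);
    }
}
```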
Abstract:
Dynamically allocated thread storage in a computing device is disclosed. The dynamically allocated thread storage is configured to work with a process including two or more threads. Each thread includes a statically allocated thread-local slot configured to store a table. Each table is configured to include a table slot corresponding to a dynamically allocated thread-local value. A dynamically allocated thread-local instance corresponds to the table slot.
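A minimal sketch under the abstract's structure: one statically allocated thread-local slot stores a per-thread table, and each dynamically created thread-local instance owns a table slot index. The class and member names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicThreadLocal<T> {
    // The statically allocated thread-local slot, storing a table per thread.
    private static final ThreadLocal<Map<Integer, Object>> TABLE =
        ThreadLocal.withInitial(HashMap::new);
    private static final AtomicInteger NEXT_SLOT = new AtomicInteger();

    private final int slot = NEXT_SLOT.getAndIncrement();   // this instance's table slot

    @SuppressWarnings("unchecked")
    public T get() { return (T) TABLE.get().get(slot); }
    public void set(T value) { TABLE.get().put(slot, value); }

    public static void main(String[] args) {
        DynamicThreadLocal<String> a = new DynamicThreadLocal<>();
        DynamicThreadLocal<Integer> b = new DynamicThreadLocal<>();
        a.set("hello"); b.set(42);
        System.out.println(a.get() + " / " + b.get());      // each value lives in its own slot
    }
}
```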
Abstract:
A method of providing access to a dataset in a type-safe manner includes storing a dataset that includes a plurality of data elements and a corresponding plurality of order keys indicating an ordering of the data elements. Each order key is associated with one of the data elements. An interface to the dataset is generated that is parameterized by an element type parameter and a key type parameter. The interface is configured to provide access to the data elements and the order keys in the dataset in a type-safe manner.
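A minimal sketch of such an interface, parameterized by an element type parameter and a key type parameter; the OrderedDataset and MapBackedDataset names and the map-backed storage are hypothetical:

```java
import java.util.List;
import java.util.Map;

public class OrderedDatasetSketch {
    // Elements and order keys are exposed without casts, in a type-safe manner.
    interface OrderedDataset<TElement, TKey extends Comparable<TKey>> {
        List<TElement> elements();                // in the order given by the keys
        TKey orderKeyOf(TElement element);        // each order key is associated with one element
    }

    record MapBackedDataset<TElement, TKey extends Comparable<TKey>>(Map<TElement, TKey> keys)
            implements OrderedDataset<TElement, TKey> {
        public List<TElement> elements() {
            return keys.entrySet().stream()
                       .sorted(Map.Entry.comparingByValue())  // ordering comes from the keys
                       .map(Map.Entry::getKey)
                       .toList();
        }
        public TKey orderKeyOf(TElement element) { return keys.get(element); }
    }

    public static void main(String[] args) {
        OrderedDataset<String, Integer> ds =
            new MapBackedDataset<>(Map.of("b", 2, "a", 1, "c", 3));
        System.out.println(ds.elements() + "  key of a = " + ds.orderKeyOf("a"));
    }
}
```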