Abstract:
The present invention extends to methods, systems, and computer program products for automatically optimizing memory accesses by kernel functions executing on parallel accelerator processors. A function is accessed. The function is configured to operate over a multi-dimensional matrix of memory cells through invocation as a plurality of threads on a parallel accelerator processor. A layout of the memory cells of the multi-dimensional matrix and a mapping of memory cells to global memory at the parallel accelerator processor are identified. The function is analyzed to identify how each of the threads access the global memory to operate on corresponding memory cells when invoked from the kernel function. Based on the analysis, the function altered to utilize a more efficient memory access scheme when performing accesses to the global memory. The more efficient memory access scheme increases coalesced memory access by the threads when invoked over the multi-dimensional matrix.
Abstract:
Partitioning query execution work of a sequence including a plurality of elements. A method includes a worker core requesting work from a work queue. In response, the worker core receives a task from the work queue. The task is a replicable sequence-processing task including two distinct steps: scheduling a copy of the task on the scheduler queue and processing a sequence. The worker core processes the task by: creating a replica of the task and placing the replica of the task on the work queue, and beginning processing the sequence. The acts are repeated for one or more additional worker cores, where receiving a task from the work queue is performed by receiving one or more replicas of tasks placed on the task queue by earlier performances of creating a replica of the task and placing the replica of the task on the work queue by a different worker core.
Abstract:
Dynamically allocated thread storage in a computing device is disclosed. The dynamically allocated thread storage is configured to work with a process including two or more threads. Each thread includes a statically allocated thread-local slot configured to store a table. Each table is configured to include a table slot corresponding with a dynamically allocated thread-local value. A dynamically allocated thread-local instance corresponds with the table slot.
Abstract:
A method of providing access to a dataset in a type-safe manner includes storing a dataset including a plurality of data elements and a corresponding plurality of order keys for indicating an ordering of the data elements. Each order key is associated with one of the data elements. An interface to the dataset is generated that is parameterized by an element type parameter and a key type parameter. The interface is configured to provide access to the data elements and the order keys in the dataset in a type-safe manner.
Abstract:
Embodiments are directed to implementing custom operators in a query for a parallel query engine and to generating a partitioned representation of a sequence of query operators in a parallel query engine. A computer system receives a portion of partitioned input data at a parallel query engine, where the parallel query engine is configured to process data queries in parallel, and where the queries include a sequence of built-in operators. The computer system incorporates a custom operator into the sequence of built-in operators for a query and accesses the sequence of operators to determine how the partitioned input data is to be processed. The custom operator is accessed in the same manner as the built-in operators. The computer system also processes the sequence of operators including both the built-in operators and at least one custom operator according to the determination indicating how the data is to be processed.
Abstract:
Dynamic data partitioning is disclosed for use with a multiple node processing system that consumes items from a data stream of any length and independent of whether the length is undeclared. Dynamic data partitioning takes items from the data stream when a thread is idle and assigns the taken items to an idle thread, and it varies the size of data chunks taken from the stream and assigned to a thread to efficiently distribute work loads among the nodes. In one example, data chunk sizes taken from the beginning of the data stream are relatively smaller than data chunk sizes taken towards the middle or end of the data stream. Dynamic data partitioning employs a growth function where chunks have a size related to single aligned cache lines and efficiently increases the size of the data chunks to occasionally double the amount of data assigned to concurrent threads.
Abstract:
The present invention extends to methods, systems, and computer program products for partitioning streaming data. Embodiments of the invention can be used to hash partition a stream of data and thus avoids unnecessary memory usage (e.g., associated with buffering). Hash partitioning can be used to split an input sequence (e.g., a data stream) into multiple partitions that can be processed independently. Other embodiments of the invention can be used to hash repartition a plurality of streams of data. Hash repartitioning converts a set of partitions into another set of partitions with the hash partitioned property. Partitioning and repartitioning can be done in a streaming manner at runtime by exchanging values between worker threads responsible for different partitions.
Abstract:
A method of processing a query includes receiving a language integrated query including at least one operator, and operating on an input array with the at least one operator. An output array is generated by the at least one operator based on the operation on the input array.
Abstract:
A parallelism policy object provides a control parallelism interface whose implementation evaluates parallelism conditions that are left unspecified in the interface. User-defined and other parallelism policy procedures can make recommendations to a worker program for transitioning between sequential program execution and parallel execution. Parallelizing assistance values obtained at runtime can be used in the parallelism conditions on which the recommendations are based. A consistent parallelization policy can be employed across a range of parallel constructs, and inside recursive procedures.
Abstract:
A data partitioning interface provides procedure headings to create data partitions for processing data elements in parallel, and for obtaining data elements to process, without specifying the organizational structure of a data partitioning. A data partitioning implementation associated with the data partitioning interface provides operations to implement the interface procedures, and may also provide dynamic partitioning to facilitate load balancing.