Abstract:
An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.
Abstract:
An interface includes a first hardware register field to store respective chunks of a command directed to a device and respective chunks of a response to the command from the device. The interface also includes a second hardware register field to store a size of the command and a size of the response. The first and second hardware register fields are accessible by the device and by a processor external to the device that generates the command, in response to memory not being available to buffer the command and the response.
Abstract:
An interface includes a first hardware register field to store respective chunks of a command directed to a device and respective chunks of a response to the command from the device. The interface also includes a second hardware register field to store a size of the command and a size of the response. The first and second hardware register fields are accessible by the device and by a processor external to the device that generates the command, in response to memory not being available to buffer the command and the response.