Abstract:
A general-purpose, high-performance distributed execution engine for coarse-grained data-parallel applications is proposed. It allows developers to create large-scale distributed applications without mastering concurrency techniques beyond being able to draw a graph of the data dependencies of their algorithms. Based on the graph, a job manager distributes the workload so that the resources of the execution engine are used efficiently. At runtime, the job manager (or another entity) can automatically modify the graph to improve efficiency. The modifications are based on runtime information, the topology of the distributed execution engine, and/or the distributed application represented by the graph.
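A minimal sketch of the idea that a job manager can derive a schedule from nothing more than a data-dependency graph. The function `schedule`, the vertex names, and the round-robin worker assignment are illustrative assumptions, not the engine's actual policy; the sketch also omits runtime modification of the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule(deps, workers):
    """Group vertices into waves that respect the data dependencies.

    deps maps each vertex name to the set of vertices it reads from.
    Returns a list of waves; each wave is a list of (vertex, worker)
    pairs whose inputs are all complete, so they could run in parallel.
    Worker assignment is plain round-robin (an assumption of this sketch).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Sorting is only for deterministic output in this example.
        ready = sorted(sorter.get_ready())
        wave = [(v, workers[i % len(workers)]) for i, v in enumerate(ready)]
        waves.append(wave)
        sorter.done(*ready)
    return waves


# Hypothetical job: two readers feed a join, which feeds a writer.
deps = {
    "read_a": set(),
    "read_b": set(),
    "join": {"read_a", "read_b"},
    "write": {"join"},
}
waves = schedule(deps, ["w0", "w1"])
```

Here the developer supplies only `deps`; the ordering and placement are derived from it.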
Abstract:
The techniques discussed herein efficiently perform data-parallel computations on collections of data by implementing a differential dataflow model that performs computations on differences between the collections of data. The techniques define operators for use in a data-parallel program that performs the computations on the determined differences by creating a lattice and indexing the differences in the collections of data according to the lattice.
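To illustrate computing on differences rather than on whole collections, here is a simplified sketch of one operator, a per-key count, that consumes input differences and emits output differences. The class `IncrementalCount` and its interface are hypothetical, and the sketch processes a single linear sequence of updates instead of indexing differences by a general lattice.

```python
from collections import Counter


class IncrementalCount:
    """Maintains per-key counts from input differences only (illustrative)."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, diffs):
        """Consume (key, delta) input differences.

        Returns output differences as ((key, count), delta) pairs:
        -1 retracts a stale output record, +1 asserts the updated one.
        """
        touched = Counter()
        for key, delta in diffs:
            touched[key] += delta
        out = []
        for key, delta in sorted(touched.items()):
            if delta == 0:
                continue  # additions and removals cancelled out
            old = self.counts[key]
            new = old + delta
            if old > 0:
                out.append(((key, old), -1))
            if new > 0:
                out.append(((key, new), +1))
            self.counts[key] = new
        return out


op = IncrementalCount()
first = op.apply([("a", 1), ("b", 1), ("a", 1)])   # initial collection
second = op.apply([("a", -1), ("c", 1)])           # a small change
```

The second call touches only the keys that changed; the count for `"b"` is never recomputed.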
Abstract:
General-purpose distributed data-parallel computing using high-level computing languages is described. Data-parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage, and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, and also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
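Pushing part of the reduce computation to the map stage can be sketched as follows, assuming the combining function is associative and commutative. The helpers `map_with_partial_reduce` and `final_reduce` are hypothetical names for this example, not part of the described system, and the partitions stand in for data held on separate machines.

```python
from operator import add


def map_with_partial_reduce(partition, key_fn, value_fn, combine):
    """Map each record to (key, value) and combine values locally,
    so only one value per key leaves this partition."""
    partial = {}
    for record in partition:
        k, v = key_fn(record), value_fn(record)
        partial[k] = combine(partial[k], v) if k in partial else v
    return partial


def final_reduce(partials, combine):
    """Merge the per-partition partial results into the final result."""
    result = {}
    for partial in partials:
        for k, v in partial.items():
            result[k] = combine(result[k], v) if k in result else v
    return result


# Word count over two partitions.
partitions = [["a", "b", "a"], ["b", "b", "c"]]
partials = [
    map_with_partial_reduce(p, lambda w: w, lambda w: 1, add)
    for p in partitions
]
totals = final_reduce(partials, add)
```

Each partition ships at most one value per key to the final stage, which is the saving that partial reduction aims for.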
Abstract:
General-purpose distributed data-parallel computing using high-level computing languages is described. Data-parallel portions of a sequential program that is written by a developer in a high-level language are automatically translated into a distributed execution plan. A set of extensions to a sequential high-level computing language is provided to support distributed parallel computations and to facilitate generation and optimization of distributed execution plans. The extensions are fully integrated with the programming language, so developers can write sequential programs using known constructs and invoke the extensions to enable better generation and optimization of the execution plan for a distributed computing environment.
Abstract:
A general-purpose, high-performance distributed execution engine can be used by developers to deploy large-scale distributed applications. To allow developers to easily make use of the distributed execution engine, a graph-building language is proposed that enables developers to efficiently create graphs (e.g., directed acyclic graphs) that describe the subprograms to be executed and the flow of data between them. A job manager (or other appropriate entity) reads the description of the graph created with the graph-building language, builds the graph based on that description, and distributes the subprograms according to the graph so that system resources are used efficiently. In one embodiment, the graph-building language (and, thus, the description of the graph) includes syntax for replication, pointwise connect, cross connect and merge.
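The four operations named above can be sketched as plain functions over a small graph record. The function names, the `Graph` fields, and the exact semantics (in particular, treating merge as a deduplicating union) are assumptions made for illustration; the actual syntax of the graph-building language is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    vertices: list
    edges: list = field(default_factory=list)
    inputs: list = field(default_factory=list)   # vertices that accept data
    outputs: list = field(default_factory=list)  # vertices that emit data


def replicate(name, n):
    """Replication: n copies of one subprogram, named name0..name(n-1)."""
    vs = [f"{name}{i}" for i in range(n)]
    return Graph(vs, [], list(vs), list(vs))


def pointwise(a, b):
    """Pointwise connect: i-th output of a feeds i-th input of b."""
    if len(a.outputs) != len(b.inputs):
        raise ValueError("pointwise connect needs equal fan-out and fan-in")
    new = list(zip(a.outputs, b.inputs))
    return Graph(a.vertices + b.vertices, a.edges + b.edges + new,
                 a.inputs, b.outputs)


def cross(a, b):
    """Cross connect: every output of a feeds every input of b."""
    new = [(o, i) for o in a.outputs for i in b.inputs]
    return Graph(a.vertices + b.vertices, a.edges + b.edges + new,
                 a.inputs, b.outputs)


def merge(a, b):
    """Merge (assumed semantics): union of two graphs, with shared
    vertices and edges kept once."""
    vertices = list(dict.fromkeys(a.vertices + b.vertices))
    edges = list(dict.fromkeys(a.edges + b.edges))
    has_in = {dst for _, dst in edges}
    has_out = {src for src, _ in edges}
    return Graph(vertices, edges,
                 [v for v in vertices if v not in has_in],
                 [v for v in vertices if v not in has_out])


readers = replicate("read", 2)
mappers = replicate("map", 2)
reducers = replicate("reduce", 2)
job = cross(pointwise(readers, mappers), reducers)
merged = merge(pointwise(readers, mappers), cross(mappers, reducers))
```

Both `job` and `merged` describe the same six-vertex graph: readers paired one-to-one with mappers, and every mapper connected to every reducer.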
Abstract:
A general-purpose, high-performance distributed execution engine for coarse-grained data-parallel applications is proposed. It allows developers to create large-scale distributed applications without mastering concurrency techniques beyond being able to draw a graph of the data dependencies of their algorithms. Based on the graph, a job manager distributes the workload so that system resources are used efficiently. The system is designed to scale from a small cluster of a few computers, or the multiple CPU cores of a single powerful computer, up to a data center containing thousands of servers.
Abstract:
An image may be received, a portion of which corresponds to a surface of an object, such as a book, a CD, a DVD, a wine bottle, etc. The portion of the image that corresponds to the surface of the object is located. The portion of the image is compared with previously stored images of surfaces of objects to identify the object. A record of the object is created and added to a library. The record of the object may comprise the image of the object, the portion of the image which corresponds to the surface of the object, and/or the received image itself. The record may comprise an indicator of a location of the object.
Abstract:
Strong semantics are provided to programs that are correctly synchronized in their use of transactions by using dynamic separation of objects that are accessed in transactions from those accessed outside transactions. At run-time, operations are performed to identify transitions between these protected and unprotected modes of access. Dynamic separation permits a range of hardware-based and software-based implementations which allow non-conflicting transactions to execute and commit in parallel. A run-time checking tool, analogous to a data-race detector, may be provided to test dynamic separation of transacted data and non-transacted data. Dynamic separation may be used in an asynchronous I/O library.
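A rough sketch of the kind of check a run-time tool for dynamic separation might perform: each object records whether it is currently in protected mode, and any access made in the wrong mode is reported. The names `Checked`, `SeparationError`, `protect`, and `unprotect` are hypothetical, and the sketch is single-threaded and does not implement transactions themselves.

```python
class SeparationError(RuntimeError):
    """Raised when an object is accessed in the wrong mode."""


class Checked:
    """Wraps a value and tracks its protected/unprotected mode."""

    def __init__(self, value):
        self._value = value
        self._protected = False  # objects start in unprotected mode

    def protect(self):
        """Transition: the object may now be used only inside transactions."""
        self._protected = True

    def unprotect(self):
        """Transition: the object may now be used only outside transactions."""
        self._protected = False

    def _check(self, in_transaction):
        if in_transaction != self._protected:
            mode = "protected" if self._protected else "unprotected"
            where = "inside" if in_transaction else "outside"
            raise SeparationError(f"{mode} object accessed {where} a transaction")

    def read(self, in_transaction):
        self._check(in_transaction)
        return self._value

    def write(self, value, in_transaction):
        self._check(in_transaction)
        self._value = value


obj = Checked(0)
obj.protect()
obj.write(1, in_transaction=True)      # allowed: protected, in a transaction
try:
    obj.read(in_transaction=False)     # violation: protected, no transaction
    violated = False
except SeparationError:
    violated = True
obj.unprotect()
value = obj.read(in_transaction=False)  # allowed again after the transition
```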