摘要:
A system and method of automatically configuring memory in a data processing system, including the steps of: receiving source code containing a loop nest, wherein the loop nest includes data arrays with affine indexes; optimizing source code by relocating elements from a first array in memory to a second array in memory; and executing the optimized source code.
摘要:
Methods and apparatus are described for optimizing interconnections between busses and function units and registers. The method includes identifying each bus in a plurality of busses and at least one hardware component to which each bus is assigned for a given operation. At least two bus assignments are identified for which different operations occur on the same hardware component. Hardware components are assigned for different operations occurring on the same hardware component to the same bus. The optimization process can be efficiently carried out using conventional algorithms for solving assignment problems. Use of these assignment problem algorithms provides an efficient and reliable way of optimizing the bus assignments.
摘要:
A method includes selecting a first biclique role in a plurality of roles and finding all roles in the plurality that have a set of vertices of a second type that is a subset of a set of vertices of the second type in the first role; removing each of the subsets from the set of vertices of the second type corresponding to the first role; and reassigning the vertices of the first type to the roles such that original associations between the vertices of the first type and the vertices of the second type are maintained.
摘要:
Various embodiments of the present invention are directed multi-core memory modules. In one embodiment, a memory module (500) includes memory chips, and a demultiplexer register (502) electronically connected to each of the memory chips and a memory controller. The memory controller groups one or more of the memory chips into at least one virtual memory device in accordance with changing performance and/or energy efficiency needs. The demultiplexer register (502) is configured to receive a command indentifying one of the virtual memory devices and send the command to the memory chips of the identified virtual memory device. In certain embodiments, the memory chips can be dynamic random access memory chips.
摘要:
The present invention provides a system with a cache that indicates which, if any, of its sections contain data having spent status. The invention also provides a method for identifying cache sections containing data having spent status and then purging without writing back to main memory a cache line having at least one section containing data having spent status. The invention further provides a program that specifies a cache-line section containing data that is to acquire “spent” status. “Spent” data, herein, is useless modified or unmodified data that was formerly at least potentially useful data when it was written to a cache. “Purging” encompasses both invalidating and overwriting.
摘要:
A method includes providing a bipartite graph having vertices of a first type, vertices of a second type, and a plurality of edges, wherein each edge joins a vertex of the first type with a vertex of the second type. A unipartite edge dual graph is generated from the bipartite graph, and a minimum clique partition of the edge dual graph is recursively determined. A biclique is then created in the bipartite graph corresponding to each clique in the minimum clique partition of the edge dual graph.
摘要:
When a data-parallel language like Fortran 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. This disclosure deals with two facets of the problem of finding alignments that reduce residual communication; namely, alignments that vary in loops, and objects that permit of replicated alignments. It is shown that loop-dependent dynamic alignment is sometimes necessary for optimum performance, and algorithms are provided so that a compiler can determine good dynamic alignments for objects within "do" loops. Also situations are identified in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. An algorithm based on network flow is proposed for determing which objects to replicate so as to minimize the total amount of broadcast communication in replication.
摘要:
Various embodiments of the present invention are directed to methods that enable a memory controller to choose a particular operation mode for virtual memory devices of a memory module based on dynamic program behavior. In one embodiment, a method for determining an operation mode for each virtual memory device of a memory module includes selecting a metric (1001) that provides a standard by which performance and/or energy efficiency of the memory module is optimized during execution of one or more applications on a multicore processor. For each virtual memory device (1005), the method also includes collecting usage information (1006) associated with the virtual memory device over a period of time, determining an operation mode (1007) for the virtual memory device based on the metric and usage information, and entering the virtual memory device into the operation mode (1103, 1105, 1107, 1108).
摘要:
A method of predicting user-item ratings includes providing a first matrix of hidden variables associated with individual items, a second matrix of hidden variables associated with individual users, a third matrix of predicted user-item ratings derived from an inner product of vectors in the first and second matrices, and a fourth matrix of actual user-item ratings. The first and second matrices are alternately fixed and solved with a weighted-λ regularization of at least one of the first and second matrices by minimizing a sum of squared errors between actual user-item ratings in the fourth matrix and corresponding predicted user-item ratings in the third matrix repeatedly until a stopping criterion is satisfied.
摘要:
A method of predicting user-item ratings includes providing a first matrix of hidden variables associated with individual items, a second matrix of hidden variables associated with individual users, a third matrix of predicted user-item ratings derived from an inner product of vectors in the first and second matrices, and a fourth matrix of actual user-item ratings. The first and second matrices are alternately fixed and solved with a weighted-λ regularization of at least one of the first and second matrices by minimizing a sum of squared errors between actual user-item ratings in the fourth matrix and corresponding predicted user-item ratings in the third matrix repeatedly until a stopping criterion is satisfied.