Abstract:
A method (and system) for detecting at least one faulty object in a system including a plurality of objects in communication with each other in an n-dimensional architecture, includes probing a first plane of objects in the n-dimensional architecture and probing at least one other plane of objects in the n-dimensional architecture which would result in identifying a faulty object in the system.
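The plane-probing idea can be illustrated with a minimal Python sketch. It assumes a 3-dimensional grid of nodes and a hypothetical `probe_plane` helper that reports whether a given plane contains a faulty node. Both names, and the way a probe is simulated here, are illustrative assumptions rather than anything specified in the abstract.

```python
from itertools import product

def probe_plane(faulty, axis, index):
    """Hypothetical probe: True if any faulty node lies in the plane axis == index.

    In a real system this would be a communication test across the plane;
    here it is simulated from a known set of faulty coordinates.
    """
    return any(node[axis] == index for node in faulty)

def locate_faulty(shape, faulty):
    """Return candidate faulty coordinates by intersecting failing planes."""
    failing = []
    for axis, extent in enumerate(shape):
        # Probe every plane perpendicular to this axis and keep the failing ones.
        failing.append([i for i in range(extent) if probe_plane(faulty, axis, i)])
    # Each candidate lies in one failing plane per axis.
    return list(product(*failing))

candidates = locate_faulty((4, 4, 4), {(1, 2, 3)})
```

With a single faulty node the intersection of failing planes pins down its coordinates exactly. With several faulty nodes the intersection yields a superset of candidates that would need further probing.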
Abstract:
A method (and structure) of managing memory in which a low-level mechanism is executed to signal, in a sequence of instructions generated at a higher level, that at least a portion of a contiguous area of memory is permitted to be overwritten.
Abstract:
A method (and structure) of processing, on a computer having a plurality of processors, includes executing a set of tasks that includes a computational bottleneck in a repetitive procedure on a first subset of the plurality of processors. A set of non-bottleneck tasks of the repetitive procedure is executed on a second subset of the plurality of processors. In a steady-state processing of the repetitive procedure, the first subset of processors and the second subset of processors are together processing the repetitive procedure in a manner such that the first subset of processors and the second subset of processors are each operating substantially full-time.
Abstract:
A method (and structure) for executing linear algebra subroutines includes, for an execution code controlling an operation of a floating point unit (FPU) performing a linear algebra subroutine execution, unrolling instructions to prefetch data into a cache providing data into the FPU. The unrolling causes the instructions to touch data anticipated for the linear algebra subroutine execution.
Abstract:
A computerized method (and structure) of linear algebra processing on a computer having a plurality of processors for parallel processing includes, for a matrix having elements originally stored in a memory in a rectangular matrix AR format or, especially, in one of a triangular matrix AT format and a symmetric matrix AS format, distributing data of the rectangular matrix AR or of the triangular or symmetric matrix (AT, AS) from the memory to the plurality of processors in such a manner that all submatrices of AR, or substantially only the essential data of the triangular matrix AT or symmetric matrix AS, are represented in the distributed memories of the processors as contiguous atomic units for the processing. The linear algebra processing done on the processors with distributed memories requires that submatrices be sent and received as contiguous atomic units based on the prescribed block cyclic data layouts of the linear algebra processing. This computerized method (and structure) defines all of its submatrices as these contiguous atomic units, thereby avoiding extra data preparation before each send and after each receive. The essential data of AT or AS is that data of the triangular or symmetric matrix that is minimally necessary for maintaining the full information content of the triangular matrix AT or symmetric matrix AS.
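A rough Python sketch of the distribution idea follows. It assumes a square matrix held as a list of rows, a block size `nb`, and a `p_rows` by `p_cols` processor grid with a plain two-dimensional block-cyclic owner rule. These names and the owner rule are illustrative assumptions; the layout actually claimed may differ.

```python
def distribute_lower_blocks(a, nb, p_rows, p_cols):
    """Assign the lower-triangular blocks of `a` to a processor grid.

    Each block is flattened into one contiguous row-major buffer, so it could
    be sent or received as a single unit without repacking. Blocks strictly
    above the diagonal are skipped as non-essential for a symmetric or
    lower-triangular matrix.
    """
    n = len(a)
    nblk = (n + nb - 1) // nb  # number of block rows/columns
    local = {(r, c): {} for r in range(p_rows) for c in range(p_cols)}
    for bi in range(nblk):
        for bj in range(bi + 1):  # only blocks on or below the diagonal
            rows = range(bi * nb, min((bi + 1) * nb, n))
            cols = range(bj * nb, min((bj + 1) * nb, n))
            block = [a[i][j] for i in rows for j in cols]
            # Assumed 2-D block-cyclic owner rule.
            local[(bi % p_rows, bj % p_cols)][(bi, bj)] = block
    return local

matrix = [[i * 10 + j for j in range(4)] for i in range(4)]
layout = distribute_lower_blocks(matrix, nb=2, p_rows=2, p_cols=2)
```

In this sketch the diagonal blocks are still stored as full squares, so the stored data is only approximately minimal, in line with the abstract's "substantially only essential data" wording.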
Abstract:
A method and structure for executing a matrix algorithm requiring an order of N^3 operations including data reformatting operations, where N is a dimension of an operand of said algorithm on a computer, includes initially reformatting data for at least one matrix used in the matrix algorithm into a data structure stored in a memory, such that stride one data is presented for all submatrices used as operands involved in the matrix algorithm in a format required by the matrix algorithm with substantially no further data reformatting beyond an order N data reformatting required for executing the algorithm.
Abstract:
A system for (and method of) algorithmic cache-bypass includes acting on at least one level of cache to at least one of: bypass the at least one level of cache, stream through the at least one level of cache, force utilization of at least one other level of cache, bypass all levels of cache, force utilization of a main memory, and force utilization of an out-of-core memory.
Abstract:
A method (and structure) for executing linear algebra subroutines on a computer, including selecting a matrix subroutine from among a plurality of matrix subroutines that performs the matrix multiplication.
Abstract:
A method (and structure) of executing a matrix operation includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks are stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different relative to its original position in the matrix A.
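As a hedged illustration of the blocking step, the Python sketch below copies a row-major matrix into p-by-q blocks stored one after another, optionally transposing the elements inside each block so that they occupy different locations than in the original block. The function name `to_blocked` and the `transpose_blocks` flag are assumptions made for this example only.

```python
def to_blocked(a, p, q, transpose_blocks=False):
    """Flatten matrix `a` (list of rows) into consecutive p-by-q blocks.

    Blocks are emitted in block-row-major order. Within a block, elements are
    stored row-major, or column-major when transpose_blocks is True.
    """
    m, n = len(a), len(a[0])
    if m % p or n % q:
        raise ValueError("matrix dimensions must be multiples of the block size")
    out = []
    for bi in range(0, m, p):
        for bj in range(0, n, q):
            if transpose_blocks:
                # Elements change position inside the block (column-major).
                out.extend(a[bi + i][bj + j] for j in range(q) for i in range(p))
            else:
                out.extend(a[bi + i][bj + j] for i in range(p) for j in range(q))
    return out

a = [[0, 1, 2, 3],
     [4, 5, 6, 7]]
blocked = to_blocked(a, 2, 2)
blocked_t = to_blocked(a, 2, 2, transpose_blocks=True)
```

Storing each block contiguously means a kernel working on one block touches a single stride-one run of memory instead of p separate row segments.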
Abstract:
A method (and structure) of linear algebra processing includes processing (real or complex) matrix data having elements originally stored in one of a triangular format and a symmetric matrix format in a subroutine designed to process matrix data in a full format. The processing uses a hybrid full packed data structure, which provides a rectangular space characteristic of the full format. The rectangular space is defined by a leading dimension (LD). Inside the rectangular space are stored a plurality of entities that include all elements of the matrix data originally stored in the triangular or symmetric format.