-
Publication No.: US11080131B2
Publication date: 2021-08-03
Application No.: US16010716
Filing date: 2018-06-18
Inventors: Oded Schwartz, Noam Birnbaum
Abstract: A computer-implemented method for performing a fault-tolerant numerical linear algebra computation task consisting of calculation steps that include at least classic or fast matrix multiplication, in which a controller splits the task among P processors that operate in parallel. Additional processors are assigned according to execution and resource parameters, which are also used to select either a slice-coded recovery algorithm or a posterior-recovery algorithm for executing the task. Pipelined-reduce operations are used to generate error-correcting codes that protect the input blocks and outer products from faults. Upon detecting faults in one or more processors, if the slice-coded recovery algorithm has been selected, it is executed to recover lost input blocks and outer products. If the posterior-recovery algorithm has been selected, error-correcting codes are used to recover lost input blocks and, after the last step, the outer products that correspond to faulty processors are recalculated. When fast multiplication is needed, l DFS down-recursion steps are iteratively performed by the P processors and by the additional processors r times, for which the error-correcting codes remain valid; after r times, the error-correcting codes are recalculated for the next r times. Each of the P processors then performs a local block multiplication between a pair of blocks while recalculating a new error-correcting code. The output matrix is then created by iteratively performing d BFS up-recursion decoding steps on the multiplication product r times; the error-correcting codes are valid only for those r times and are recalculated after each group of r times for the next r times. At the end of all iterations, the blocks to be decoded are obtained together with a code block held by the additional code processors, such that each processor holds a pair of blocks.
Upon detecting faults in one or more processors, a recovery algorithm is executed to recover lost input blocks and multiplication results that correspond to faulty processors, or to correct miscalculations of the processors by recalculation.
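The idea of protecting input blocks with an error-correcting code can be illustrated with a minimal sketch. It assumes the simplest possible code, a single element-wise sum held by one additional code processor, which tolerates the loss of one data block. The function names `encode` and `recover` are hypothetical, and the pipelined-reduce step is simulated here by a local sum; the codes actually used in the patent may differ.

```python
def encode(blocks):
    """Return the code block: the element-wise sum of all data blocks.

    In a parallel run this sum would be produced by a reduce operation
    across the P processors; here it is computed locally (assumption).
    """
    code = [0] * len(blocks[0])
    for block in blocks:
        for j, value in enumerate(block):
            code[j] += value
    return code


def recover(blocks, code, lost):
    """Rebuild the block at index `lost` from the code and the survivors."""
    rebuilt = list(code)
    for i, block in enumerate(blocks):
        if i == lost:
            continue  # this processor failed; its block is unavailable
        for j, value in enumerate(block):
            rebuilt[j] -= value
    return rebuilt


# Three data processors, one block each, plus one code processor.
blocks = [[1, 2], [3, 4], [5, 6]]
code = encode(blocks)

# Simulate a fault on processor 1 and recover its block.
blocks[1] = None
blocks[1] = recover(blocks, code, lost=1)
```

A single sum can repair only one lost block; tolerating more simultaneous faults would need additional, linearly independent code blocks.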
-
Publication No.: US10387534B2
Publication date: 2019-08-20
Application No.: US15823776
Filing date: 2017-11-28
Inventors: Oded Schwartz, Elaye Karstadt
Abstract: A computerized method comprising operating one or more hardware processors for receiving a first matrix and a second matrix. The hardware processor(s) are operated for determining a basis transformation, wherein the basis transformation is invertible and its inverse is an inverted basis transformation. The hardware processor(s) are operated for computing an alternative basis first matrix by multiplying the first matrix by the basis transformation. The hardware processor(s) are operated for computing an alternative basis second matrix by multiplying the second matrix by the basis transformation. The hardware processor(s) are operated for performing a matrix multiplication of the alternative basis first matrix and the alternative basis second matrix, thereby producing an alternative basis multiplied matrix. The hardware processor(s) are operated for computing a multiplied matrix by multiplying the alternative basis multiplied matrix by the inverted basis transformation.
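The sequence of steps in this abstract (transform both inputs, multiply in the alternative basis, transform back) can be sketched for a single 2x2 product. The sketch rests on several assumptions that are not taken from the patent: the multiplication is Strassen's seven-product scheme written as encoding matrices `U`, `V` and a decoding matrix `W`; the same transformation `PHI` is applied to both inputs and the output; and `PHI` is an arbitrary invertible example chosen only because its inverse is easy to write down. All names are hypothetical.

```python
# A 2x2 matrix is stored as the flat list [x11, x12, x21, x22].
# Strassen's scheme: products = (U a) * (V b) element-wise, c = W products.
U = [[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
     [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]]
V = [[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
W = [[1, 0, 0, 1, -1, 0, 1], [0, 0, 1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0, 0, 0], [1, -1, 1, 0, 0, 1, 0]]

# Example basis transformation and its inverse (an assumption, not the
# transformation from the patent).
PHI = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, -1, 1]]
PHI_INV = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, -1, 1, 1]]


def matvec(m, x):
    """Multiply matrix m by vector x."""
    return [sum(mi * xi for mi, xi in zip(row, x)) for row in m]


def matmat(x, y):
    """Multiply matrix x by matrix y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Strassen's scheme re-expressed so that it consumes and produces
# alternative-basis vectors.
U_ALT = matmat(U, PHI_INV)
V_ALT = matmat(V, PHI_INV)
W_ALT = matmat(PHI, W)


def alt_basis_multiply(a, b):
    """Multiply two 2x2 matrices via the alternative basis."""
    a_alt = matvec(PHI, a)                      # alternative basis first matrix
    b_alt = matvec(PHI, b)                      # alternative basis second matrix
    products = [p * q for p, q in
                zip(matvec(U_ALT, a_alt), matvec(V_ALT, b_alt))]
    c_alt = matvec(W_ALT, products)             # alternative basis product
    return matvec(PHI_INV, c_alt)               # back to the standard basis


print(alt_basis_multiply([1, 2, 3, 4], [5, 6, 7, 8]))  # [19, 22, 43, 50]
```

Because `U_ALT` and `V_ALT` undo `PHI` and `W_ALT` reapplies it, the result equals the ordinary product for any invertible `PHI`. The choice of transformation therefore affects only the cost of the scheme, never its correctness.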
-
3.
Publication No.: US20180365099A1
Publication date: 2018-12-20
Application No.: US16010716
Filing date: 2018-06-18
Inventors: Oded Schwartz, Noam Birnbaum
Abstract: A computer-implemented method for performing a fault-tolerant numerical linear algebra computation task consisting of calculation steps that include at least classic or fast matrix multiplication, in which a controller splits the task among P processors that operate in parallel. Additional processors are assigned according to execution and resource parameters, which are also used to select either a slice-coded recovery algorithm or a posterior-recovery algorithm for executing the task. Pipelined-reduce operations are used to generate error-correcting codes that protect the input blocks and outer products from faults. Upon detecting faults in one or more processors, if the slice-coded recovery algorithm has been selected, it is executed to recover lost input blocks and outer products. If the posterior-recovery algorithm has been selected, error-correcting codes are used to recover lost input blocks and, after the last step, the outer products that correspond to faulty processors are recalculated. When fast multiplication is needed, l DFS down-recursion steps are iteratively performed by the P processors and by the additional processors r times, for which the error-correcting codes remain valid; after r times, the error-correcting codes are recalculated for the next r times. Each of the P processors then performs a local block multiplication between a pair of blocks while recalculating a new error-correcting code. The output matrix is then created by iteratively performing d BFS up-recursion decoding steps on the multiplication product r times; the error-correcting codes are valid only for those r times and are recalculated after each group of r times for the next r times. At the end of all iterations, the blocks to be decoded are obtained together with a code block held by the additional code processors, such that each processor holds a pair of blocks.
Upon detecting faults in one or more processors, a recovery algorithm is executed to recover lost input blocks and multiplication results that correspond to faulty processors, or to correct miscalculations of the processors by recalculation.
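The posterior-recovery path, in which outer products belonging to faulty processors are recalculated after the last step, can be illustrated with a serial simulation. It assumes classic multiplication expressed as a sum of outer products, one per simulated processor, and assumes the input blocks of a failed processor have already been restored (for example from a code block). The function names and the `faulty` parameter are hypothetical.

```python
def outer(col, row):
    """Outer product of a column vector and a row vector."""
    return [[x * y for y in row] for x in col]


def multiply_with_posterior_recovery(a, b, faulty):
    """Compute a @ b as a sum of outer products, one per simulated processor.

    `faulty` is the set of processor indices whose outer products are lost
    during the run. Their inputs are assumed to be recoverable, so the lost
    outer products are simply recalculated after the last step.
    """
    inner = len(b)
    rows, cols = len(a), len(b[0])
    products = {}

    # Main phase: processor k computes (column k of a) x (row k of b).
    for k in range(inner):
        if k in faulty:
            continue  # simulated fault: this result never arrives
        products[k] = outer([r[k] for r in a], b[k])

    # Posterior recovery: recalculate only the missing outer products.
    for k in sorted(faulty):
        products[k] = outer([r[k] for r in a], b[k])

    # Sum all outer products into the output matrix.
    c = [[0] * cols for _ in range(rows)]
    for p in products.values():
        for i in range(rows):
            for j in range(cols):
                c[i][j] += p[i][j]
    return c


print(multiply_with_posterior_recovery([[1, 2], [3, 4]], [[5, 6], [7, 8]], {1}))
# [[19, 22], [43, 50]]
```

Deferring the recalculation to the end keeps the fault-free steps unchanged, at the price of extra work after the last step in proportion to the number of faults.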
-
-