Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to optimize sparse matrix execution. An example disclosed apparatus includes a context former to identify a matrix function call from a matrix function library, the matrix function call associated with a sparse matrix, a pattern matcher to identify an operational pattern associated with the matrix function call, and a code generator to associate a function data structure with the matrix function call exhibiting the operational pattern, the function data structure stored external to the matrix function library, and facilitate a runtime link between the function data structure and the matrix function call.
Abstract:
Methods, apparatus, systems and articles of manufacture are disclosed to improve FPGA pipeline emulation efficiency on CPUs. An example disclosed apparatus includes a loop detector to identify a register shift loop in field programmable gate array (FPGA) code, an unroller to shift and store pipeline stages in the register shift loop to a temporary unroll array, an intermediate canceller to cancel out intermediate load and store values of the temporary unroll array to retain last shifted values of the pipeline stages, and a propagator to improve emulation efficiency of the FPGA code by generating a scalar loop of the retained last shifted values for a vectorization input.
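A simplified sketch can show why the intermediate shifts can be cancelled. It is not the disclosed method itself: it assumes a shift register of fixed depth that shifts once per input and is read at its tail, and the function names `emulate_naive` and `emulate_blocked` are hypothetical. The blocked version writes each block of inputs into a temporary array next to the prior register contents, so every output becomes a plain indexed read with no loop-carried shift.

```python
def emulate_naive(inputs, depth):
    """Direct emulation: shift every stage on every iteration."""
    sr = [0] * depth
    out = []
    for x in inputs:
        for j in range(depth - 1, 0, -1):  # register shift loop
            sr[j] = sr[j - 1]
        sr[0] = x
        out.append(sr[depth - 1])
    return out

def emulate_blocked(inputs, depth):
    """Process `depth` inputs at a time. The intermediate shifted values
    are never materialised; only the final positions are kept."""
    hist = [0] * depth  # register contents, oldest to newest
    out = []
    for start in range(0, len(inputs), depth):
        block = inputs[start:start + depth]
        tmp = hist + block  # temporary unroll array
        k = len(block)
        # Scalar loop with independent iterations, so a vectorizer could
        # treat it as a straight copy.
        for m in range(k):
            out.append(tmp[m + 1])
        hist = tmp[k:]  # last shifted values become the new register state
    return out

# Usage: both emulations produce the same delayed stream.
samples = [1, 2, 3, 4, 5]
naive = emulate_naive(samples, 3)
blocked = emulate_blocked(samples, 3)
```

The naive form performs on the order of `depth` register moves per input. The blocked form performs a constant number of element copies per input, and its inner loop has no dependency between iterations.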