Systolic array accelerator systems and methods

    Publication number: US11003619B2

    Publication date: 2021-05-11

    Application number: US16283795

    Filing date: 2019-02-24

    Abstract: The present disclosure is directed to systems and methods for decomposing systolic array circuitry to provide a plurality of N×N systolic sub-array circuits, apportioning a first tensor or array into a plurality of N×M first input arrays, and apportioning a second tensor or array into a plurality of M×N second input arrays. Systolic array control circuitry transfers corresponding ones of the first input arrays and second input arrays to a respective one of the plurality of N×N systolic sub-array circuits. As the elements included in the first input array and the elements included in the second input array are transferred to the systolic sub-array, the systolic sub-array performs one or more mathematical operations using the first and the second input arrays. The systems and methods beneficially improve the usage of the systolic array circuitry thereby advantageously reducing the number of clock cycles needed to perform a given number of calculations.
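The apportioning step in this abstract can be illustrated with a small software sketch. It is only an analogy under stated assumptions: the function names `matmul` and `tiled_matmul` are hypothetical, the matrix dimensions are assumed to be exact multiples of N, and a plain dense product stands in for one N×N systolic sub-array circuit. It does not model the hardware data flow or the clock-cycle behavior the patent describes.

```python
def matmul(a, b):
    """Plain dense product; stands in for one N x N systolic sub-array."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def tiled_matmul(first, second, n):
    """Multiply `first` (R x M) by `second` (M x C) one N x N tile at a time.

    `first` is apportioned into N x M row blocks and `second` into
    M x N column blocks; each pair of blocks yields one N x N output tile.
    Assumption: R and C are exact multiples of n.
    """
    rows, cols = len(first), len(second[0])
    if rows % n or cols % n:
        raise ValueError("dimensions must be multiples of n in this sketch")

    out = [[0] * cols for _ in range(rows)]
    for i in range(0, rows, n):
        a_block = first[i:i + n]                        # N x M first input array
        for j in range(0, cols, n):
            b_block = [r[j:j + n] for r in second]      # M x N second input array
            tile = matmul(a_block, b_block)             # N x N result tile
            for di in range(n):
                out[i + di][j:j + n] = tile[di]
    return out
```

Because each output tile depends on only one row block and one column block, the tiles are independent of one another, which is the property that would let separate sub-arrays work on them at the same time.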

    Technologies for dynamic acceleration of general-purpose code using binary translation targeted to hardware accelerators with runtime execution offload

    Publication number: US10740152B2

    Publication date: 2020-08-11

    Application number: US15370634

    Filing date: 2016-12-06

    Abstract: Technologies for dynamic acceleration of general-purpose code include a computing device having a general-purpose processor core and one or more hardware accelerators. The computing device identifies an acceleration candidate in an application that is targeted to the processor core. The acceleration candidate may be a long-running computation of the application. The computing device translates the acceleration candidate into a translated executable targeted to the hardware accelerator. The computing device determines whether to offload execution of the acceleration candidate and, if so, executes the translated executable with the hardware accelerator. The computing device may translate the acceleration candidate into multiple translated executables, each targeted to a different hardware accelerator. The computing device may select among the translated executables in response to determining to offload execution. The hardware accelerators may include, for example, a processor graphics, an image signal processor, or a field-programmable gate array. Other embodiments are described and claimed.
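The selection and offload decision described in this abstract might be sketched as follows. The abstract does not say how the decision is made, so the cost model here is purely an assumption: the `TranslatedExecutable` fields, the `choose_offload` function, and the rule "offload only when estimated accelerator time plus transfer overhead beats the CPU estimate" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TranslatedExecutable:
    """One translation of the acceleration candidate (hypothetical fields)."""
    target: str                  # e.g. "processor graphics", "isp", "fpga"
    est_runtime_ms: float        # assumed estimate of accelerator run time
    transfer_overhead_ms: float  # assumed cost of moving data to the device


def choose_offload(cpu_est_ms: float,
                   candidates: Iterable[TranslatedExecutable]
                   ) -> Optional[TranslatedExecutable]:
    """Return the executable to offload to, or None to stay on the CPU core.

    Assumed policy: pick the candidate with the lowest total estimated cost
    and offload only if that cost is lower than the CPU estimate.
    """
    def total_cost(c: TranslatedExecutable) -> float:
        return c.est_runtime_ms + c.transfer_overhead_ms

    best = min(candidates, key=total_cost, default=None)
    if best is None or total_cost(best) >= cpu_est_ms:
        return None
    return best
```

A real runtime would presumably also weigh accelerator availability, power, and translation cost; those factors are left out to keep the sketch short.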

    Analyzing potential benefits of vectorization
    Granted invention patent (status: in force)

    Publication number: US09170789B2

    Publication date: 2015-10-27

    Application number: US13997140

    Filing date: 2013-03-05

    CPC classification number: G06F8/41 G06F8/456

    Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media (transitory and non-transitory) are described herein for analyzing execution of a plurality of executable instructions and, based on the analysis, providing an indication of a benefit to be obtained by vectorization of at least a subset of the plurality of executable instructions. In various embodiments, the analysis may include identification of the subset of the plurality of executable instructions suitable for conversion to one or more single-instruction multiple-data (“SIMD”) instructions.
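One way to picture "an indication of a benefit to be obtained by vectorization" is an Amdahl-style estimate over an execution trace. This is an illustrative assumption, not the method claimed in the patent: the trace format, the `estimate_vectorization_speedup` function, and the idealized model in which every vectorizable instruction speeds up by the full SIMD width are all hypothetical.

```python
def estimate_vectorization_speedup(trace, simd_width):
    """Estimate the overall speedup from vectorizing part of a trace.

    `trace` is a list of (name, executed_count, is_vectorizable) tuples.
    Assumed model: every instruction costs the same, and a vectorizable
    instruction becomes `simd_width` times cheaper (Amdahl's law).
    """
    if simd_width < 1:
        raise ValueError("simd_width must be at least 1")

    total = sum(count for _, count, _ in trace)
    if total == 0:
        return 1.0  # nothing executed, so no benefit to report

    vectorizable = sum(count for _, count, ok in trace if ok)
    fraction = vectorizable / total
    return 1.0 / ((1.0 - fraction) + fraction / simd_width)


# Usage: 80% of the executed instructions are candidates for 4-wide SIMD.
sample_trace = [("load", 40, True), ("fmul", 40, True), ("branch", 20, False)]
speedup = estimate_vectorization_speedup(sample_trace, simd_width=4)
```

The estimate is an upper bound under this model, since it ignores packing overhead, alignment, and memory bandwidth limits.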

