MULTI-DIE DOT-PRODUCT ENGINE TO PROVISION LARGE SCALE MACHINE LEARNING INFERENCE APPLICATIONS

    Publication (Announcement) Number: US20240211212A1

    Publication (Announcement) Date: 2024-06-27

    Application Number: US18601259

    Application Date: 2024-03-11

    CPC classification number: G06F7/5443 G06F9/3867 G06F9/522 G06F40/20 G06N3/063

    Abstract: Systems and methods are provided for a multi-die dot-product engine (DPE) to provision large-scale machine learning inference applications. The multi-die DPE leverages a multi-chip architecture. For example, a multi-chip interface can include a plurality of DPE chips, where each DPE chip performs inference computations for performing deep learning operations. A hardware interface between a memory of a host computer and the plurality of DPE chips communicatively connects the plurality of DPE chips to the memory of the host computer system during an inference operation such that the deep learning operations are spanned across the plurality of DPE chips. Because of the multi-die architecture, multiple silicon devices can be used for inference, enabling power-efficient inference for large-scale machine learning applications and complex deep neural networks. The multi-die DPE can be used to build a multi-device DNN inference system that performs specific applications, such as object recognition, with high accuracy.
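    To make the "spanning" idea concrete, below is a minimal Python sketch of how one layer's dot products might be partitioned across several dies and accumulated on the host. The die count, the column-wise weight split, and the dpe_partial_dot interface are illustrative assumptions for this listing, not the architecture claimed in the application.

# Illustrative sketch only: partitions a layer's matrix-vector product across
# several hypothetical DPE dies and sums the partial results on the host.
# NUM_DIES, dpe_partial_dot, and the column-wise split are assumptions.
import numpy as np

NUM_DIES = 4  # assumed number of DPE chips on the multi-chip interface

def dpe_partial_dot(weight_slice: np.ndarray, x_slice: np.ndarray) -> np.ndarray:
    """Stand-in for one die's dot-product computation on its weight slice."""
    return weight_slice @ x_slice

def multi_die_layer(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Span one layer's dot products across NUM_DIES dies.

    The weight matrix is split column-wise so each die holds a slice of the
    input dimension; the host accumulates the partial outputs.
    """
    col_splits = np.array_split(np.arange(weights.shape[1]), NUM_DIES)
    partials = [
        dpe_partial_dot(weights[:, cols], x[cols])  # work dispatched to one die
        for cols in col_splits
    ]
    return np.sum(partials, axis=0)  # host-side accumulation over dies

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((128, 512))
    x = rng.standard_normal(512)
    # Spanned result matches a single-device matrix-vector product.
    assert np.allclose(multi_die_layer(W, x), W @ x, atol=1e-6)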

    Adjustable Precision for Multi-Stage Compute Processes

    Publication (Announcement) Number: US20200042287A1

    Publication (Announcement) Date: 2020-02-06

    Application Number: US16052218

    Application Date: 2018-08-01

    Abstract: Disclosed techniques provide for dynamically changing precision of a multi-stage compute process. For example, the precision of neural network (NN) parameters may be changed on a per-layer basis depending on properties of incoming data streams and per-layer performance of the NN, among other considerations. NNs include multiple layers that may each be calculated with a different degree of accuracy and, therefore, a different compute resource overhead (e.g., memory, processor resources, etc.). NNs are usually trained with 32-bit or 16-bit floating-point numbers. Once trained, an NN may be deployed in production. One approach to reduce compute overhead is to reduce the parameter precision of NNs to 16-bit or 8-bit representations for deployment. The conversion to an acceptable lower precision is usually determined manually before deployment, and precision levels are fixed while deployed. Disclosed techniques and implementations address automatic rather than manual determination of precision levels for different stages, and dynamically adjust precision for each stage at run-time.
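    Below is a minimal Python sketch of per-stage precision chosen at run time: each layer picks a bit width from a simple heuristic and applies uniform quantization before its matrix multiply. The bit-width menu, the selection heuristic, and the quantization scheme are assumptions for illustration, not the adjustment policy disclosed in the application.

# Illustrative sketch only: per-layer bit width chosen at run time, followed by
# symmetric uniform quantization. pick_bits and quantize are assumed helpers.
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of x to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax if np.any(x) else 1.0
    return np.round(x / scale).clip(-qmax, qmax) * scale

def pick_bits(activations: np.ndarray) -> int:
    """Toy run-time policy: wider activation dynamic range -> more bits."""
    spread = np.max(np.abs(activations))
    if spread > 10.0:
        return 16
    if spread > 1.0:
        return 8
    return 4

def layer_forward(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    bits = pick_bits(x)                      # per-layer precision chosen at run time
    wq, xq = quantize(weights, bits), quantize(x, bits)
    return np.maximum(wq @ xq, 0.0)          # quantized matmul + ReLU

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal(256)
    W1 = rng.standard_normal((256, 256))
    W2 = rng.standard_normal((128, 256))
    h = layer_forward(W1, x)   # first stage, precision picked from x
    y = layer_forward(W2, h)   # second stage, precision picked from h
    print("output norm:", float(np.linalg.norm(y)))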
