Hardware circuit for deep learning task scheduling

    Publication Number: US11983566B2

    Publication Date: 2024-05-14

    Application Number: US17374361

    Application Date: 2021-07-13

    CPC classification number: G06F9/5027 G06F9/4881 G06N3/063

    Abstract: Apparatuses, systems, and techniques for scheduling deep learning tasks in hardware are described. One accelerator circuit includes multiple fixed-function circuits, each of which processes a different layer type of a neural network. A scheduler circuit receives state information associated with a respective layer being processed by a respective fixed-function circuit, along with dependency information that indicates a layer dependency condition for that layer. Using the state and dependency information, the scheduler circuit determines that the layer dependency condition is satisfied and enables the respective fixed-function circuit to process the current layer.
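The dependency-gated scheduling the abstract describes can be modeled in software. The sketch below is illustrative only; `Layer`, `Scheduler`, and every name in it are assumptions for exposition, not identifiers from the patent. Completed layers stand in for the state information, and per-layer dependency lists stand in for the dependency information:

```python
# Hypothetical software model of the scheduler circuit; all names are
# illustrative assumptions, not taken from the patent.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    unit: str        # fixed-function circuit that handles this layer type
    deps: list       # layers that must complete first (dependency information)

class Scheduler:
    def __init__(self, layers):
        self.layers = {l.name: l for l in layers}
        self.done = set()   # state information: which layers have completed

    def ready(self, name):
        # Layer dependency condition: every upstream layer has completed.
        return all(d in self.done for d in self.layers[name].deps)

    def run(self):
        order = []
        pending = list(self.layers)
        while pending:
            for name in list(pending):
                if self.ready(name):
                    # "Enable" the fixed-function circuit for this layer.
                    order.append((self.layers[name].unit, name))
                    self.done.add(name)
                    pending.remove(name)
        return order
```

A layer is dispatched to its unit only once its dependency condition holds, so layers mapped to different fixed-function circuits still execute in dependency order.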

    HARDWARE CIRCUIT FOR DEEP LEARNING TASK SCHEDULING

    Publication Number: US20220382592A1

    Publication Date: 2022-12-01

    Application Number: US17374361

    Application Date: 2021-07-13

    Abstract: Apparatuses, systems, and techniques for scheduling deep learning tasks in hardware are described. One accelerator circuit includes multiple fixed-function circuits, each of which processes a different layer type of a neural network. A scheduler circuit receives state information associated with a respective layer being processed by a respective fixed-function circuit, along with dependency information that indicates a layer dependency condition for that layer. Using the state and dependency information, the scheduler circuit determines that the layer dependency condition is satisfied and enables the respective fixed-function circuit to process the current layer.

    MEMORY MANAGEMENT FOR OVERLAP DATA BETWEEN TILES OF NEURAL NETWORKS

    Publication Number: US20220413752A1

    Publication Date: 2022-12-29

    Application Number: US17446257

    Application Date: 2021-08-27

    Abstract: Techniques for providing an overlap data buffer that stores portions of tiles between passes of chained layers of a neural network are described. One accelerator circuit includes one or more processing units that execute instructions corresponding to the chained layers in multiple passes. In a first pass, a processing unit receives a first input tile of an input feature map from a primary buffer and performs a first operation on it to obtain a first output tile. The processing unit stores the first output tile in the primary buffer, identifies the portion of the first output tile that overlaps with neighboring tiles of the input feature map, and stores that portion in a secondary buffer. In a second pass, the processing unit retrieves the portion from the secondary buffer, avoiding a second fetch and recomputation of the overlap data.
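The overlap-buffer idea can be sketched with a one-dimensional stand-in: each pass processes a tile of rows, and the trailing rows a neighboring tile would need again (e.g. a convolution halo) are kept in a secondary buffer instead of being re-fetched. All names and the row-based simplification are assumptions for illustration, not details from the patent:

```python
# Illustrative sketch (assumed names) of keeping overlap rows of one tile
# in a secondary buffer so the next pass does not re-fetch or recompute them.
def process_tiles(feature_map, tile_rows, overlap):
    """Walk a 1-D list of rows tile by tile; each pass prepends the
    `overlap` trailing rows saved from the previous tile."""
    secondary = []          # secondary buffer holding the overlap data
    outputs = []
    i = 0
    while i < len(feature_map):
        tile = feature_map[i:i + tile_rows]             # primary-buffer read
        rows = secondary + tile                         # reuse overlap, no re-fetch
        outputs.append(rows)                            # stand-in for the layer's compute
        secondary = tile[-overlap:] if overlap else []  # save halo for the next pass
        i += tile_rows
    return outputs
```

With `overlap` rows carried over, the second pass sees the boundary rows of the first tile without touching the primary buffer for them again, which is the bandwidth saving the abstract describes.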
