NEURAL NETWORK TRAINING UNDER MEMORY RESTRAINT
    Patent application publication

    Publication number: US20230196113A1

    Publication date: 2023-06-22

    Application number: US18112036

    Filing date: 2023-02-21

    CPC classification number: G06N3/084 G06N3/04

    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.
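    The ordering described in this abstract (loss gradient first, then a forward pass that regenerates intermediate outputs, then the backward pass that produces weight gradients) can be illustrated with a minimal Python sketch. It assumes a toy chain of scalar tanh layers, a squared-error loss, and an initial forward pass that keeps only the final output so that the loss gradient can be computed; the function names `forward`, `loss_gradient`, `backward`, and `train_step` are hypothetical and do not come from the patent.

```python
import math


def forward(weights, x, keep_intermediates):
    """Run a chain of scalar layers a[i+1] = tanh(w[i] * a[i]).

    Returns the final output and, if requested, every activation a[0..n];
    otherwise intermediates are discarded to save memory.
    """
    acts = [x]
    a = x
    for w in weights:
        a = math.tanh(w * a)
        if keep_intermediates:
            acts.append(a)
    return a, (acts if keep_intermediates else None)


def loss_gradient(y, target):
    # Squared-error loss L = 0.5 * (y - target)**2, so dL/dy = y - target.
    return y - target


def backward(weights, acts, data_grad):
    """Propagate dL/dy back through the chain and return dL/dw[i] per layer."""
    grads = [0.0] * len(weights)
    g = data_grad  # gradient with respect to the current layer's output
    for i in reversed(range(len(weights))):
        a_in, a_out = acts[i], acts[i + 1]
        d_pre = g * (1.0 - a_out * a_out)  # derivative of tanh
        grads[i] = d_pre * a_in            # d(w * a_in)/dw = a_in
        g = d_pre * weights[i]             # gradient passed to the previous layer
    return grads


def train_step(weights, x, target, lr):
    # 1. Initial forward pass that keeps only the final output (assumption).
    y, _ = forward(weights, x, keep_intermediates=False)
    # 2. Loss gradient operation: data gradient at the network output.
    data_grad = loss_gradient(y, target)
    # 3. Only after step 2, recompute the forward pass to regenerate intermediates.
    _, acts = forward(weights, x, keep_intermediates=True)
    # 4. Backward pass from the data gradient and intermediates -> weight gradients.
    grads = backward(weights, acts, data_grad)
    # 5. Plain gradient-descent weight update.
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, grads
```

    In this sketch the memory saving comes from holding only the output while the loss gradient is computed; the price is a second forward pass.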

    DYNAMIC PROCESSING ELEMENT ARRAY EXPANSION
    Patent application publication

    Publication number: US20230153620A1

    Publication date: 2023-05-18

    Application number: US18154576

    Filing date: 2023-01-13

    CPC classification number: G06N3/08 G06N3/04

    Abstract: A computer-implemented method includes receiving a neural network model that includes a tensor operation, dividing the tensor operation into a set of sub-operations, and generating instructions for performing a plurality of sub-operations of the set of sub-operations on respective computing engines of a plurality of computing engines on the same integrated circuit device or on different integrated circuit devices. Each sub-operation of the set of sub-operations generates a portion of a final output of the tensor operation. An inference is made based on a result of a sub-operation of the plurality of sub-operations, or based on results of the plurality of sub-operations.
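    A minimal sketch of dividing one tensor operation into sub-operations that each produce a portion of the final output. It assumes the tensor operation is a matrix multiplication split by output rows, and uses a thread pool purely as a stand-in for the computing engines (no performance claim is implied); `matmul`, `split_rows`, and `partitioned_matmul` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Reference matrix multiply on lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_rows(a, parts):
    """Divide the rows of `a` into `parts` contiguous, nearly equal blocks."""
    n = len(a)
    bounds = [n * k // parts for k in range(parts + 1)]
    return [a[bounds[k]:bounds[k + 1]] for k in range(parts)]


def partitioned_matmul(a, b, num_engines):
    # Each sub-operation multiplies one row block of `a` by all of `b`, which
    # yields the matching row block (a portion) of the final output.
    blocks = split_rows(a, num_engines)
    with ThreadPoolExecutor(max_workers=num_engines) as engines:
        partials = list(engines.map(lambda block: matmul(block, b), blocks))
    # Concatenating the partial results in order reconstructs the full output.
    return [row for part in partials for row in part]
```

    Splitting by output rows keeps the sub-operations independent of one another, so the only coordination needed is concatenating the partial outputs in their original order.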

    Scheduling for locality of reference to memory

    Publication number: US11625269B1

    Publication date: 2023-04-11

    Application number: US17301343

    Filing date: 2021-03-31

    Abstract: A technique for scheduling instructions includes obtaining a set of instructions that operate on memory objects, and determining the dependencies of the memory objects. The memory objects are then sorted into a sequence of memory objects based on the dependencies of the memory objects, and the set of instructions is scheduled into a sequence of instructions according to the sequence of memory objects. Sorting memory objects allows instructions that operate on the same memory object to be kept together. This helps minimize spilling conditions because intervening instructions that do not operate on the same memory object can be avoided.
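    The scheduling steps in this abstract can be sketched in Python, assuming each instruction touches exactly one memory object and that memory-object dependencies are supplied as a mapping from an object to the objects it depends on. The function name is hypothetical; the dependency sort uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def schedule_by_memory_object(instructions, deps):
    """Regroup instructions so each memory object's instructions are adjacent.

    instructions: list of (instr_name, memory_object) in program order.
    deps: memory_object -> set of memory objects it depends on.
    """
    objects = {obj for _, obj in instructions}
    graph = {obj: deps.get(obj, set()) for obj in objects}
    # Sort memory objects so that every object follows its dependencies.
    order = list(TopologicalSorter(graph).static_order())
    # Bucket instructions by memory object, preserving program order within a bucket.
    by_obj = {}
    for instr in instructions:
        by_obj.setdefault(instr[1], []).append(instr)
    # Emit instructions object by object in the sorted sequence.
    return [instr for obj in order for instr in by_obj.get(obj, [])]
```

    For example, interleaved instructions on objects A and B come out grouped as all A instructions, then all B instructions, when B depends on A.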

    Neural network training under memory restraint

    Publication number: US11610128B2

    Publication date: 2023-03-21

    Application number: US16836421

    Filing date: 2020-03-31

    Abstract: Methods and systems for training a neural network are provided. In one example, an apparatus comprises a memory that stores instructions; and a hardware processor configured to execute the instructions to: control a neural network processor to perform a loss gradient operation to generate data gradients; after the loss gradient operation completes, control the neural network processor to perform a forward propagation operation to generate intermediate outputs; control the neural network processor to perform a backward propagation operation based on the data gradients and the intermediate outputs to generate weight gradients; receive the weight gradients from the neural network processor; and update weights of a neural network based on the weight gradients.

    Neural network operation reordering for parallel execution

    Publication number: US11567778B2

    Publication date: 2023-01-31

    Application number: US17243415

    Filing date: 2021-04-28

    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
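    A toy sketch of identifying and reducing a runtime inefficiency by reordering, under a simplified cost model assumed here: operations issue in order, each occupies its execution engine for a fixed duration, and an operation cannot start before its dependencies finish. The engine names, durations, and the greedy "move an operation earlier" search are illustrative assumptions, not the compiler's actual heuristic.

```python
def simulate(order, ops):
    """Makespan under the assumed in-order issue model: an operation starts once
    the previous operation has started, its engine is free, and its dependencies
    are done. ops maps name -> (engine, duration, deps)."""
    engine_free, finish, last_start = {}, {}, 0
    for name in order:
        engine, duration, deps = ops[name]
        start = max([last_start, engine_free.get(engine, 0)] + [finish[d] for d in deps])
        finish[name] = start + duration
        engine_free[engine] = finish[name]
        last_start = start
    return max(finish.values())


def is_valid(order, ops):
    """True if every operation appears after all of its dependencies."""
    seen = set()
    for name in order:
        if not set(ops[name][2]) <= seen:
            return False
        seen.add(name)
    return True


def first_improving_move(order, current, ops):
    # Try moving operation i to an earlier slot j; return the first valid
    # candidate that lowers the simulated makespan, or None.
    for i in range(len(order)):
        for j in range(i):
            cand = order[:j] + [order[i]] + order[j:i] + order[i + 1:]
            if is_valid(cand, ops):
                t = simulate(cand, ops)
                if t < current:
                    return cand, t
    return None


def reorder(order, ops):
    """Greedily apply improving moves until none remain."""
    best, best_time = list(order), simulate(order, ops)
    while (move := first_improving_move(best, best_time, ops)) is not None:
        best, best_time = move
    return best, best_time
```

    With a serial chain conv1, act1, conv2, act2 followed by an independent pooling operation, the pooling operation waits behind the chain; the search hoists it to the front so that it overlaps with conv1, cutting the simulated makespan from 12 to 10.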

    PROCESSING FOR MULTIPLE INPUT DATA SETS

    Publication number: US20230014783A1

    Publication date: 2023-01-19

    Application number: US17951084

    Filing date: 2022-09-22

    Abstract: Disclosed herein are techniques for performing multi-layer neural network processing for multiple contexts. In one embodiment, a computing engine is set in a first configuration to implement a second layer of a neural network and to process first data related to a first context to generate first context second layer output. The computing engine can be switched from the first configuration to a second configuration to implement a first layer of the neural network. The computing engine can be used to process second data related to a second context to generate second context first layer output. The computing engine can be set to a third configuration to implement a third layer of the neural network to process the first context second layer output and the second context first layer output to generate a first processing result of the first context and a second processing result of the second context.
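    A simplified sketch of sharing one reconfigurable computing engine across two contexts. It assumes a hypothetical `ComputeEngine` that implements one layer per configuration, plus a scheduling policy that is an assumption of this sketch and not taken from the abstract: always configure the lowest layer that any context still needs, so contexts waiting on the same layer are processed in one batch. It illustrates the general idea only and does not reproduce the exact configuration sequence described above.

```python
class ComputeEngine:
    """Hypothetical engine that implements one layer at a time."""

    def __init__(self, layers):
        self.layers = layers      # layer index -> function applied to one input
        self.current = None
        self.reconfigurations = 0

    def configure(self, layer):
        # Switching to a different layer counts as one reconfiguration.
        if layer != self.current:
            self.current = layer
            self.reconfigurations += 1

    def run(self, inputs):
        # Apply the currently configured layer to a batch of inputs.
        return [self.layers[self.current](x) for x in inputs]


def run_contexts(engine, pending, num_layers):
    """pending maps context -> (next_layer, data). Returns context -> final output."""
    pending = dict(pending)
    done = {}
    while pending:
        # Assumed policy: serve the lowest layer that any context still needs.
        layer = min(next_layer for next_layer, _ in pending.values())
        ready = [c for c, (next_layer, _) in pending.items() if next_layer == layer]
        engine.configure(layer)
        outputs = engine.run([pending[c][1] for c in ready])
        for c, out in zip(ready, outputs):
            if layer == num_layers:
                done[c] = out
                del pending[c]
            else:
                pending[c] = (layer + 1, out)
    return done
```

    If the first context has already passed layer 1 and the second has not started, this policy needs three reconfigurations for a three-layer network, whereas finishing one context before starting the other would need five (2 + 3).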

    AUTO-DETECTION OF INTERCONNECT HANGS IN INTEGRATED CIRCUITS

    Publication number: US20220413980A1

    Publication date: 2022-12-29

    Application number: US17896739

    Filing date: 2022-08-26

    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
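    The timeout check can be sketched with Python's `threading.Event`, assuming two hypothetical callbacks: `issue_dummy_read`, which starts the dummy read over the internal bus and sets the event when the read completes, and `request_hard_reset`. The default 100 ms window mirrors the example value in the abstract.

```python
import threading


def check_bus_after_soft_reset(issue_dummy_read, request_hard_reset, timeout_s=0.1):
    """Return True if the dummy read completes in time, else request a hard reset.

    issue_dummy_read(done_event): hypothetical callback that starts the read and
    sets done_event on completion. request_hard_reset(): hypothetical callback.
    """
    done = threading.Event()
    issue_dummy_read(done)
    if done.wait(timeout_s):    # read finished within the window: bus not hung
        return True
    request_hard_reset()        # read never completed: assume the bus is hanging
    return False
```

    A completed read needs no further action; only a read that misses the window triggers the reset request.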

    Data replication for accelerator
    Granted patent

    Publication number: US11500802B1

    Publication date: 2022-11-15

    Application number: US17301344

    Filing date: 2021-03-31

    Abstract: A direct memory access (DMA) engine can be used to multicast data from system memory to a target memory for loading into an array. The DMA engine may include a controller that is configured to receive a data transfer request, and generate a set of write operations for the output interface. The set of write operations can include, for each of multiple partitions of the target memory, a write operation to write usable data from the multicast data to an address offset in the corresponding partition, and an additional write operation to write filler data from the multicast data to a null device address.
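    A minimal sketch of how a controller might expand one multicast transfer into per-partition write operations. It assumes a payload whose first `usable_len` bytes are usable data and whose remainder is filler, and it models target memory as a flat `bytearray`; `NULL_DEVICE_ADDR`, `WriteOp`, `multicast_writes`, `apply_writes`, and the address layout are hypothetical.

```python
from dataclasses import dataclass

NULL_DEVICE_ADDR = 0xFFFF_0000  # hypothetical sink address; writes here are discarded


@dataclass(frozen=True)
class WriteOp:
    address: int
    data: bytes


def multicast_writes(payload, usable_len, partition_bases, offset):
    """For each partition, write the usable bytes to base + offset and send the
    remaining filler bytes to the null device."""
    usable, filler = payload[:usable_len], payload[usable_len:]
    ops = []
    for base in partition_bases:
        ops.append(WriteOp(base + offset, usable))
        if filler:
            ops.append(WriteOp(NULL_DEVICE_ADDR, filler))
    return ops


def apply_writes(ops, memory):
    """Apply writes to a bytearray model of target memory; null-device writes are dropped."""
    for op in ops:
        if op.address == NULL_DEVICE_ADDR:
            continue
        memory[op.address:op.address + len(op.data)] = op.data
    return memory
```

    Routing the filler to a sink address lets every partition receive the same multicast stream while only the usable bytes land in target memory.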

    Auto-detection of interconnect hangs in integrated circuits

    Publication number: US11429503B1

    Publication date: 2022-08-30

    Application number: US16456902

    Filing date: 2019-06-28

    Abstract: A self-detection mechanism for an IC is disclosed that determines whether the IC's internal bus is in a hanging state. An initialization sequence can be modified after a soft reset by reading data from an internal DRAM of the IC using a Direct Memory Access (DMA) controller as part of the initialization sequence. The read command is issued over the internal bus and, if the bus is hanging, the read command is not completed. Monitoring can be performed by waiting a predetermined period of time (e.g., 100 ms) to determine if the read was properly completed. If so, no further action is needed. If the read was not completed, then a hard reset is requested to be performed. Thus, an initialization sequence can be modified to run dummy transactions through the internal bus, and validate that all paths are functional.
