Neural network processing based on subgraph recognition

    Publication Number: US11714992B1

    Publication Date: 2023-08-01

    Application Number: US16219760

    Application Date: 2018-12-13

    CPC classification number: G06N3/04 G06F9/4881 G06F9/30003 G06F16/9024

    Abstract: Systems and methods for providing executable instructions to a neural network processor are provided. In one example, a system comprises a database that stores a plurality of executable instructions and a plurality of subgraph identifiers, each subgraph identifier of the plurality of subgraph identifiers being associated with a subset of instructions of the plurality of executable instructions. The system further includes a compiler configured to: identify a computational subgraph from a computational graph of a neural network model; compute a subgraph identifier for the computational subgraph; based on whether the subgraph identifier is included in the plurality of subgraph identifiers, either obtain, from the database, first instructions associated with the subgraph identifier, or generate second instructions representing the computational subgraph; and provide the first instructions or the second instructions for execution by a neural network processor to perform computation operations for the neural network model.
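
    A minimal sketch of the compile-time lookup described in this abstract: a subgraph identifier is computed for each computational subgraph, and previously compiled instructions are reused when the identifier is already in the database. The hashing scheme, the in-memory "database", and all function names are assumptions; the patent does not specify them.

```python
import hashlib

# Hypothetical in-memory "database" mapping subgraph identifiers to instructions.
instruction_db = {}

def subgraph_identifier(subgraph_ops):
    """Compute a stable identifier for a computational subgraph.

    `subgraph_ops` is assumed to be an ordered list of (op_name, attrs) tuples;
    the actual identifier scheme is not specified in the abstract.
    """
    canonical = "|".join(f"{name}:{attrs}" for name, attrs in subgraph_ops)
    return hashlib.sha256(canonical.encode()).hexdigest()

def instructions_for_subgraph(subgraph_ops, codegen):
    """Return cached instructions if the identifier is known, else generate and cache them."""
    sid = subgraph_identifier(subgraph_ops)
    if sid in instruction_db:            # first case: reuse stored instructions
        return instruction_db[sid]
    instrs = codegen(subgraph_ops)       # second case: compile the subgraph
    instruction_db[sid] = instrs
    return instrs

# Example usage with a toy code generator.
ops = [("matmul", {"m": 128, "n": 128}), ("relu", {})]
print(instructions_for_subgraph(ops, codegen=lambda g: [f"EXEC {n}" for n, _ in g]))
```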

    Using shared data bus to support systolic array tiling

    Publication Number: US11625453B1

    Publication Date: 2023-04-11

    Application Number: US16712699

    Application Date: 2019-12-12

    Abstract: To improve utilization of a systolic array, each row of the array is provided with a number of general purpose row input data buses. Each of the general purpose row input data buses can be operable to transfer either feature map (FMAP) input elements or weight values into the processing elements of the corresponding row of the array. By using such general purpose row input data buses, concurrent matrix multiplications as well as faster background weight loading can be achieved in the array.
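
    A toy model of one general-purpose row input data bus as described above: the same bus carries either feature-map (FMAP) elements or weight values into a row of processing elements, so weights can be background-loaded while FMAPs stream in. The class layout and the per-transfer "kind" tag are illustrative assumptions, not the patent's interface.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BusTransfer:
    kind: str           # "fmap" or "weight"; the same physical bus carries both
    values: List[float]

class RowBus:
    """Toy model of one general-purpose row input data bus feeding a row of PEs."""
    def __init__(self, num_pes):
        self.weights = [0.0] * num_pes   # stationary weights held in the row's PEs
        self.fmap_queue = []             # FMAP elements waiting to be streamed in

    def drive(self, transfer: BusTransfer):
        if transfer.kind == "weight":
            # Background weight load: replace the row's stationary weights.
            self.weights = list(transfer.values)
        else:
            # FMAP stream: elements flow into the row's processing elements.
            self.fmap_queue.extend(transfer.values)

    def fire(self):
        """One multiply step: each PE multiplies its weight by the next FMAP element."""
        x = self.fmap_queue.pop(0)
        return [w * x for w in self.weights]

row = RowBus(num_pes=4)
row.drive(BusTransfer("weight", [1.0, 2.0, 3.0, 4.0]))
row.drive(BusTransfer("fmap", [0.5]))
print(row.fire())   # [0.5, 1.0, 1.5, 2.0]
```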

    SYSTOLIC ARRAY WITH EFFICIENT INPUT REDUCTION AND EXTENDED ARRAY PERFORMANCE

    Publication Number: US20230004384A1

    Publication Date: 2023-01-05

    Application Number: US17363894

    Application Date: 2021-06-30

    Abstract: Systems and methods are provided to perform multiply-accumulate operations of reduced precision numbers in a systolic array. Each row of the systolic array can receive reduced inputs from a respective reducer. The reduced input can include a reduced input data element and/or a reduced weight. The systolic array may lack support for inputs with a first bit-length and the reducers may reduce the bit-length of a given input from the first bit-length to a second shorter bit-length and provide the reduced input to the array. In order to reduce the bit-length, the reducer may reduce the number of trailing bits of the input. Further, the systolic array can receive a reduced and rounded input. The systolic array can propagate the reduced input through the processing elements in the systolic array. Each processing element may include a multiplier and/or an adder to perform arithmetical operations based on the reduced input.
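
    A minimal sketch of the reduce-and-round step described above: trailing mantissa bits of a 32-bit input are rounded away before the multiply-accumulate. Keeping 7 mantissa bits mimics a bfloat16-style reduction; the abstract only says trailing bits are reduced, so the exact formats and rounding mode here are assumptions.

```python
import struct

def reduce_trailing_bits(x: float, kept_mantissa_bits: int = 7) -> float:
    """Reduce a 32-bit float by rounding away trailing mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - kept_mantissa_bits                 # FP32 has a 23-bit mantissa
    rounding = 1 << (drop - 1)                     # round-to-nearest (ties up)
    reduced = ((bits + rounding) >> drop) << drop  # clear the trailing bits
    return struct.unpack("<f", struct.pack("<I", reduced & 0xFFFFFFFF))[0]

def mac(acc: float, weight: float, fmap: float) -> float:
    """One processing element: multiply the reduced inputs and accumulate."""
    return acc + reduce_trailing_bits(weight) * reduce_trailing_bits(fmap)

print(mac(0.0, 1.2345678, 3.1415926))
```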

    Hardware security accelerator

    Publication Number: US11483296B1

    Publication Date: 2022-10-25

    Application Number: US16917367

    Application Date: 2020-06-30

    Abstract: A hardware security accelerator includes a configurable parser that is configured to receive a packet and to extract from the packet headers associated with a set of protocols. The security accelerator also includes a packet type detection unit to determine a type of the packet in response to the set of protocols and to generate a packet type identifier indicative of the type of the packet. A configurable security unit includes a configuration unit and a configurable security engine. The configuration unit configures the configurable security engine according to the type of the packet and to content of at least one of the headers extracted from the packet. The configurable security engine performs security processing of the packet to provide at least one security result.
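
    A toy model of the parse, detect, configure, and process flow described above. The protocol set, header sizes, configuration fields, and the stand-in "security result" are all illustrative assumptions; the patent does not fix any of them.

```python
def parse_headers(packet: bytes, protocols=("eth", "ipv4", "tcp")):
    """Configurable parser: extract fixed-size headers for the configured protocols."""
    sizes = {"eth": 14, "ipv4": 20, "tcp": 20}
    headers, offset = {}, 0
    for proto in protocols:
        headers[proto] = packet[offset:offset + sizes[proto]]
        offset += sizes[proto]
    return headers

def detect_packet_type(headers):
    """Packet type detection unit: derive a packet type identifier from the protocol set."""
    return "+".join(sorted(headers))

def configure_engine(packet_type, headers):
    """Configuration unit: choose security parameters from packet type and header content."""
    return {"cipher": "aes-gcm" if "tcp" in packet_type else "aes-cbc",
            "key_slot": headers["ipv4"][-1] % 8 if "ipv4" in headers else 0}

def security_engine(packet, config):
    """Configurable security engine: produce a stand-in security result (a checksum)."""
    return {"config": config, "digest": sum(packet) & 0xFFFF}

pkt = bytes(range(64))
hdrs = parse_headers(pkt)
result = security_engine(pkt, configure_engine(detect_packet_type(hdrs), hdrs))
print(result)
```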

    Synchronizing operations in hardware accelerator

    Publication Number: US11468304B1

    Publication Date: 2022-10-11

    Application Number: US16696377

    Application Date: 2019-11-26

    Inventor: Ron Diamant

    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
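
    A minimal sketch of the set/wait primitives on an event register, modeled with threads: a "wait" primitive suspends an engine's operation until the event is set, and a "set" primitive sets the event when the operation completes. The event name, primitive names, and thread-based modeling are assumptions for illustration only.

```python
import threading
import time

# Toy event register: one named event that engines can set or wait on.
event_register = {"done": threading.Event()}

def run_engine(op_name, sync_primitive, event_name):
    """Start an operation according to its synchronization primitive."""
    if sync_primitive == "wait":
        event_register[event_name].wait()   # suspend the operation until the event is set
    print(f"{op_name} running")
    time.sleep(0.1)                         # stand-in for the hardware operation
    if sync_primitive == "set":
        event_register[event_name].set()    # signal engines waiting on this event

consumer = threading.Thread(target=run_engine, args=("activation", "wait", "done"))
producer = threading.Thread(target=run_engine, args=("matmul", "set", "done"))
consumer.start(); producer.start()
consumer.join(); producer.join()
```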

    Dynamically configurable pipeline

    Publication Number: US11294841B1

    Publication Date: 2022-04-05

    Application Number: US16985056

    Application Date: 2020-08-04

    Abstract: Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
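    A minimal sketch of a dynamically configurable pipeline: each processing engine is modeled as a function, and the switches are modeled as a routing order that decides which engine feeds which, so the same engines can be rearranged into different pipelines at run time. The engine names and operations are illustrative assumptions.

```python
# Toy processing engines; the switch fabric is modeled as the stage order below.
engines = {
    "scale": lambda x: x * 2,
    "bias":  lambda x: x + 1,
    "clip":  lambda x: min(x, 6),
}

def run_pipeline(data, stage_order):
    """Route data through the engines in the order selected by the switch configuration."""
    for name in stage_order:
        data = engines[name](data)
    return data

# The same engines, reconfigured into two different pipelines.
print(run_pipeline(3, ["scale", "bias"]))          # (3*2)+1 = 7
print(run_pipeline(3, ["bias", "scale", "clip"]))  # min((3+1)*2, 6) = 6
```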

    Registers for restricted memory

    Publication Number: US11294599B1

    Publication Date: 2022-04-05

    Application Number: US16891438

    Application Date: 2020-06-03

    Abstract: Provided are integrated circuits and methods for operating integrated circuits. An integrated circuit can include a plurality of memory banks and an execution engine including a set of execution components. Each execution component can be associated with a respective memory bank and can read from and write to the respective memory bank. The integrated circuit can further include a set of registers each associated with a respective memory bank from the plurality of memory banks. The integrated circuit can further be operable to load to or store from the set of registers in parallel, and load to or store from the set of registers serially. A parallel operation followed by a serial operation enables data to be moved from many memory banks into one memory bank. A serial operation followed by a parallel operation enables data to be moved from one memory bank into many memory banks.
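
    A minimal sketch of the parallel-then-serial pattern described above: a parallel load pulls one word from every memory bank into its per-bank register, and a serial store writes those registers into consecutive addresses of a single bank, gathering data from many banks into one. Bank count, word layout, and function names are assumptions.

```python
# Toy model of per-bank registers used to shuffle data between memory banks.
num_banks = 4
banks = [[b * 10 + i for i in range(4)] for b in range(num_banks)]  # 4 banks of 4 words
registers = [None] * num_banks                                      # one register per bank

def parallel_load(addr):
    """Each register loads from the same address of its own bank, in parallel."""
    for b in range(num_banks):
        registers[b] = banks[b][addr]

def serial_store(bank, start_addr):
    """Registers are stored one after another into consecutive addresses of one bank."""
    for b in range(num_banks):
        banks[bank][start_addr + b] = registers[b]

# Gather element 0 of every bank into bank 0: parallel load, then serial store.
parallel_load(addr=0)
serial_store(bank=0, start_addr=0)
print(banks[0])   # [0, 10, 20, 30]
```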

    Flexible weight expansion

    Publication Number: US11263517B1

    Publication Date: 2022-03-01

    Application Number: US15908080

    Application Date: 2018-02-28

    Abstract: Disclosed herein are techniques for obtaining weights for neural network computations. In one embodiment, an integrated circuit may include an arithmetic circuit configured to perform arithmetic operations for a neural network. The integrated circuit may also include a weight processing circuit configured to: acquire data from a memory device; receive configuration information indicating a size of each quantized weight of a set of quantized weights; extract the set of quantized weights from the data based on the size of each weight indicated by the configuration information; perform de-quantization processing on the set of quantized weights to generate a set of de-quantized weights; and provide the set of de-quantized weights to the arithmetic circuit to enable the arithmetic circuit to perform the arithmetic operations. The memory device may be part of or external to the integrated circuit.
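
    A minimal sketch of the weight processing step described above: quantized weights of a configurable bit width are extracted from packed data and then de-quantized before being fed to the arithmetic circuit. The packing order, scale, and zero point below are assumptions for illustration; the abstract does not specify them.

```python
def extract_quantized(data: bytes, weight_bits: int, count: int):
    """Unpack `count` little-endian-packed unsigned weights of `weight_bits` bits each."""
    bitstream = int.from_bytes(data, "little")
    mask = (1 << weight_bits) - 1
    return [(bitstream >> (i * weight_bits)) & mask for i in range(count)]

def dequantize(qweights, scale=0.05, zero_point=8):
    """Map integer codes back to real-valued weights for the arithmetic circuit."""
    return [(q - zero_point) * scale for q in qweights]

packed = bytes([0xA4, 0x1F, 0x73])          # 6 weights at 4 bits each, per configuration
q = extract_quantized(packed, weight_bits=4, count=6)
print(q)                                    # [4, 10, 15, 1, 3, 7]
print(dequantize(q))                        # de-quantized weights for the arithmetic circuit
```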
