Neural network operation reordering for parallel execution

    Publication No.: US11016775B2

    Publication Date: 2021-05-25

    Application No.: US16453478

    Filing Date: 2019-06-26

    Abstract: Techniques are disclosed for reordering operations of a neural network to improve runtime efficiency. In some examples, a compiler receives a description of the neural network comprising a plurality of operations. The compiler may determine which execution engine of a plurality of execution engines is to perform each of the plurality of operations. The compiler may determine an order of performance associated with the plurality of operations. The compiler may identify a runtime inefficiency based on the order of performance and a hardware usage for each of the plurality of operations. An operation may be reordered to reduce the runtime inefficiency. Instructions may be compiled based on the plurality of operations, which include the reordered operation.
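    The reordering idea in the abstract can be illustrated with a toy scheduler. This is a minimal sketch, not the patented compiler: the engine names, operation list format, and simulation model are all invented for illustration. Each engine runs its queue in order, and moving an independent operation earlier lets two engines overlap instead of stalling.

```python
def makespan(queues, deps, dur):
    """Simulate in-order execution per engine: each engine runs its queue
    front-to-back, and an op also waits for its dependencies to finish."""
    finish = {}
    pending = {e: list(q) for e, q in queues.items()}
    engine_free = {e: 0 for e in queues}
    while any(pending.values()):
        progress = False
        for e, q in pending.items():
            # Pop ops whose dependencies have all completed.
            while q and all(d in finish for d in deps.get(q[0], ())):
                op = q.pop(0)
                start = max([engine_free[e]] +
                            [finish[d] for d in deps.get(op, ())])
                finish[op] = start + dur[op]
                engine_free[e] = finish[op]
                progress = True
        if not progress:
            raise RuntimeError("dependency deadlock")
    return max(finish.values())

dur = {"x": 4, "y": 1, "z": 2}
deps = {"z": ["x"]}                      # z (on "act") needs x (on "pe")
bad = {"pe": ["y", "x"], "act": ["z"]}   # independent y delays x, stalling z
good = {"pe": ["x", "y"], "act": ["z"]}  # reordered: z overlaps with y
```

    With the reordered queue the makespan drops from 7 to 6 cycles, which is the kind of runtime inefficiency the compiler pass targets.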

    Power virus generation
    Invention Grant

    Publication No.: US10963029B1

    Publication Date: 2021-03-30

    Application No.: US16453824

    Filing Date: 2019-06-26

    Abstract: Systems and methods for power analysis of a hardware device design. In various examples, a target circuit can be defined within the hardware device design. The target circuit can include a plurality of digital circuit elements linking a plurality of input nodes with a plurality of output nodes. A solver can be used to search for a transition pattern that, when applied to the input nodes, causes a number of output nodes equal to a counter to transition from a first binary value to a second binary value. If a transition pattern cannot be found, the counter is decremented and a new transition pattern is searched for. Once a transition pattern is found, it is determined whether the transition pattern satisfies a constraint.
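    The search loop in the abstract can be sketched as follows. This toy uses brute-force enumeration as a stand-in for the solver, and the three-gate circuit is invented for illustration; the structure (try to toggle `counter` outputs, decrement on failure) follows the abstract.

```python
from itertools import product

def outputs(bits):
    # Toy combinational circuit: 3 input nodes -> 3 output nodes.
    a, b, c = bits
    return (a & b, a ^ b ^ c, b | c)

def find_pattern(target_toggles):
    """Brute-force stand-in for the solver: look for an input transition
    that toggles exactly `target_toggles` output nodes."""
    for before in product((0, 1), repeat=3):
        for after in product((0, 1), repeat=3):
            toggles = sum(x != y for x, y in
                          zip(outputs(before), outputs(after)))
            if toggles == target_toggles:
                return before, after
    return None

def search_max_toggles(n_outputs):
    counter = n_outputs
    while counter > 0:
        pattern = find_pattern(counter)
        if pattern is not None:
            return counter, pattern   # found; check constraints next
        counter -= 1                  # none found; relax the requirement
    return 0, None
```

    A real flow would hand the toggle condition to a SAT/SMT solver rather than enumerate, and would then test the found pattern against the stated constraint.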

    Hardware engine with configurable instructions

    Publication No.: US10942742B1

    Publication Date: 2021-03-09

    Application No.: US16216212

    Filing Date: 2018-12-11

    Abstract: A reconfigurable processing circuit and system are provided. The system allows a user to program machine-level instructions in order to reconfigure the way the circuit behaves, including by adding new operations. The system can include a profile access content-addressable memory (CAM) configured to receive an execution step value from a step counter. The execution step value can be incremented and/or reset by a step management logic. The profile access CAM can select an entry of a profile table based on an opcode and the execution step value, and the processing engine can execute microcode based on the selected entry of the profile table. The profile access CAM can translate the opcode to an internal short instruction identifier in order to select the entry of the profile table. The system can further include an instruction decoding module configured to merge multiple instruction fields into a single effective instruction field.
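    A software model of the lookup flow described above might look like this. The opcodes, short IDs, and microcode names are invented; the mechanism mirrors the abstract: a CAM translates the opcode to a short internal ID, then (short ID, execution step) selects a profile-table entry whose microcode runs, with the step counter incremented or reset by step-management logic.

```python
# CAM: full opcode -> internal short instruction identifier.
OPCODE_TO_SHORT_ID = {0x1A2B: 0, 0x3C4D: 1}

# Profile table: (short_id, step) -> (microcode, is_last_step).
PROFILE_TABLE = {
    (0, 0): ("load_operands", False),
    (0, 1): ("multiply", False),
    (0, 2): ("writeback", True),
    (1, 0): ("add", True),
}

def execute(opcode):
    short_id = OPCODE_TO_SHORT_ID[opcode]   # CAM translation
    step, trace = 0, []
    while True:
        microcode, last = PROFILE_TABLE[(short_id, step)]
        trace.append(microcode)
        if last:        # step-management logic resets the counter
            return trace
        step += 1       # otherwise increment the execution step
```

    Because new (short ID, step) entries can be written into the table, a user can add new multi-step operations without changing the engine itself.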

    Handling memory errors in computing systems

    Publication No.: US10908987B1

    Publication Date: 2021-02-02

    Application No.: US16367645

    Filing Date: 2019-03-28

    Abstract: An error handling technique for a computing device includes detecting a memory error during execution of the program instructions to generate a computational result, and generating an error message containing information about the memory error. The error message can be stored in a notification memory space, and be made available for access, for example, by a host system. The execution of the program instructions is allowed to continue to generate the computational result despite detecting the memory error. When the computation result becomes available, a confidence level of the computational result can be determined based on which program instruction or which computational stage resulted in the memory error. The confidence level can be used to assess whether the computational result is acceptable.
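    The continue-and-assess behavior can be sketched in a few lines. The stage names and confidence weights are invented for illustration; the shape follows the abstract: execution proceeds despite a memory error, the error is logged to a notification area, and the result's confidence depends on which stage faulted.

```python
# Invented per-stage confidence weights: an error during weight load is
# worse than one during activation; a logging-only error is harmless.
STAGE_CONFIDENCE = {"weight_load": 0.2, "activation": 0.6, "logging": 1.0}

def run(stages, value):
    """stages: list of (name, fn, memory_error_detected)."""
    notifications = []
    for name, fn, fault in stages:
        value = fn(value)          # execution continues either way
        if fault:                  # error recorded, not fatal
            notifications.append(name)
    confidence = min((STAGE_CONFIDENCE[s] for s in notifications),
                     default=1.0)
    return value, notifications, confidence
```

    A host system would read the notifications and confidence level to decide whether the computational result is acceptable or the work should be rerun.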

    Multi-memory on-chip computational network

    Publication No.: US10803379B2

    Publication Date: 2020-10-13

    Application No.: US15839301

    Filing Date: 2017-12-12

    Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks can store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
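    The dataflow in the abstract reduces to a small model. This is a pure-Python sketch with an invented class structure, not the hardware design: two engine arrays each own local banks, all weights are preloaded before any input arrives, and the first array's intermediate result is copied into the second array's banks for the final stage.

```python
def matmul(a, b):
    # Stand-in for a processing-engine array's matrix multiply.
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)] for row in a]

class Accelerator:
    def __init__(self, w1, w2):
        # Weights loaded into both banks before any input data arrives.
        self.banks1 = {"weights": w1}
        self.banks2 = {"weights": w2}

    def infer(self, x):
        intermediate = matmul(x, self.banks1["weights"])  # array 1
        self.banks2["input"] = intermediate               # copy across
        return matmul(self.banks2["input"],               # array 2
                      self.banks2["weights"])
```

    Preloading both layers' weights means inference never waits on a weight fetch; only the intermediate activations move between banks.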

    Memory access for multiple circuit components

    Publication No.: US10768856B1

    Publication Date: 2020-09-08

    Application No.: US15919167

    Filing Date: 2018-03-12

    Abstract: Disclosed herein are techniques for performing memory access. In one embodiment, an integrated circuit may include a memory device, a first port to receive first data elements from a memory access circuit within a first time period, and a second port to transmit second data elements to the memory access circuit within a second time period. The memory access circuit may receive the first data elements from the memory device within a third time period shorter than the first time period and transmit, via the first port, the received first data elements to a first processing circuit sequentially within the first time period. The memory access circuit may receive, via the second port, the second data elements from a second processing circuit sequentially within the second time period, and store the received second data elements in the memory device within a fourth time period shorter than the second time period.
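    The timing relationship in the abstract (a short burst fetch followed by a longer sequential hand-off) can be sketched with an invented cycle model; the function name and parameters are illustrative only.

```python
def stream_read(memory, burst_cycles, per_element_cycles):
    """Model the read path: the access circuit fetches the whole burst
    from the memory device in `burst_cycles` (the shorter third period),
    then streams elements out one at a time over the longer first period.
    Returns (element, delivery_cycle) pairs."""
    deliveries = []
    t = burst_cycles          # burst fetch of all elements completes here
    for elem in memory:
        t += per_element_cycles
        deliveries.append((elem, t))
    return deliveries
```

    The write path is the mirror image: elements arrive sequentially over the longer second period and are stored to the memory device in a short burst (the fourth period).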

    Dynamically configurable pipeline
    Invention Grant

    Publication No.: US10747700B1

    Publication Date: 2020-08-18

    Application No.: US15832546

    Filing Date: 2017-12-05

    Abstract: Techniques disclosed herein relate to dynamically configurable multi-stage pipeline processing units. In one embodiment, a circuit includes a plurality of processing engines and a plurality of switches. Each of the plurality of processing engines includes an input port and an output port. Each of the plurality of switches comprises two input ports and two output ports. For each processing engine, the input port of the processing engine is electrically coupled to one of the switches, the output port of the processing engine is electrically coupled to another one of the switches, and the input port of the processing engine is electrically coupled to the output port of each of the processing engines by the switches.
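    The effect of the switch fabric can be shown functionally. The engine names and operations are invented; the point, as in the abstract, is that because every engine's input can be routed from every engine's output, the same engines can be chained in different orders at run time.

```python
# Three processing engines modeled as functions.
ENGINES = {
    "scale": lambda x: x * 2,
    "offset": lambda x: x + 3,
    "clip": lambda x: min(x, 10),
}

def run_pipeline(order, x):
    """`order` stands in for the switch configuration: it decides which
    engine output feeds which engine input."""
    for name in order:
        x = ENGINES[name](x)
    return x
```

    Reconfiguring the switches changes the result without changing any engine: scaling before offsetting gives a different output than offsetting before scaling.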

    Debug mechanisms for a processor circuit

    Publication No.: US10746792B1

    Publication Date: 2020-08-18

    Application No.: US16206761

    Filing Date: 2018-11-30

    Abstract: An error-handling processing circuit and system are provided. The system can receive an error signal, such as an interrupt, and decouple (e.g., by a gate signal) a functional clock from a processing block, in some instances effectively halting the processing block's operation. This can prevent a cascade of interdependent errors, thereby avoiding producing redundant or confusing error information. The system can include the processing block, a debug clock not coupled to the processing block, and a data block (e.g., a register file) coupled to the debug clock and to an external input/output interface. The data block can be configured to continue receiving a clock signal via a multiplexer from the debug clock without disruption after the functional clock is decoupled, enabling the data block to remain operational for debugging.
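    A behavioral model of the clock-gating scheme might look like this. The class and method names are invented; the behavior follows the abstract: an error signal gates the functional clock off the processing block, while the debug data block keeps receiving its own clock through the multiplexer and stays readable.

```python
class Chip:
    def __init__(self):
        self.error = False      # interrupt / error signal
        self.cycles_run = 0     # work done by the processing block
        self.debug_reads = 0    # accesses served by the data block

    def functional_tick(self):
        # Clock gate: the processing block only advances while no
        # error has decoupled its functional clock.
        if not self.error:
            self.cycles_run += 1

    def raise_error(self):
        self.error = True       # halt the block, freezing its state

    def debug_tick(self):
        # The debug clock is never gated, so the register file stays
        # accessible over the external interface for debugging.
        self.debug_reads += 1
```

    Freezing the processing block at the first error prevents a cascade of interdependent errors from overwriting the state a debugger needs to see.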
