摘要:
A method and system for realizing a FPGA server, wherein centralized monitoring and managing all SoC FPGA compute nodes within the server by a motherboard, the motherboard comprising: a plurality of self-defined management interfaces for connecting the SoC FPGA compute nodes to supply power and data switch to the SoC FPGA compute nodes; a management network switch module for interconnecting the SoC FPGA compute nodes and supplying management; and a core control unit for managing the SoC FPGA compute nodes through the self-defined management interfaces and a self-defined management interface protocol, and acquiring operating parameters of the SoC FPGA compute nodes to manage and monitor the SoC FPGA compute nodes based on the management interface protocol.
摘要:
A method and system for realizing a FPGA server, wherein centralized monitoring and managing all SoC FPGA compute nodes within the server by a motherboard, the motherboard comprising: a plurality of self-defined management interfaces for connecting the SoC FPGA compute nodes to supply power and data switch to the SoC FPGA compute nodes; a management network switch module for interconnecting the SoC FPGA compute nodes and supplying management; and a core control unit for managing the SoC FPGA compute nodes through the self-defined management interfaces and a self-defined management interface protocol, and acquiring operating parameters of the SoC FPGA compute nodes to manage and monitor the SoC FPGA compute nodes based on the management interface protocol.
摘要:
The present invention provides a fractal tree structure-based data transmit device and method, a control device, and an intelligent chip. The device comprises: a central node that is as a communication data center of a network-on-chip and used for broadcasting or multicasting communication data to a plurality of leaf nodes; the plurality of leaf nodes that are as communication data nodes of the network-on-chip and for transmitting the communication data to a central leaf node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data; the central node, the forwarder modules and the plurality of leaf nodes are connected in the fractal tree network structure, and the central node is directly connected to M the forwarder modules and/or leaf nodes, any the forwarder module is directly connected to M the next level forwarder modules and/or leaf nodes.
摘要:
The disclosure provides a data packet classification method and system based on a convolutional neural network including merging each rule set in a training rule set to form a plurality of merging schemes, and determining an optimal merging scheme for each rule set in the training rule set on the basis of performance evaluation; converting a prefix combination distribution of each rule set in the training rule set and a target rule set into an image, and training a convolutional neural network model by taking the image and the corresponding optimal merging scheme as features; and classifying the target rule set on the basis of image similarity, and constructing a corresponding hash table for data packet classification.
摘要:
The present disclosure provides an operation apparatus and method for an acceleration chip for accelerating a deep neural network algorithm. The apparatus comprises: a vector addition processor module and a vector function value arithmetic unit and a vector multiplier-adder module wherein the three modules execute a programmable instruction, and interact with each other to calculate values of neurons and a network output result of a neural network, and a variation amount of a synaptic weight representing the interaction strength of the neurons on an input layer to the neurons on an output layer; and the three modules are all provided with an intermediate value storage region and perform read and write operations on a primary memory.
摘要:
Disclosed embodiments relate to a convolutional neural network computing method and system based on weight kneading, comprising: arranging original weights in a computation sequence and aligning by bit to obtain a weight matrix, removing slack bits in the weight matrix, allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix, removing null rows in the intermediate matrix, obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; divides the kneading weight by bit into multiple weight segments, processing summation of the weight segments and the corresponding activations according to the positional information, and sending a processing result to an adder tree to obtain an output feature map by means of executing shift-and-add on the processing result.
摘要:
The present invention proposes a dynamic resources allocation method and system for guaranteeing tail latency SLO of latency-sensitive applications. A plurality of request queues is created in a storage server node of a distributed storage system with different types of requests located in different queues, and thread groups are allocated to the request queues according to logical thread resources of the service node and target tail latency requirements, and thread resources are dynamically allocated in real time, and the thread group of each request queue is bound to physical CPU resources of the storage server node. The client sends an application's requests to the storage server node; the storage server node stores the request in a request queue corresponding to its type, uses the thread group allocated for the current queue to process the application's requests, and sends responses to the client.
摘要:
Disclosed embodiments relate to a split accumulator for a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning by bit to obtain a weight matrix, removing slack bits in the weight matrix, allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix, removing null rows in the intermediate matrix, obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; divides the kneading weight by bit into multiple weight segments, processing summation of the weight segments and the corresponding activations according to the positional information, and sending a processing result to an adder tree to obtain an output feature map by means of executing shift-and-add on the processing result.
摘要:
Disclosed embodiments relate to a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning by bit to obtain a weight matrix, removing slack bits in the weight matrix, allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix, removing null rows in the intermediate matrix, obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; divides the kneading weight by bit into multiple weight segments, processing summation of the weight segments and the corresponding activations according to the positional information, and sending a processing result to an adder tree to obtain an output feature map by means of executing shift-and-add on the processing result.
摘要:
A “rounding to zero” method can maintain the exact motion vector and can also be achieved by the method without division so as to improve the precision of calculating the motion vector, embody the motion of the object in video more factually, and obtain the more accurate motion vector prediction. Combining with the forward prediction coding and the backward prediction coding, the present invention realizes a new prediction coding mode, which can guarantee the high efficiency of coding in direct mode as well as is convenient for hardware realization, and gains the same effect as the conventional B frame coding.