-
Publication No.: US11531889B2
Publication Date: 2022-12-20
Application No.: US16762810
Application Date: 2018-02-28
Abstract: Disclosed are a weight data storage method and a convolution computation method that may be implemented in a neural network. The weight data storage method comprises searching for effective weights in a weight convolution kernel matrix and acquiring an index of the effective weights. The effective weights are non-zero weights, and the index marks the positions of the effective weights in the weight convolution kernel matrix. The method further comprises storing the effective weights and their index. According to the weight data storage method and the convolution computation method of the present disclosure, storage space can be saved and computation efficiency improved.
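As a rough illustration of the storage scheme this abstract describes, the sketch below keeps only the non-zero (effective) weights of a kernel together with a flat positional index, then uses that index to multiply-accumulate against an input patch. The helper names and the flat-index encoding are assumptions for illustration, not the patented format.

```python
import numpy as np

def store_effective_weights(kernel):
    """Keep only non-zero (effective) weights plus an index of their
    flat positions in the kernel matrix. Hypothetical encoding."""
    flat = kernel.ravel()
    index = np.flatnonzero(flat)   # positions of the effective weights
    return flat[index], index

def sparse_convolve_patch(values, index, patch):
    """Multiply-accumulate only at the indexed (non-zero) positions."""
    return float(np.dot(values, patch.ravel()[index]))

kernel = np.array([[0., 2., 0.],
                   [1., 0., 0.],
                   [0., 0., 3.]])
patch = np.arange(9, dtype=float).reshape(3, 3)
values, index = store_effective_weights(kernel)

# Only 3 of 9 weights are stored, yet the result matches the dense product.
assert sparse_convolve_patch(values, index, patch) == float((kernel * patch).sum())
```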
-
12.
Publication No.: US20210075639A1
Publication Date: 2021-03-11
Application No.: US17100570
Application Date: 2020-11-20
Inventor: Jinhua Tao , Tao Luo , Shaoli Liu , Shijin Zhang , Yunji Chen
IPC: H04L12/44 , H04L12/761 , H04L12/933
Abstract: The present invention provides a fractal tree structure-based data transmission device and method, a control device, and an intelligent chip. The device comprises: a central node that serves as the communication data center of a network-on-chip and is used for broadcasting or multicasting communication data to a plurality of leaf nodes; the plurality of leaf nodes, which serve as communication data nodes of the network-on-chip and transmit the communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data. The central node, the forwarder modules, and the plurality of leaf nodes are connected in a fractal tree network structure: the central node is directly connected to M forwarder modules and/or leaf nodes, and any forwarder module is directly connected to M next-level forwarder modules and/or leaf nodes.
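A minimal sketch of the broadcast behavior, assuming a complete M-ary tree with the central node at the root, forwarder modules as internal nodes, and leaf nodes at the bottom. The Node class and the naming are illustrative, not the patented hardware.

```python
class Node:
    """M-ary fractal tree node: the root is the central node, internal
    nodes are forwarder modules, childless nodes are leaves."""
    def __init__(self, name):
        self.name = name
        self.children = []

def build_tree(m, levels, name="central"):
    """Complete m-ary tree with `levels` levels below the root.
    Illustrative topology; the patented fractal structure may differ."""
    root = Node(name)
    if levels > 0:
        root.children = [build_tree(m, levels - 1, f"{name}.{i}")
                         for i in range(m)]
    return root

def broadcast(node, data, received):
    """Push data from the central node down through forwarders to every leaf."""
    if not node.children:              # leaf node: deliver the payload
        received[node.name] = data
    for child in node.children:        # forwarder: replicate to M children
        broadcast(child, data, received)

received = {}
broadcast(build_tree(m=2, levels=3), "payload", received)
print(len(received))                   # 8 leaves, all holding the same payload
```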
-
13.
Publication No.: US10904034B2
Publication Date: 2021-01-26
Application No.: US15781608
Application Date: 2016-06-17
Inventor: Jinhua Tao , Tao Luo , Shaoli Liu , Shijin Zhang , Yunji Chen
IPC: H04L12/44 , H04L12/761 , H04L12/933
Abstract: One example of a device comprises: a central node that serves as the communication data center of a network-on-chip; a plurality of leaf nodes that serve as communication data nodes of the network-on-chip and transmit communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data. The plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules; the communication structure constituted by each group of leaf nodes has self-similarity; and the plurality of leaf nodes are in communication connection with the central node in a complete multi-way tree approach by means of multiple levels of forwarder modules.
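To make the grouping concrete, here is a small sketch assuming the leaf count divides evenly into N equal groups, each hanging off its own forwarder so that every group has the same shape (the self-similarity the abstract mentions). The forwarder names are hypothetical.

```python
def group_leaves(num_leaves, n_groups):
    """Split leaves into n_groups equal groups, one forwarder subtree per
    group. Assumes num_leaves divides evenly; illustrative layout only."""
    assert num_leaves % n_groups == 0
    size = num_leaves // n_groups
    return {f"forwarder{g}": list(range(g * size, (g + 1) * size))
            for g in range(n_groups)}

# 8 leaves, 4 groups: each forwarder serves the same number of leaves, and
# the central node reaches each group only through that group's forwarder.
print(group_leaves(8, 4))
# {'forwarder0': [0, 1], 'forwarder1': [2, 3], 'forwarder2': [4, 5], 'forwarder3': [6, 7]}
```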
-
14.
Publication No.: US20200272595A1
Publication Date: 2020-08-27
Application No.: US15781039
Application Date: 2016-06-17
Inventor: Dong HAN , Tao LUO , Shaoli LIU , Shijin ZHANG , Yunji CHEN
IPC: G06F15/78 , G06F17/16 , G06F16/901
Abstract: An example device comprises a central node for receiving vector data returned by leaf nodes; a plurality of leaf nodes for calculating and shifting the vector data; and forwarder modules comprising a local cache structure and a data processing component. The plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules; the communication structure constituted by each group of leaf nodes has self-similarity; the plurality of leaf nodes are in communication connection with the central node in a complete M-way tree approach by means of multiple levels of forwarder modules; and each of the leaf nodes comprises a setting bit.
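The data flow back toward the central node resembles a tree reduction. The sketch below sums leaf vectors up a complete m-way tree, standing in for the forwarder modules' data processing component; the summing operation (and the role of the setting bit) is an assumption, as the patent may combine data differently.

```python
import numpy as np

def gather(leaf_vectors, m):
    """Reduce leaf vectors up a complete m-way tree: at each level, every
    forwarder sums the partial results of its m children, until the
    central node receives a single vector. Illustrative combining step."""
    level = list(leaf_vectors)
    while len(level) > 1:
        level = [sum(level[i:i + m]) for i in range(0, len(level), m)]
    return level[0]

leaves = [np.full(4, i, dtype=float) for i in range(8)]  # 8 leaf vectors
print(gather(leaves, m=2))                               # [28. 28. 28. 28.]
```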
-
15.
Publication No.: US20180375789A1
Publication Date: 2018-12-27
Application No.: US15781686
Application Date: 2016-06-17
Inventor: Huiying LAN , Tao LUO , Shaoli LIU , Shijin ZHANG , Yunji CHEN
IPC: H04L12/911 , G06F15/173 , H04L12/44 , H04L12/755
Abstract: A communication structure comprises: a central node that is the communication data center of a network-on-chip and is used for broadcasting or multicasting communication data to a plurality of leaf nodes; the plurality of leaf nodes, which are communication data nodes of the network-on-chip and are used for transmitting the communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data. The plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes; the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules; the communication structure is a fractal-tree structure; the communication structure constituted by each group of leaf nodes has self-similarity; and the forwarder modules comprise a central forwarder module, leaf forwarder modules, and intermediate forwarder modules.
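A tiny sketch of the three forwarder roles the abstract names, keyed by level in the fractal tree. The level numbering is an assumption for illustration.

```python
def forwarder_role(level, depth):
    """Role of a forwarder by its level in the fractal tree: the module
    directly under the central node, the modules directly above the
    leaves, and everything in between. Level numbering is assumed."""
    if level == 0:
        return "central forwarder"
    return "leaf forwarder" if level == depth - 1 else "intermediate forwarder"

depth = 4   # four forwarder levels between the central node and the leaves
for level in range(depth):
    print(level, forwarder_role(level, depth))
```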
-
Publication No.: US11977784B2
Publication Date: 2024-05-07
Application No.: US17908495
Application Date: 2020-07-06
Inventor: Liuying Ma , Zhenqing Liu , Jin Xiong , Dejun Jiang
IPC: G06F3/06
CPC classification number: G06F3/0659 , G06F3/0611 , G06F3/0622 , G06F3/067
Abstract: The present invention proposes a dynamic resource allocation method and system for guaranteeing the tail-latency SLO of latency-sensitive applications. A plurality of request queues is created in a storage server node of a distributed storage system, with different types of requests placed in different queues. Thread groups are allocated to the request queues according to the logical thread resources of the server node and the target tail-latency requirements, thread resources are dynamically reallocated in real time, and the thread group of each request queue is bound to the physical CPU resources of the storage server node. The client sends an application's requests to the storage server node; the storage server node stores each request in the request queue corresponding to its type, uses the thread group allocated to that queue to process the requests, and sends responses to the client.
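As a toy version of the allocation step, the sketch below creates one queue per request type and splits a fixed pool of logical threads across the queues, weighting each queue by the inverse of its tail-latency target so tighter SLOs get more threads. The inverse-proportional policy and the names are assumptions; the patented system also rebalances dynamically and binds thread groups to physical CPUs.

```python
from queue import Queue

def allocate_threads(total_threads, slo_targets_ms):
    """Split a fixed pool of logical threads across request queues,
    giving queues with tighter tail-latency targets proportionally more
    threads. Toy static policy, not the patented dynamic scheme."""
    weights = {q: 1.0 / t for q, t in slo_targets_ms.items()}
    total = sum(weights.values())
    return {q: max(1, round(total_threads * w / total))
            for q, w in weights.items()}

queues = {"read": Queue(), "write": Queue()}   # one queue per request type
print(allocate_threads(16, {"read": 2.0, "write": 10.0}))
# {'read': 13, 'write': 3} -- the tighter 2 ms target gets more threads
```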
-
Publication No.: US20210357735A1
Publication Date: 2021-11-18
Application No.: US17250890
Application Date: 2019-05-21
Inventor: Xiaowei LI , Xin WEI , Hang LU
Abstract: Disclosed embodiments relate to a split accumulator for a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits from the weight matrix, allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows from the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing summation of the weight segments and the corresponding activations according to the positional information; and sending the processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
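A simplified sketch of the kneading step: build the bit matrix (one row per weight, columns aligned by bit position), compact each column's essential bits upward, drop the rows that become all-zero, and record which activation each kept bit belongs to. The encoding details are assumptions; the check at the end confirms the kneaded form still yields the original dot product.

```python
def knead(weights, bits=8):
    """Compact essential (1) bits upward per column and drop null rows.
    Returns the kneading matrix plus, per kept bit, the index of the
    activation it multiplies. Simplified sketch, not the hardware layout."""
    n = len(weights)
    # columns[j]: indices of weights whose bit j is set, in computation order
    columns = [[i for i in range(n) if (weights[i] >> j) & 1]
               for j in range(bits)]
    rows = max((len(c) for c in columns), default=0)  # null rows removed
    kneaded = [[1 if r < len(columns[j]) else 0 for j in range(bits)]
               for r in range(rows)]
    origin = [[columns[j][r] if r < len(columns[j]) else None
               for j in range(bits)] for r in range(rows)]
    return kneaded, origin

weights = [0b0101, 0b0000, 0b0011, 0b1000]   # four 4-bit weights
acts = [3, 7, 2, 5]
kneaded, origin = knead(weights, bits=4)     # 4 rows kneaded down to 2

# Shift-and-add over the kneaded bits reproduces the dense dot product.
dot = sum((1 << j) * acts[origin[r][j]]
          for r in range(len(kneaded)) for j in range(4)
          if origin[r][j] is not None)
assert dot == sum(w * a for w, a in zip(weights, acts))
```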
-
Publication No.: US20210350204A1
Publication Date: 2021-11-11
Application No.: US17250889
Application Date: 2019-05-21
Inventor: Xiaowei LI , Xin WEI , Hang LU
IPC: G06N3/04
Abstract: Disclosed embodiments relate to a convolutional neural network accelerator, comprising: arranging original weights in a computation sequence and aligning them by bit to obtain a weight matrix; removing slack bits from the weight matrix, allowing essential bits in each column of the weight matrix to fill the vacancies according to the computation sequence to obtain an intermediate matrix; removing null rows from the intermediate matrix to obtain a kneading matrix, wherein each row of the kneading matrix serves as a kneading weight; obtaining positional information of the activation corresponding to each bit of the kneading weight; dividing the kneading weight by bit into multiple weight segments; processing summation of the weight segments and the corresponding activations according to the positional information; and sending the processing result to an adder tree to obtain an output feature map by executing shift-and-add on the processing result.
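Since this abstract repeats the kneading pipeline sketched above, here is only the back end: a pairwise adder tree combining bit-plane partial sums after the shift step. The example values are arbitrary.

```python
def adder_tree(values):
    """Reduce partial sums pairwise in log2(n) levels, as an adder tree."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:          # pad odd levels with a zero
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical per-bit-position activation sums; shift each by its bit
# position (shift-and-add), then reduce through the adder tree.
planes = [9, 4, 12, 40]
print(adder_tree([s << j for j, s in enumerate(planes)]))   # 385
```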
-
Publication No.: US20190026626A1
Publication Date: 2019-01-24
Application No.: US16071801
Application Date: 2016-08-09
Inventor: Zidong Du , Qi Guo , Tianshi Chen , Yunji Chen
Abstract: A neural network accelerator and an operation method thereof, applicable in the field of neural network algorithms, are disclosed. The neural network accelerator comprises: an on-chip storage medium for storing data transmitted from outside the chip or generated during computing; an on-chip address index module for mapping to the correct storage address on the basis of an input index when an operation is performed; a core computing module for performing neural network operations; and a multi-ALU device for obtaining input data from the core computing module or the on-chip storage medium to perform nonlinear operations that cannot be completed by the core computing module. By introducing a multi-ALU design into the neural network accelerator, the speed of nonlinear operations is increased, making the neural network accelerator more efficient.
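A schematic split of the two compute paths: the core computing module handles the linear multiply-accumulate work, while a separate ALU path handles nonlinear operations such as activation functions. The two example ops are assumptions; the patent's actual ALU set is not specified here.

```python
import numpy as np

def core_compute(weights, inputs):
    """Core computing module: the linear multiply-accumulate part."""
    return weights @ inputs

def multi_alu(x, op="relu"):
    """Multi-ALU device: nonlinear operations the core module cannot
    perform. The two ops here are illustrative assumptions."""
    if op == "relu":
        return np.maximum(x, 0.0)
    if op == "sigmoid":
        return 1.0 / (1.0 + np.exp(-x))
    raise ValueError(f"unsupported op: {op}")

w = np.array([[0.5, -1.0],
              [2.0, 0.25]])
x = np.array([1.0, 2.0])
print(multi_alu(core_compute(w, x)))   # [0.  2.5]
```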
-
Publication No.: US20190018766A1
Publication Date: 2019-01-17
Application No.: US16070735
Application Date: 2016-08-09
Inventor: Qi GUO , Tianshi CHEN , Yunji CHEN
IPC: G06F12/02
Abstract: The present disclosure may include a method that comprises: partitioning data on an on-chip and/or off-chip storage medium into different data blocks according to a pre-determined data partitioning principle, wherein data with a reuse distance less than a pre-determined distance threshold is partitioned into the same data block; and a data indexing step for successively loading different data blocks onto at least one on-chip processing unit according to a pre-determined ordinal relation of a replacement policy, wherein repeated data in a loaded data block is subjected to on-chip repetitive addressing. Because data with a reuse distance less than the threshold is placed in the same data block, a block can be loaded onto the chip once and then reused as many times as possible, making access more efficient.
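A toy version of the partitioning principle: compute each datum's reuse distance (the number of distinct addresses touched between consecutive accesses to it) over an access trace, then place data whose reuse distance falls below the threshold in the same on-chip block. The trace-based analysis and the min-distance rule are assumptions for illustration.

```python
def reuse_distances(trace):
    """Per address: list of reuse distances, i.e. the number of distinct
    addresses touched between consecutive accesses to that address."""
    last_seen, dists = {}, {}
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            d = len(set(trace[last_seen[addr] + 1:pos]))
            dists.setdefault(addr, []).append(d)
        last_seen[addr] = pos
    return dists

def partition(trace, threshold):
    """Place data whose minimum reuse distance is below the threshold in
    the same on-chip block; the rest stays in another block. A toy
    version of the abstract's partitioning principle."""
    dists = reuse_distances(trace)
    on_chip = {a for a, ds in dists.items() if min(ds) < threshold}
    return on_chip, set(trace) - on_chip

trace = list("ABCABDDE")
print(partition(trace, threshold=3))   # ({'A', 'B', 'D'}, {'C', 'E'})
```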
-