-
Publication number: US12027512B2
Publication date: 2024-07-02
Application number: US17468687
Application date: 2021-09-08
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Shiqun Gu , Linglan Zhang
IPC: H01L25/18 , H01L23/00 , H01L23/48 , H01L23/498
CPC classification number: H01L25/18 , H01L23/481 , H01L23/49816 , H01L24/08 , H01L24/80 , H01L2224/08145 , H01L2224/80895 , H01L2224/80896
Abstract: The disclosure provides a chipset and a manufacturing method thereof. The chipset includes multiple logic cores and a memory chip. The logic cores respectively have a first device layer and a first substrate layer, and respectively include multiple first bonding elements and a first input/output circuit. The first bonding elements are provided in the first device layer. The first input/output circuit is provided in the first device layer. The memory chip has a second device layer and a second substrate layer, and includes second bonding elements and second input/output circuits. The second bonding elements are arranged in the second device layer. The second input/output circuits are arranged in the second device layer, and are respectively connected to the first input/output circuits of the logic cores.
-
Publication number: US11941396B2
Publication date: 2024-03-26
Application number: US17958445
Application date: 2022-10-03
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Zhou Hong , Yunya Fei , Hao Shu , ChengKun Sun
CPC classification number: G06F9/3001 , G06F9/505
Abstract: The present disclosure provides a DIDT control method. The method includes, at each of a plurality of DIDT control modules: obtaining a local operation load of a local ALU in each clock cycle; obtaining a global operation load of a plurality of ALUs in each cycle period; determining an operation load index of the local ALU based on local historical load information and a local historical load weight set of the local ALU, and on global historical load information and a global historical load weight set of the plurality of ALUs, wherein the global historical load information includes a first number of the global operation loads and the local historical load information includes a second number of the local operation loads; and adjusting an operation load of the local ALU based on the operation load index of the local ALU and a predetermined load threshold to control a DIDT of the local ALU.
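One way to read the load-index step is as a weighted sum over recent local and global load samples, followed by a threshold check. The sketch below illustrates that reading only. The weighted-sum formula, the proportional throttling policy, and the names `load_index` and `adjust_load` are assumptions for illustration and are not taken from the patent.

```python
def load_index(local_hist, local_weights, global_hist, global_weights):
    """Combine recent local and global loads into one index.

    Assumed formula: a plain weighted sum, one weight per history sample.
    """
    if len(local_hist) != len(local_weights):
        raise ValueError("need one local weight per local sample")
    if len(global_hist) != len(global_weights):
        raise ValueError("need one global weight per global sample")
    local_part = sum(w * x for w, x in zip(local_weights, local_hist))
    global_part = sum(w * x for w, x in zip(global_weights, global_hist))
    return local_part + global_part


def adjust_load(requested, index, threshold):
    """Return the load the local ALU is allowed to take this cycle.

    Assumed policy: pass the request through while the index is at or
    below the threshold, otherwise scale it down by the overshoot ratio.
    """
    if index <= threshold:
        return requested
    return requested * threshold / index
```

Here the lengths of `local_hist` and `global_hist` play the roles of the "second number" and "first number" of samples mentioned in the abstract.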
-
Publication number: US11900175B2
Publication date: 2024-02-13
Application number: US17524700
Application date: 2021-11-11
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Zhou Hong , YuFei Zhang , ChengKun Sun , Lin Chen
CPC classification number: G06F9/52 , G06F9/4881 , G06F9/54 , G06T1/20
Abstract: The embodiments of the disclosure relate to a computing device, computing equipment, and a programmable scheduling method for data loading and execution, in the field of computers. The computing device is coupled to a first computing core and a first memory. The computing device includes a scratchpad memory, a second computing core, a first hardware queue, a second hardware queue and a synchronization unit. The second computing core is configured for acceleration in a specific field. The first hardware queue receives a load request from the first computing core. The second hardware queue receives an execution request from the first computing core. The synchronization unit is configured to coordinate the triggering of the load request and the execution request. In this manner, flexibility, throughput, and overall performance can be enhanced.
-
Publication number: US20230117626A1
Publication date: 2023-04-20
Application number: US17958441
Application date: 2022-10-03
Applicant: Shanghai Biren Technology Co., Ltd
IPC: G06F17/15
Abstract: A convolution apparatus including a data memory, a matrix unknit-knit device, and a convolution operation device, a convolution method, a matrix unknit-knit device, and a matrix unknit-knit method are provided. The matrix unknit-knit device unknits a first matrix stored in the data memory into s*s second matrices (or knits the s*s second matrices into the first matrix), where s is greater than 1. Pixels in each of s*s subblocks in the first matrix serve one-to-one as pixels of the s*s second matrices. A convolution operation device unknits a convolution kernel of a convolution operation with a stride of s into s*s sub-kernels, uses any one of the sub-kernels to perform a convolution operation with a stride of 1 on one corresponding second matrix, and accumulates the operation results of the second matrices as the operation result of performing the convolution operation with a stride of s on the first matrix.
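The decomposition can be checked numerically. The sketch below is a minimal pure-Python illustration, assuming a "valid" (unpadded) convolution in cross-correlation form and assuming that second matrix (p, q) collects the pixel at position (p, q) of every s*s subblock. The function names `conv2d`, `unknit`, and `strided_conv_by_unknit` are hypothetical and do not come from the patent.

```python
def conv2d(x, k, stride=1):
    """Direct 'valid' 2-D convolution (cross-correlation form) on nested lists."""
    kh, kw = len(k), len(k[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    return [[sum(k[a][b] * x[stride * i + a][stride * j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def unknit(m, s):
    """Split m into s*s matrices keyed by (p, q).

    Matrix (p, q) keeps rows p, p+s, ... and columns q, q+s, ... of m.
    """
    return {(p, q): [row[q::s] for row in m[p::s]]
            for p in range(s) for q in range(s)}


def strided_conv_by_unknit(x, k, s):
    """Stride-s convolution computed as a sum of s*s stride-1 convolutions."""
    out_h = (len(x) - len(k)) // s + 1
    out_w = (len(x[0]) - len(k[0])) // s + 1
    acc = [[0] * out_w for _ in range(out_h)]
    sub_x, sub_k = unknit(x, s), unknit(k, s)
    for key in sub_x:
        if not sub_k[key] or not sub_k[key][0]:
            continue  # kernel smaller than the stride: no taps for this phase
        part = conv2d(sub_x[key], sub_k[key], stride=1)
        # Partial results may be larger than the final output; use the
        # top-left out_h x out_w region of each one.
        for i in range(out_h):
            for j in range(out_w):
                acc[i][j] += part[i][j]
    return acc
```

For a 6x6 input and a 3x3 kernel with s = 2, `strided_conv_by_unknit(x, k, 2)` and `conv2d(x, k, 2)` return the same 2x2 result, which is the equivalence the abstract describes.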
-
Publication number: US20220368619A1
Publication date: 2022-11-17
Application number: US17742393
Application date: 2022-05-11
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Zhou HONG , Qin ZHENG , ChengPing LUO
IPC: H04L45/02 , H04L45/122 , H04L45/24 , H04L45/302
Abstract: The present disclosure provides a computing system, a computing processor and a data processing method for the computing processor. The computing system includes multiple computing clusters; each computing cluster includes multiple computing nodes, and each computing node includes multiple computing processors. At least some computing clusters among the computing clusters, at least some computing nodes in each computing cluster and at least some computing processors of each computing node are connected through direct links. Each of at least some computing processors of the computing node is configured with a local routing table, based on which the computing processor determines the next direct link through which a data packet is routed from a data source to a data destination, and the computing processor forwards the data packet through that direct link.
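Hop-by-hop forwarding with per-processor tables can be sketched as follows. This is an illustration under assumed data layouts: `tables[proc][dest]` gives the name of the direct link to use next, and `links[(proc, link)]` gives the processor at the far end of that link. Both layouts, the `max_hops` guard, and the function name `route` are hypothetical.

```python
def route(tables, links, source, destination, max_hops=16):
    """Follow local routing tables from source to destination.

    Each processor only consults its own table to pick the next direct
    link; no processor needs to know the whole path.
    """
    path, current = [source], source
    while current != destination:
        if len(path) > max_hops:
            raise RuntimeError("routing loop or path too long")
        link = tables[current][destination]   # local lookup
        current = links[(current, link)]      # forward over that link
        path.append(current)
    return path
```

With tables `{"A": {"C": "l0"}, "B": {"C": "l1"}}` and links `{("A", "l0"): "B", ("B", "l1"): "C"}`, a packet from `"A"` to `"C"` visits `"A"`, `"B"`, `"C"` in that order.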
-
Publication number: US20220295080A1
Publication date: 2022-09-15
Application number: US17691134
Application date: 2022-03-10
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: YuFei ZHANG , Zhou HONG
IPC: H04N19/176 , H04N19/117 , H04N19/61 , G06N3/04
Abstract: The present disclosure relates to a method for computing, a computing device, and a computer-readable storage medium. The method includes: determining a pixel block set in a cache, a first pixel block in the pixel block set comprising an m×n pixel matrix having a first padding setting related to the original pixel data, m and n being positive integers; and storing the determined pixel block set in a buffer to enable a second pixel block to be read from the buffer based on the buffer initial address of the first pixel block and an address offset associated with the second pixel block, wherein the second pixel block has a second padding setting related to the original pixel data, and the first padding setting and the second padding setting have the same offset amount in a first direction relative to the original pixel data.
-
Publication number: US20220292632A1
Publication date: 2022-09-15
Application number: US17692198
Application date: 2022-03-11
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: YuFei ZHANG , Zhu LIANG , ChengKun SUN
Abstract: A method for computing, a computing device, and a computer-readable storage medium are provided. The method includes determining a first pixel block in a cache. The first pixel block is composed of a 2m row×2n column pixel matrix and includes original pixel data and pixel data related to the original pixel data. The first pixel block is read from the cache. At least part of the pixel data related to the original pixel data is used for padding related to the original pixel data. The original pixel data includes pixel data from the (n+1)th column to the 2nth column in the (m+1)th row to the 2mth row in the 2m row×2n column pixel matrix. When reading data from the cache, pixel data that needs to be obtained after insert-zero and padding operations on the original pixel data in back propagation can be read at one time.
-
Publication number: US20220283790A1
Publication date: 2022-09-08
Application number: US17686413
Application date: 2022-03-04
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: HaiChuan WANG , Huayuan TIAN , Long CHEN
Abstract: A method for executing computation, a computing device, a computing system, and a storage medium are provided. The method includes: confirming, via a compiler, whether there is a call instruction related to a thread block modification request in a kernel function to be compiled; in response to confirming that there is the call instruction related to the thread block modification request in the kernel function to be compiled, determining a corresponding program segment associated with the call instruction; configuring a required thread block and thread local register for the corresponding program segment; and inserting a control instruction into the corresponding program segment so that the thread block configured for the corresponding program segment executes the relevant computation of the corresponding program segment, while an unconfigured thread block does not. The disclosure can improve overall performance, make coding and maintenance easier, and reduce the error rate of code.
-
Publication number: US20220121444A1
Publication date: 2022-04-21
Application number: US17366588
Application date: 2021-07-02
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Zhou HONG , YuFei ZHANG , ChengKun SUN , Lin CHEN , Hao SHU
Abstract: The invention relates to an apparatus for configuring cooperative warps in a vector computing system. The apparatus includes general-purpose registers (GPRs); an arithmetic logic unit (ALU); and a warp instruction scheduler. The warp instruction scheduler is operably arranged to: allow each of a plurality of warps to access data in all or a designated portion of the GPRs through the ALU, in accordance with a configuration set by software during execution; and complete calculations of each warp through the ALU.
-
Publication number: US12095654B2
Publication date: 2024-09-17
Application number: US18487118
Application date: 2023-10-15
Applicant: Shanghai Biren Technology Co., Ltd
Inventor: Qin Zheng , Zhou Hong , YuFei Zhang , Lin Chen , ChengKun Sun , Tong Sun , ChengPing Luo , HaiChuan Wang
CPC classification number: H04L45/20 , H04L12/1836 , H04L12/185 , H04L45/42
Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, wherein the data access request comprises a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; determining whether the interconnection device receives other multicast requests if it is determined that the data access request is a multicast request based on the MGID, the MTID, and a static routing policy of a multicast group; and obtaining the other multicast requests if it is determined that the interconnection device receives the other multicast requests, merging the multicast request with the other multicast requests into a merged request, and forwarding the merged request to a next-hop device of the interconnection device.
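The merging step can be pictured as grouping requests that share an MGID and MTID and forwarding one request per group. The sketch below is a simplified illustration. It assumes that a set merge bit marks a request as a mergeable multicast request and that the first request of each group stands in for the merged request. It omits the static routing policy that, per the abstract, decides whether other requests are expected. The `Request` fields and the function name `merge_requests` are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical request layout; only merge, mgid and mtid come from the abstract.
Request = namedtuple("Request", "source address merge mgid mtid")


def merge_requests(requests):
    """Return (request, sources) pairs to forward to the next-hop device.

    Requests without the merge bit pass through unchanged; requests with
    it are grouped by (MGID, MTID) and forwarded once per group.
    """
    groups = defaultdict(list)
    forwarded = []
    for req in requests:
        if req.merge:
            groups[(req.mgid, req.mtid)].append(req)
        else:
            forwarded.append((req, [req.source]))
    for members in groups.values():
        # One merged request per group, remembering every requester so a
        # response could later be fanned back out.
        forwarded.append((members[0], [m.source for m in members]))
    return forwarded
```

Two processors issuing the same multicast transaction thus produce a single forwarded request, which is the traffic reduction that merging is meant to achieve.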