-
Publication No.: US20240061780A1
Publication Date: 2024-02-22
Application No.: US18450663
Filing Date: 2023-08-16
Inventors: Lide DUAN, Bowen HUANG, Qichen ZHANG, Shengcheng WANG, Yen-Kuang CHEN, Hongzhong ZHENG
IPC Classification: G06F12/0811, G06F12/0846
CPC Classification: G06F12/0811, G06F12/0848, G06F2212/1021
Abstract: A computer-implemented method for allocating memory bandwidth of multiple CPU cores in a server includes: receiving an access request to a last level cache (LLC) shared by the multiple CPU cores in the server, the access request being sent from a core with a private cache holding copies of frequently accessed data from a memory; determining whether the access request is an LLC hit or an LLC miss; and controlling a memory bandwidth controller based on the determination. The memory bandwidth controller performs memory bandwidth throttling to control a request rate between the private cache and the last level cache. An LLC hit of the access request causes the memory bandwidth throttling initiated by the memory bandwidth controller to be disabled, and an LLC miss of the access request causes the memory bandwidth throttling initiated by the memory bandwidth controller to be enabled.
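The hit/miss rule in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: `BandwidthController`, `handle_llc_access`, and the modelling of the LLC as a set of resident addresses are hypothetical names and simplifications introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BandwidthController:
    """Hypothetical stand-in for the memory bandwidth controller."""
    throttling_enabled: bool = False


def handle_llc_access(address, llc_lines, controller):
    """Classify one access and update throttling as the abstract describes.

    llc_lines is an assumed model of the shared LLC: the set of addresses
    currently resident in it. Returns True on an LLC hit.
    """
    hit = address in llc_lines
    # LLC hit  -> throttling disabled (the request does not reach memory).
    # LLC miss -> throttling enabled (the request consumes memory bandwidth).
    controller.throttling_enabled = not hit
    return hit


if __name__ == "__main__":
    controller = BandwidthController()
    llc = {0x10, 0x20}
    for addr in (0x10, 0x30):
        hit = handle_llc_access(addr, llc, controller)
        print(hex(addr), "hit" if hit else "miss", controller.throttling_enabled)
```

The sketch only captures the decision itself; how the controller limits the request rate between the private cache and the LLC is outside what the abstract specifies.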
-
Publication No.: US20240054021A1
Publication Date: 2024-02-15
Application No.: US18171250
Filing Date: 2023-02-17
Inventors: Meng Wang, Pengyu Zhang, Yunshan Jia, Biyi Li
IPC Classification: G06F9/50
CPC Classification: G06F9/505, G06F9/5038, G06F2209/5018, G06F2209/5017
Abstract: The present application provides a resource scheduling method and a server. The method is applied to a scheduling component in user mode and includes: acquiring, at a target scheduling time, an idle time point for each of multiple virtual network elements, wherein the idle time point is a time point at which no load task is polled by a worker thread in the virtual network element; determining a load status of the multiple virtual network elements based on the time differences between the target scheduling time and the idle time point of each virtual network element; determining computational resource scheduling information for the multiple virtual network elements based on their load status; and sending the computational resource scheduling information to a kernel of the server to perform, through a schedule-class function in kernel mode, the resource scheduling processing corresponding to the computational resource scheduling information. The method is compatible with multiple virtual network elements with different implementations and schedules their computational resources in real time, improving the utilization of the computational resources.
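A rough Python sketch of the user-mode part of this flow follows. The abstract does not say how a time difference maps to a load status or to a core count, so the `busy_threshold` parameter, the "busy"/"idle" labels, and the round-robin core policy in `plan_cores` are all assumptions made here for illustration; the hand-off to the kernel schedule-class function is not modelled.

```python
def load_status(schedule_time, idle_times, busy_threshold):
    """Label each virtual network element by time since its last idle point.

    idle_times maps an element name to the last time its worker thread
    polled and found no load task. busy_threshold is an assumed tunable.
    """
    return {
        name: "busy" if schedule_time - idle_at >= busy_threshold else "idle"
        for name, idle_at in idle_times.items()
    }


def plan_cores(status, total_cores):
    """Assumed policy: one core each, spare cores round-robin to busy elements."""
    if total_cores < len(status):
        raise ValueError("need at least one core per element")
    plan = {name: 1 for name in status}
    busy = [name for name, state in status.items() if state == "busy"]
    spare = total_cores - len(plan)
    i = 0
    while spare > 0 and busy:
        plan[busy[i % len(busy)]] += 1
        i += 1
        spare -= 1
    return plan


if __name__ == "__main__":
    status = load_status(100.0, {"vne_a": 40.0, "vne_b": 99.5}, busy_threshold=10.0)
    print(status)
    print(plan_cores(status, total_cores=4))
```

An element that has not reached an idle poll for a long time is treated as heavily loaded, which is the intuition the abstract relies on when it uses the time difference as the load signal.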
-
Publication No.: US20240022757A1
Publication Date: 2024-01-18
Application No.: US18346766
Filing Date: 2023-07-03
Inventors: Jie Chen, Ru-Ling Liao, Xinwei Li, Yan Ye
IPC Classification: H04N19/513, H04N19/132, H04N19/159, H04N19/176
CPC Classification: H04N19/513, H04N19/132, H04N19/159, H04N19/176
Abstract: A VVC-standard encoder and a VVC-standard decoder are provided, implementing application of DMVR on affine merge mode-coded blocks to refine the motion vector accuracy and thereby improve coding efficiency. A refined motion vector (MV) search is performed for a control point motion vector (CPMV) of an inter-coded coding block (CB), outputting a refined MV of the CB. A refined MV search includes deriving an MV of a subblock of the CB based on a CPMV of the CB, performing subblock MV refinement for the MV of the subblock, and outputting the refined MV of the CB based on a refined MV of the subblock. A refined MV search further includes deriving an affine model parameter based on a plurality of CPMVs of the CB, performing an affine parameter offset search for the affine model parameter, and outputting the refined MV of the CB based on an optimal parameter offset.
-
Publication No.: US11876973B2
Publication Date: 2024-01-16
Application No.: US17658378
Filing Date: 2022-04-07
Inventors: Ru-Ling Liao, Yan Ye, Xinwei Li, Jie Chen
IPC Classification: H04N19/119, H04N19/132, H04N19/139, H04N19/157
CPC Classification: H04N19/139, H04N19/119, H04N19/132, H04N19/157
Abstract: A method, an apparatus, and a non-transitory computer-readable storage medium for video data processing are provided. The method includes receiving a bitstream comprising a coding unit coded in a geometric partition mode (GPM); decoding a first parameter associated with the coding unit, the first parameter indicating whether template matching is applied to the coding unit; and determining, according to the first parameter, motion information for the coding unit, wherein when the first parameter indicates that template matching is applied to the coding unit, the motion information is refined using the template matching.
-
Publication No.: US20240007615A1
Publication Date: 2024-01-04
Application No.: US18215753
Filing Date: 2023-06-28
Inventors: Ru-Ling Liao, Jie Chen, Yan Ye, Xinwei Li
IPC Classification: H04N19/105, H04N19/159, H04N19/172
CPC Classification: H04N19/105, H04N19/159, H04N19/172
Abstract: A VVC-standard encoder and a VVC-standard decoder are provided, implementing derivation of a BCW index according to cost values based on template matching. A template matching cost can be calculated for each of a set of possible BCW weight values, and the BCW weight value yielding the lowest of the calculated template matching costs can be selected as the BCW index for a bi-predicted merge candidate. Alternatively, a template matching cost can be calculated for each of a subset of possible BCW weight values based on an inherited BCW weight. Additionally, a merge candidate BCW index can be derived while adjusting the template matching cost of an inherited BCW weight from a value calculated according to the VVC standard and ECM specifications, or while adjusting the template matching cost of a BCW weight having equal weight from a value calculated according to the VVC standard and ECM specifications.
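The first variant in this abstract (evaluate every candidate weight and keep the cheapest) can be sketched in a few lines of Python. The weight set `(-2, 3, 4, 5, 10)` with a divisor of 8 follows the BCW design in VVC; the use of a sum of absolute differences as the template matching cost, the flat sample lists, and the names `tm_cost` and `derive_bcw_index` are assumptions for illustration, not details taken from the application.

```python
# BCW weights w applied to the second prediction; the first gets (8 - w).
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def tm_cost(cur_template, ref0_template, ref1_template, w):
    """Assumed cost: SAD between the current template and the weighted
    bi-prediction of the two reference templates."""
    cost = 0
    for cur, p0, p1 in zip(cur_template, ref0_template, ref1_template):
        pred = ((8 - w) * p0 + w * p1 + 4) >> 3  # rounded weighted average
        cost += abs(cur - pred)
    return cost


def derive_bcw_index(cur_template, ref0_template, ref1_template,
                     weights=BCW_WEIGHTS):
    """Return the index of the weight with the lowest template matching cost.

    Ties resolve to the lowest index, which is a choice made for this sketch.
    """
    costs = [tm_cost(cur_template, ref0_template, ref1_template, w)
             for w in weights]
    return min(range(len(weights)), key=costs.__getitem__)


if __name__ == "__main__":
    ref0 = [100, 100, 100, 100]
    ref1 = [180, 180, 180, 180]
    cur = [150, 150, 150, 150]
    idx = derive_bcw_index(cur, ref0, ref1)
    print(idx, BCW_WEIGHTS[idx])
```

The subset variant described in the abstract would pass a shorter `weights` tuple chosen around the inherited weight; the cost adjustments for inherited or equal weights are not modelled here.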
-
Publication No.: US20240005509A1
Publication Date: 2024-01-04
Application No.: US18046397
Filing Date: 2022-10-13
Inventors: Yingda XIA, Jiawen YAO, Dakai JIN, Xiansheng HUA, Le LU, Ling ZHANG
IPC Classification: G06T7/11
CPC Classification: G06T7/11, G06T2207/10081, G06T2207/30172, G06T2207/20081, G06T2207/30028
Abstract: A method, an apparatus, and a non-transitory computer readable medium for training an image processing model are provided. The method includes: acquiring a sample image comprising a target object to determine an object segmentation image of the target object in the sample image; constructing an object coordinate map corresponding to the object segmentation image according to the object segmentation image; and training an image processing model comprising a self-attention mechanism layer according to the sample image, the object segmentation image, and the object coordinate map.
-
Publication No.: US20240004955A1
Publication Date: 2024-01-04
Application No.: US17984230
Filing Date: 2022-11-09
Inventors: Zhaoyang DU, Yijin GUAN, Dimin NIU, Hongzhong ZHENG
CPC Classification: G06F17/16, G06F7/4876, G06F9/5016
Abstract: This application describes an accelerator, a computer system, and a method for memory optimization in sparse matrix-matrix multiplications (spGEMM). The memory optimization includes accurate memory pre-allocation for a to-be-generated output matrix of spGEMM between two sparse matrices. An exemplary method may include: sampling a plurality of first rows in a first sparse matrix; identifying, based on indices of non-zero data in the plurality of first rows, a plurality of second rows in a second sparse matrix; performing symbolic multiplication operations between the non-zero data in the plurality of first and second rows; determining an estimated compression ratio of the output matrix; determining an estimated mean row size for each row in the output matrix based on the estimated compression ratio; and allocating, according to the estimated mean row size and a total number of rows of the output matrix, a memory space in a hardware memory.
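The sampling-based estimate in this abstract can be sketched as follows. Several details are assumptions made here, not statements from the application: the compression ratio is taken to be intermediate products divided by distinct output non-zeros on the sampled rows, the total count of intermediate products is computed exactly over the whole first matrix, and each matrix is represented only by the column indices of its non-zeros per row. The function name `estimate_allocation` is hypothetical.

```python
import random


def estimate_allocation(a_rows, b_rows, sample_size, seed=0):
    """Estimate output size of A @ B from a sample of rows of A.

    a_rows[i] / b_rows[k] list the column indices of non-zeros in row i of A
    and row k of B. Returns (compression_ratio, mean_row_size, entries).
    """
    n_rows = len(a_rows)
    if n_rows == 0:
        return 1.0, 0.0, 0
    rng = random.Random(seed)
    sampled = rng.sample(range(n_rows), min(sample_size, n_rows))

    sample_products = 0  # intermediate products generated by sampled rows
    sample_nnz = 0       # distinct output columns after merging duplicates
    for i in sampled:
        out_cols = set()
        for k in a_rows[i]:          # each non-zero of A selects a row of B
            sample_products += len(b_rows[k])
            out_cols.update(b_rows[k])  # symbolic multiply: indices only
        sample_nnz += len(out_cols)
    if sample_nnz == 0:
        sample_products = sample_nnz = 1  # no information: assume ratio 1

    ratio = sample_products / sample_nnz
    total_products = sum(len(b_rows[k]) for row in a_rows for k in row)
    mean_row_size = total_products * sample_nnz / (sample_products * n_rows)
    # Integer ceiling of mean_row_size * n_rows, avoiding float rounding.
    entries = -(-(total_products * sample_nnz) // sample_products)
    return ratio, mean_row_size, entries


if __name__ == "__main__":
    a = [[0, 1], [1], [0]]
    b = [[0, 1], [1, 2]]
    print(estimate_allocation(a, b, sample_size=3))
```

With a sample that covers every row the estimate equals the exact output size; with a smaller sample it is only as good as the sampled rows are representative.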
-
Publication No.: US20240004830A1
Publication Date: 2024-01-04
Application No.: US17982450
Filing Date: 2022-11-07
Inventors: Qichen ZHANG, Lide DUAN, Shengcheng WANG
CPC Classification: G06F15/8046, G06F9/544, G06F7/50, G06F7/523
Abstract: Embodiments of the present disclosure include a processor. The processor may include a systolic array of processing elements; a first group of buffers coupled to the systolic array, wherein the first group comprises one or more first buffers; a second group of buffers coupled to the systolic array, wherein the second group comprises one or more second buffers; an accumulator coupled to the systolic array; and a third group of buffers coupled to the accumulator, wherein the third group comprises one or more third buffers.
-
Publication No.: US20230394617A1
Publication Date: 2023-12-07
Application No.: US18046097
Filing Date: 2022-10-12
Inventors: YUAN GAO, FEI SUN, HAORAN LI, GUYUE HUANG, CHEN ZHANG, RUIGUANG ZHONG
Abstract: The present application discloses a warp execution method used for streaming processors (SPs) of a streaming multiprocessor (SM) of a GPU, and an associated GPU. The SPs share a scratchpad memory, and the warp execution method includes: when the predetermined time point for warp-loading is reached, checking a first indicator to obtain a size of a space with the status of blank in the scratchpad memory, to determine whether to load the warp, wherein the first indicator is used to indicate a starting position of a space with the status of data-in-use and an ending position of the space with the status of blank; and when the predetermined time point for computing is reached, checking a second indicator and a third indicator to obtain a size of a space with the status of data-not-in-use in the scratchpad memory, to determine whether to compute the warp.
-
Publication No.: US20230385373A1
Publication Date: 2023-11-30
Application No.: US18053524
Filing Date: 2022-11-08
Inventors: ZHAOHUI CHEN, XUANLE REN, YANHENG LU, JIANSONG ZHANG
CPC Classification: G06F17/156, G06F7/727, G06F7/722, G06F7/76
Abstract: The present application discloses a calculator and a method thereof. The calculator is configured to accelerate the number-theoretic transform (NTT) of a 2N-dimensional polynomial. The calculator includes a first coefficient memory, a second coefficient memory, a twiddle factor memory, a plurality of processing units, and a data flow controller. In odd-numbered rounds of coefficient computation operations, the processing units perform first calculation procedures to read coefficients from the first coefficient memory for modulo calculations, and perform first writing procedures to write output coefficients to the second coefficient memory. In even-numbered rounds of coefficient computation operations, the processing units perform second calculation procedures to read coefficients from the second coefficient memory for modulo calculations, and perform second writing procedures to write output coefficients to the first coefficient memory.
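The alternating read/write pattern between two coefficient memories can be illustrated in software with a Stockham-style radix-2 NTT, which naturally reads one buffer and writes the other in each round. This is a swapped-in textbook technique used as a sketch, not the dataflow of the disclosed hardware; the function name `ntt_ping_pong`, the on-the-fly twiddle computation (in place of a twiddle factor memory), and the sequential loops (in place of parallel processing units) are all simplifications assumed here.

```python
def ntt_ping_pong(coeffs, root, mod):
    """Forward NTT: X[k] = sum_j coeffs[j] * root**(j*k) mod `mod`.

    `root` must be a primitive len(coeffs)-th root of unity modulo `mod`,
    and len(coeffs) must be a power of two. Output is in natural order.
    """
    size = len(coeffs)
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")

    first_mem = [c % mod for c in coeffs]  # first coefficient memory
    second_mem = [0] * size                # second coefficient memory
    src, dst = first_mem, second_mem       # round 1 reads first, writes second

    n, stride = size, 1
    while n > 1:
        half = n // 2
        w_step = pow(root, stride, mod)    # primitive n-th root for this round
        w = 1
        for p in range(half):
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = (a + b) % mod
                dst[q + stride * (2 * p + 1)] = (a - b) * w % mod
            w = w * w_step % mod
        src, dst = dst, src                # swap roles for the next round
        n, stride = half, 2 * stride
    return src                             # buffer written in the last round


if __name__ == "__main__":
    # 2 is a primitive 8th root of unity modulo 17 (2**4 = 16 = -1 mod 17).
    print(ntt_ping_pong([1, 2, 3, 4, 5, 6, 7, 8], root=2, mod=17))
```

Because no round reads and writes the same buffer, every butterfly in a round is independent of the others, which is what makes the two-memory arrangement attractive for parallel processing units.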