SYSTEMS AND METHODS FOR MEMORY BANDWIDTH ALLOCATION

    Publication No.: US20240061780A1

    Publication Date: 2024-02-22

    Application No.: US18450663

    Filing Date: 2023-08-16

    IPC Classes: G06F12/0811 G06F12/0846

    Abstract: A computer-implemented method for allocating memory bandwidth among multiple CPU cores in a server includes: receiving an access request to a last level cache (LLC) shared by the multiple CPU cores, the access request being sent from a core whose private cache holds copies of frequently accessed data from memory; determining whether the access request is an LLC hit or an LLC miss; and controlling a memory bandwidth controller based on the determination. The memory bandwidth controller performs memory bandwidth throttling to control the request rate between the private cache and the last level cache. An LLC hit causes the throttling initiated by the memory bandwidth controller to be disabled, and an LLC miss causes it to be enabled.
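The hit/miss-driven control described in the abstract can be sketched as a toy software model (all class and method names here are hypothetical illustrations, not the claimed hardware):

```python
class MemoryBandwidthController:
    """Toy model of the per-core memory bandwidth controller."""

    def __init__(self):
        self.throttling_enabled = False

    def on_llc_access(self, is_hit: bool):
        # An LLC hit keeps traffic on-chip, so throttling is disabled;
        # an LLC miss consumes memory bandwidth, so throttling is enabled.
        self.throttling_enabled = not is_hit


class LastLevelCache:
    """Toy shared LLC: records which addresses are resident."""

    def __init__(self):
        self.lines = set()

    def access(self, addr, controller):
        hit = addr in self.lines
        controller.on_llc_access(hit)  # drive the throttling decision
        if not hit:
            self.lines.add(addr)  # fill the line on a miss
        return hit
```

The key point of the claim is that throttling toggles per access: only requests that actually reach memory (misses) are rate-limited between the private cache and the LLC.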

    RESOURCE SCHEDULING METHOD AND SERVER

    Publication No.: US20240054021A1

    Publication Date: 2024-02-15

    Application No.: US18171250

    Filing Date: 2023-02-17

    IPC Classes: G06F9/50

    Abstract: The present application provides a resource scheduling method and a server. The method is applied to a scheduling component in user mode and includes: acquiring, at a target scheduling time, an idle time point for each of multiple virtual network elements, the idle time point being the time point at which no load task is polled by a worker thread in the virtual network element; determining the load status of the multiple virtual network elements based on the time differences between the target scheduling time and the idle time points; determining computational resource scheduling information for the multiple virtual network elements based on their load status; and sending the computational resource scheduling information to a kernel of the server to perform, through a schedule-class function in kernel mode, the corresponding resource scheduling. The method is compatible with virtual network elements of different implementations and schedules their computational resources in real time, improving the utilization of the computational resources.
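The idle-time-point heuristic can be sketched as follows (the threshold, element names, and the core-migration decision are hypothetical; the kernel-mode scheduling step itself is not modeled):

```python
def classify_load(target_time, idle_time_points, busy_threshold):
    """An element whose worker thread has not been idle for longer than
    busy_threshold is considered busy. idle_time_points maps each virtual
    network element to the last time at which no load task was polled."""
    return {
        elem: "busy" if (target_time - idle_t) > busy_threshold else "idle"
        for elem, idle_t in idle_time_points.items()
    }


def build_scheduling_info(status):
    """Hypothetical scheduling decision: pair each idle element with a busy
    one so a computational resource (e.g. a core) can move between them."""
    busy = sorted(e for e, s in status.items() if s == "busy")
    idle = sorted(e for e, s in status.items() if s == "idle")
    return [{"from": i, "to": b} for i, b in zip(idle, busy)]
```

Because polling workers spin at 100% CPU regardless of load, wall-clock idle gaps (rather than CPU usage) are what reveal which elements are actually busy.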

    DECODER-SIDE MOTION VECTOR REFINEMENT FOR AFFINE MOTION COMPENSATION

    Publication No.: US20240022757A1

    Publication Date: 2024-01-18

    Application No.: US18346766

    Filing Date: 2023-07-03

    Abstract: A VVC-standard encoder and a VVC-standard decoder are provided, implementing application of DMVR on affine merge mode-coded blocks to refine the motion vector accuracy and thereby improve coding efficiency. A refined motion vector (MV) search is performed for a control point motion vector (CPMV) of an inter-coded coding block (CB), outputting a refined MV of the CB. A refined MV search includes deriving an MV of a subblock of the CB based on a CPMV of the CB, performing subblock MV refinement for the MV of the subblock, and outputting the refined MV of the CB based on a refined MV of the subblock. A refined MV search further includes deriving an affine model parameter based on a plurality of CPMVs of the CB, performing an affine parameter offset search for the affine model parameter, and outputting the refined MV of the CB based on an optimal parameter offset.
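The two ingredients of the refined MV search, deriving a subblock MV from control-point MVs and searching a parameter offset, can be sketched as follows (the 4-parameter affine model is the standard VVC formulation; the cost function and offset set are hypothetical placeholders for the actual DMVR matching cost):

```python
def derive_subblock_mv(cpmv0, cpmv1, block_width, x, y):
    """4-parameter affine model: derive the MV at subblock position (x, y)
    from the top-left CPMV (cpmv0) and top-right CPMV (cpmv1)."""
    a = (cpmv1[0] - cpmv0[0]) / block_width  # horizontal gradient
    b = (cpmv1[1] - cpmv0[1]) / block_width  # vertical gradient
    return (cpmv0[0] + a * x - b * y,
            cpmv0[1] + b * x + a * y)


def search_parameter_offset(cost_fn, base_params, candidate_offsets):
    """Try each candidate offset on the affine model parameters and keep
    the one minimising the (hypothetical) matching cost."""
    def offset_cost(off):
        return cost_fn(tuple(p + o for p, o in zip(base_params, off)))
    return min(candidate_offsets, key=offset_cost)
```

Refining at the parameter level rather than per-subblock keeps the refined field consistent with the affine model, which is why the abstract searches offsets on the model parameters and then re-derives the subblock MVs.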

    DERIVING BI-PREDICTION WITH CODING UNIT-LEVEL WEIGHT INDICES FOR MERGE CANDIDATES

    Publication No.: US20240007615A1

    Publication Date: 2024-01-04

    Application No.: US18215753

    Filing Date: 2023-06-28

    Abstract: A VVC-standard encoder and a VVC-standard decoder are provided, implementing derivation of a BCW index according to cost values based on template matching. A template matching cost can be calculated for each among a set of possible BCW weight values, and a BCW weight value yielding the lowest template matching cost can be selected as a BCW index for a bi-predicted merge candidate. Alternatively, a template matching cost can be calculated for each among a subset of possible BCW weight values based on an inherited BCW weight. Additionally, a merge candidate BCW index can be derived while adjusting the template matching cost of an inherited BCW weight, or of a BCW weight having equal weight, from the value calculated according to the VVC standard and ECM specifications.
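The cost-based BCW index derivation can be sketched as follows; the BCW weight set {-2, 3, 4, 5, 10}/8 and the weighted-average formula follow the VVC design, while the SAD template cost is a simplified stand-in for the actual template matching:

```python
import numpy as np

BCW_WEIGHTS = [-2, 3, 4, 5, 10]  # VVC BCW weights, applied as w/8


def template_matching_cost(template, pred0, pred1, w):
    """SAD between the reconstructed template and the bi-predicted template
    formed with BCW weight w: pred = ((8 - w) * P0 + w * P1) / 8."""
    pred = ((8 - w) * pred0 + w * pred1) / 8.0
    return float(np.abs(template - pred).sum())


def derive_bcw_index(template, pred0, pred1, weights=BCW_WEIGHTS):
    """Return the index of the weight with the lowest template-matching cost."""
    costs = [template_matching_cost(template, pred0, pred1, w) for w in weights]
    return int(np.argmin(costs))
```

The "subset of possible BCW weight values based on an inherited BCW weight" variant would simply pass a restricted `weights` list centred on the inherited weight.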

    COMPUTER-IMPLEMENTED MEMORY ALLOCATION METHOD FOR SPARSE MATRIX MULTIPLICATION APPLICATIONS

    Publication No.: US20240004955A1

    Publication Date: 2024-01-04

    Application No.: US17984230

    Filing Date: 2022-11-09

    IPC Classes: G06F17/16 G06F7/487 G06F9/50

    Abstract: This application describes an accelerator, a computer system, and a method for memory optimization in sparse matrix-matrix multiplication (spGEMM). The memory optimization includes accurate memory pre-allocation for the to-be-generated output matrix of an spGEMM between two sparse matrices. An exemplary method may include: sampling a plurality of first rows in the first sparse matrix; identifying, based on indices of non-zero data in the plurality of first rows, a plurality of second rows in the second sparse matrix; performing symbolic multiplication operations between the non-zero data in the plurality of first and second rows; determining an estimated compression ratio of the output matrix; determining an estimated mean row size for each row in the output matrix based on the estimated compression ratio; and allocating, according to the estimated mean row size and the total number of rows of the output matrix, a memory space in a hardware memory.
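The sampling-based pre-allocation can be sketched as follows, with sparse rows represented as sets of non-zero column indices (the 20% slack factor and the sampling scheme are illustrative assumptions, not details from the application):

```python
import random


def estimate_output_rows(A_rows, B_rows, sample_size, seed=0):
    """Sample rows of A; for each sampled row i, the union of column indices
    of the B-rows selected by A's non-zeros is the symbolic output row.
    A_rows / B_rows: per-row sets of non-zero column indices."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(A_rows)), min(sample_size, len(A_rows)))
    flops = 0  # raw products before merging duplicates (upper bound)
    nnz = 0    # distinct output columns after symbolic merging
    for i in sample:
        cols = set()
        for k in A_rows[i]:
            cols |= B_rows[k]          # symbolic multiply: indices only
            flops += len(B_rows[k])
        nnz += len(cols)
    ratio = nnz / flops if flops else 1.0   # estimated compression ratio
    mean_row = nnz / len(sample) if sample else 0.0
    return ratio, mean_row


def preallocate(mean_row, n_rows, slack=1.2):
    """Reserve mean_row * n_rows entries, with a hypothetical 20% slack."""
    return int(mean_row * n_rows * slack)
```

Symbolic multiplication touches only index sets, never values, so the estimate is cheap relative to the numeric spGEMM it sizes memory for.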

    WARP EXECUTION METHOD AND ASSOCIATED GPU

    Publication No.: US20230394617A1

    Publication Date: 2023-12-07

    Application No.: US18046097

    Filing Date: 2022-10-12

    IPC Classes: G06T1/60 G06T1/20

    CPC Classes: G06T1/60 G06T1/20

    Abstract: The present application discloses a warp execution method used for the SPs of an SM of a GPU, and an associated GPU. The SPs share a scratchpad memory, and the warp execution method includes: when the predetermined time point for warp loading is reached, checking a first indicator to obtain the size of the blank space in the scratchpad memory so as to determine whether to load the warp, wherein the first indicator indicates the starting position of the data-in-use space and the ending position of the blank space; and when the predetermined time point for computing is reached, checking a second indicator and a third indicator to obtain the size of the data-not-in-use space in the scratchpad memory so as to determine whether to compute the warp.
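The indicator-based admission checks can be sketched with a toy tracker that keeps the three region sizes directly (the actual design tracks positions in the scratchpad via three indicators; this simplified model and all names are hypothetical):

```python
class ScratchpadTracker:
    """Hypothetical bookkeeping for the shared scratchpad: space is either
    blank, holding loaded-but-unused warp data, or in use by an executing
    warp. Sizes stand in for the position indicators of the abstract."""

    def __init__(self, size):
        self.size = size
        self.not_in_use = 0   # loaded warp data awaiting computation
        self.in_use = 0       # data of warps currently executing

    def blank(self):
        return self.size - self.not_in_use - self.in_use

    def try_load_warp(self, warp_bytes):
        """At the warp-loading time point: load only if enough blank space."""
        if self.blank() >= warp_bytes:
            self.not_in_use += warp_bytes
            return True
        return False

    def try_compute_warp(self, warp_bytes):
        """At the computing time point: compute only if enough loaded data."""
        if self.not_in_use >= warp_bytes:
            self.not_in_use -= warp_bytes
            self.in_use += warp_bytes
            return True
        return False

    def finish_warp(self, warp_bytes):
        """Release the space once the warp's computation completes."""
        self.in_use -= warp_bytes
```

Separating the load-time check (blank space) from the compute-time check (data-not-in-use space) is what lets loading and computing proceed as decoupled pipeline stages over the shared scratchpad.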

    CALCULATOR AND ASSOCIATED METHOD

    Publication No.: US20230385373A1

    Publication Date: 2023-11-30

    Application No.: US18053524

    Filing Date: 2022-11-08

    IPC Classes: G06F17/15 G06F7/72 G06F7/76

    Abstract: The present application discloses a calculator and an associated method. The calculator is configured to accelerate the number-theoretic transform of a 2N-dimensional polynomial, and includes a first coefficient memory, a second coefficient memory, a twiddle factor memory, a plurality of processing units, and a data flow controller. In odd-numbered rounds of coefficient computation, the processing units perform first calculation procedures that read coefficients from the first coefficient memory for modulo calculation, and first writing procedures that write output coefficients to the second coefficient memory. In even-numbered rounds, the processing units perform second calculation procedures that read coefficients from the second coefficient memory for modulo calculation, and second writing procedures that write output coefficients to the first coefficient memory.
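The odd/even ping-pong between the two coefficient memories matches a Stockham-style iterative NTT, which naturally alternates its read and write buffers every round. A minimal sketch (radix-2, cyclic NTT; the modulus and root in the test are illustrative, and the negacyclic handling needed for a 2N-dimensional polynomial is omitted):

```python
def stockham_ntt(coeffs, p, w):
    """Radix-2 Stockham NTT: each round reads one buffer and writes the
    other, mirroring the odd/even ping-pong between the two coefficient
    memories. w must be a primitive len(coeffs)-th root of unity mod p."""
    n = len(coeffs)
    x, y = list(coeffs), [0] * n   # the two coefficient memories
    l, m = n // 2, 1
    while l >= 1:
        for j in range(l):
            tw = pow(w, j * m, p)  # twiddle factor (twiddle factor memory)
            for k in range(m):
                u = x[k + m * j]
                v = x[k + m * (j + l)]
                y[k + 2 * m * j] = (u + v) % p            # butterfly sum
                y[k + 2 * m * j + m] = (u - v) * tw % p   # twiddled diff
        x, y = y, x   # swap read/write roles for the next round
        l //= 2
        m *= 2
    return x
```

Unlike the in-place Cooley-Tukey form, Stockham never reads and writes the same buffer in one round, which is exactly the access pattern the two-memory design exploits.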