-
Publication No.: US11908061B2
Publication date: 2024-02-20
Application No.: US17463835
Application date: 2021-09-01
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Zhou Hong, Yufei Zhang
CPC classification number: G06T15/005, G06F9/30101, G06T1/60
Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. The data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster including a plurality of processing cores (PCs) configured for parallel operation. Each PC of a cluster is configured to utilize a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster, and a memory read/write hub coupled to the GPR and shared memory, the hub including a crossbar switch. A PC executes a move data instruction, including operands referencing a destination portion of the GPR and a source portion assigned to the PC, to retrieve data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory.
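The data path in this abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the patented hardware: the class `Cluster`, its methods, the equal-sized per-core GPR portions, and the `shared_writes` counter are all hypothetical names introduced here for illustration. The model shows a core reading a value from its own GPR portion and the hub placing it directly into another core's portion, with no intermediate write to shared memory.

```python
class Cluster:
    """Toy model of one processing cluster (all names are hypothetical).

    Assumption: the GPR is split into equal, dedicated portions, one per core.
    """

    def __init__(self, num_cores, regs_per_core):
        self.num_cores = num_cores
        self.regs_per_core = regs_per_core
        self.gpr = [0] * (num_cores * regs_per_core)
        self.shared_memory = {}
        self.shared_writes = 0  # number of writes made to shared memory

    def _index(self, core, reg):
        # Translate (core, register) into a flat GPR index, with bounds checks.
        if not 0 <= core < self.num_cores:
            raise IndexError("no such core")
        if not 0 <= reg < self.regs_per_core:
            raise IndexError("register outside the core's portion")
        return core * self.regs_per_core + reg

    def write_own(self, core, reg, value):
        """A core writes into its own dedicated GPR portion."""
        self.gpr[self._index(core, reg)] = value

    def read(self, core, reg):
        return self.gpr[self._index(core, reg)]

    def store_shared(self, key, value):
        """The conventional path: stage data in shared memory."""
        self.shared_memory[key] = value
        self.shared_writes += 1

    def move(self, src_core, src_reg, dst_core, dst_reg):
        """Model of the move-data instruction.

        The executing core (src_core) supplies the value from its own portion;
        the hub routes it straight to the destination portion. Shared memory
        is not touched on this path.
        """
        value = self.gpr[self._index(src_core, src_reg)]
        self.gpr[self._index(dst_core, dst_reg)] = value
```

For example, after `c = Cluster(4, 8)`, `c.write_own(0, 2, 42)`, and `c.move(0, 2, 3, 5)`, core 3 can read the value from its register 5 while `c.shared_writes` stays at zero.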
-
Publication No.: US20210272232A1
Publication date: 2021-09-02
Application No.: US17326913
Application date: 2021-05-21
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Zhou Hong, Yufei Zhang
Abstract: The disclosed technology relates to graphics processing units (GPU). In one aspect, a GPU includes a general purpose register (GPR) including registers, an arithmetic logic unit (ALU) reading pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels to implement a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping includes separating pixels of an image into three regions, with each region including a set of pixels. A first and second set of the pixels are loaded into registers corresponding to two of the three regions horizontally, and a third set of the pixels are loaded into registers corresponding to the third of the three regions vertically. Each of the registers in the first, second, and third registers are loaded as a contiguous ordered number of registers in the GPR.
-
Publication No.: US12026801B2
Publication date: 2024-07-02
Application No.: US17326913
Application date: 2021-05-21
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Zhou Hong, Yufei Zhang
CPC classification number: G06T1/60, G06F17/153, G06T1/20, G06T3/18, G06T15/005
Abstract: The disclosed technology relates to graphics processing units (GPU). In one aspect, a GPU includes a general purpose register (GPR) including registers, an arithmetic logic unit (ALU) reading pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels to implement a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping includes separating pixels of an image into three regions, with each region including a set of pixels. A first and second set of the pixels are loaded into registers corresponding to two of the three regions horizontally, and a third set of the pixels are loaded into registers corresponding to the third of the three regions vertically. Each of the registers in the first, second, and third registers are loaded as a contiguous ordered number of registers in the GPR.
-
Publication No.: US12190109B2
Publication date: 2025-01-07
Application No.: US17486434
Application date: 2021-09-27
Applicant: Huawei Technologies Co., Ltd.
Inventor: Lin Chen, Zhou Hong, Yufei Zhang
Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items into GPRs, where the tile includes multiple channels. The tile of data items is read from memory. At least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can be used together for performing convolution operations.
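The packing scheme in this abstract can be sketched in a few lines. This is a minimal illustration under assumptions, not the claimed method: the function names `pack_tile` and `conv1x1`, the choice of exactly four channels, the interleaved lane order, and the use of per-channel weights as the "auxiliary data" are all hypothetical. Each GPR is modeled as a plain Python list.

```python
def pack_tile(tile, weights):
    """Pack a 4-channel tile into three modeled GPRs (hypothetical layout).

    tile    : list of pixels, each a 4-tuple (c0, c1, c2, c3)
    weights : four per-channel weights, standing in for the auxiliary data
    """
    if len(weights) != 4 or any(len(px) != 4 for px in tile):
        raise ValueError("this sketch assumes exactly four channels")
    gpr0 = [v for px in tile for v in px[0:2]]  # channels 0 and 1
    gpr1 = [v for px in tile for v in px[2:4]]  # channels 2 and 3
    gpr2 = list(weights)                        # auxiliary data
    return gpr0, gpr1, gpr2


def conv1x1(gpr0, gpr1, gpr2):
    """Use tile data and auxiliary data together: a 1x1 convolution,
    i.e. a per-pixel dot product across the four channels."""
    out = []
    for i in range(len(gpr0) // 2):
        c0, c1 = gpr0[2 * i], gpr0[2 * i + 1]
        c2, c3 = gpr1[2 * i], gpr1[2 * i + 1]
        out.append(c0 * gpr2[0] + c1 * gpr2[1] + c2 * gpr2[2] + c3 * gpr2[3])
    return out
```

With `tile = [(1, 2, 3, 4), (5, 6, 7, 8)]` and `weights = (1, 0, 0, 1)`, the first modeled register holds `[1, 2, 5, 6]`, the second holds `[3, 4, 7, 8]`, and `conv1x1` returns `[5, 13]`.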
-
Publication No.: US11915338B2
Publication date: 2024-02-27
Application No.: US17319301
Application date: 2021-05-13
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Zhou Hong, Yufei Zhang
Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
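The grouping step in this abstract can be illustrated as follows. This is a sketch under a stated assumption: that a pixel's type is determined by its position inside a 2x2 quad, as in a Bayer-style mosaic. The abstract does not say how types are assigned, and the function name `group_quads_by_type` is hypothetical. The later split of each group into three regions is not modeled, because the abstract does not specify how the regions are chosen.

```python
def group_quads_by_type(image):
    """Group pixels into four lists by their position within each 2x2 quad.

    image : list of rows (lists) with even height and even width
    Returns a dict mapping type 0..3 to the pixels of that type, where
    type 0 is top-left, 1 is top-right, 2 is bottom-left, 3 is bottom-right
    (an assumed convention for this sketch).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even to form quads")
    groups = {0: [], 1: [], 2: [], 3: []}
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            # Fetch one quad and route each pixel to the group for its type.
            groups[0].append(image[y][x])
            groups[1].append(image[y][x + 1])
            groups[2].append(image[y + 1][x])
            groups[3].append(image[y + 1][x + 1])
    return groups
```

For a 4x4 image whose rows are `[0, 1, 2, 3]`, `[4, 5, 6, 7]`, `[8, 9, 10, 11]`, and `[12, 13, 14, 15]`, group 0 is `[0, 2, 8, 10]` and group 3 is `[5, 7, 13, 15]`.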
-
Publication No.: US20220012053A1
Publication date: 2022-01-13
Application No.: US17486434
Application date: 2021-09-27
Applicant: Huawei Technologies Co., Ltd.
Inventor: Lin Chen, Zhou Hong, Yufei Zhang
Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items into GPRs, where the tile includes multiple channels. The tile of data items is read from memory. At least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can be used together for performing convolution operations.
-
Publication No.: US20210264560A1
Publication date: 2021-08-26
Application No.: US17319301
Application date: 2021-05-13
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Zhou Hong, Yufei Zhang
Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.