Inter-warp sharing of general purpose register data in GPU

    Publication number: US11908061B2

    Publication date: 2024-02-20

    Application number: US17463835

    Filing date: 2021-09-01

    CPC classification number: G06T15/005 G06F9/30101 G06T1/60

    Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. The data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster including a plurality of processing cores (PCs) configured for parallel operation. Each PC of a cluster is configured to utilize a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster, and a memory read/write hub coupled to the GPR and shared memory, the hub including a crossbar switch. A PC executes a move data instruction, including operands referencing a destination portion of the GPR and a source portion assigned to the PC, to retrieve data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory.
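    A toy Python model can make the claimed saving concrete. It assumes, hypothetically, that the GPR is a list of equally sized per-core slices, and it counts shared-memory accesses. `Cluster`, `move_via_shared_memory` and `move_data` are illustrative names, not a real GPU instruction set. The conventional path stages the value through shared memory (one store plus one load). The hub path described in the abstract reads the executing core's own GPR portion and writes the value through the crossbar directly into another core's portion.

```python
from typing import Dict, List


class Cluster:
    """Toy model of one processing cluster; all names are hypothetical."""

    def __init__(self, num_pcs: int, regs_per_pc: int) -> None:
        self.regs_per_pc = regs_per_pc
        # Each processing core (PC) owns a dedicated portion of the GPR.
        self.gpr: List[List[int]] = [[0] * regs_per_pc for _ in range(num_pcs)]
        self.shared_mem: Dict[int, int] = {}
        self.shared_mem_accesses = 0

    def move_via_shared_memory(self, src_pc: int, src_reg: int,
                               dst_pc: int, dst_reg: int) -> None:
        """Conventional path: store to shared memory, then load from it."""
        addr = src_pc * self.regs_per_pc + src_reg
        self.shared_mem[addr] = self.gpr[src_pc][src_reg]  # store
        self.shared_mem_accesses += 1
        self.gpr[dst_pc][dst_reg] = self.shared_mem[addr]  # load
        self.shared_mem_accesses += 1

    def move_data(self, pc: int, src_reg: int, dst_pc: int, dst_reg: int) -> None:
        """Hub path: core `pc` reads its own GPR portion, and the hub routes the
        value through the crossbar straight into the destination portion.
        Shared memory is never touched."""
        value = self.gpr[pc][src_reg]
        self.gpr[dst_pc][dst_reg] = value


cluster = Cluster(num_pcs=4, regs_per_pc=8)
cluster.gpr[0][3] = 42
cluster.move_data(pc=0, src_reg=3, dst_pc=2, dst_reg=5)
print(cluster.gpr[2][5], cluster.shared_mem_accesses)  # 42 0
```

    In this model, the same transfer done with `move_via_shared_memory` costs two shared-memory accesses instead of zero.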

    Filter Independent L1 Mapping Of Convolution Data Into General Purpose Register

    Publication number: US20210272232A1

    Publication date: 2021-09-02

    Application number: US17326913

    Filing date: 2021-05-21

    Abstract: The disclosed technology relates to graphics processing units (GPU). In one aspect, a GPU includes a general purpose register (GPR) including registers, an arithmetic logic unit (ALU) reading pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels to implement a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping includes separating pixels of an image into three regions, with each region including a set of pixels. A first and second set of the pixels are loaded horizontally into registers corresponding to two of the three regions, and a third set of the pixels is loaded vertically into registers corresponding to the third region. Each of the first, second, and third sets of registers is loaded as a contiguous, ordered sequence of registers in the GPR.
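    The abstract does not say which pixels fall into which region. As a minimal sketch, assume a tile of output pixels plus a halo of `halo` extra rows and columns that the filter needs. Region 1 is the core tile and region 2 the bottom halo rows, both loaded horizontally (row-major). Region 3 is the right halo columns, loaded vertically (column-major). Each region takes a contiguous run of registers immediately after the previous one. The function `map_to_gpr` and this region split are hypothetical.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]  # (row, col) within the patch


def map_to_gpr(tile_h: int, tile_w: int, halo: int,
               base_reg: int = 0) -> Dict[Pixel, int]:
    """Map every pixel of a (tile_h + halo) x (tile_w + halo) patch to a GPR index.

    Hypothetical layout: three regions, each written to a contiguous,
    ordered run of registers starting right after the previous region.
    """
    # Region 1: core tile, loaded horizontally (row by row).
    region1 = [(r, c) for r in range(tile_h) for c in range(tile_w)]
    # Region 2: bottom halo rows across the full patch width, loaded horizontally.
    region2 = [(r, c) for r in range(tile_h, tile_h + halo)
               for c in range(tile_w + halo)]
    # Region 3: right halo columns beside the core rows, loaded vertically
    # (column by column).
    region3 = [(r, c) for c in range(tile_w, tile_w + halo)
               for r in range(tile_h)]
    ordered = region1 + region2 + region3
    return {pixel: base_reg + i for i, pixel in enumerate(ordered)}


regs = map_to_gpr(tile_h=4, tile_w=4, halo=2)
print(regs[(0, 4)], regs[(1, 4)])  # 28 29
```

    The three regions together cover the patch exactly once, because (tile_h + halo)(tile_w + halo) = tile_h·tile_w + halo·(tile_w + halo) + halo·tile_h. The column-major ordering of region 3 puts a vertical strip of halo pixels into consecutive registers.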

    Filter independent L1 mapping of convolution data into general purpose register

    Publication number: US12026801B2

    Publication date: 2024-07-02

    Application number: US17326913

    Filing date: 2021-05-21

    CPC classification number: G06T1/60 G06F17/153 G06T1/20 G06T3/18 G06T15/005

    Abstract: The disclosed technology relates to graphics processing units (GPU). In one aspect, a GPU includes a general purpose register (GPR) including registers, an arithmetic logic unit (ALU) reading pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels to implement a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping includes separating pixels of an image into three regions, with each region including a set of pixels. A first and second set of the pixels are loaded horizontally into registers corresponding to two of the three regions, and a third set of the pixels is loaded vertically into registers corresponding to the third region. Each of the first, second, and third sets of registers is loaded as a contiguous, ordered sequence of registers in the GPR.

    Storing complex data in warp GPRs

    Publication number: US12190109B2

    Publication date: 2025-01-07

    Application number: US17486434

    Filing date: 2021-09-27

    Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items into GPRs, where the tile includes multiple channels. The tile of data items is read from memory. At least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can be used together for performing convolution operations.
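    As a minimal sketch, assume a four-channel tile of unsigned 16-bit values with one pixel per warp lane, and assume 32-bit registers so that two channels fit in one register. Channels 0 and 1 go into the first GPR and channels 2 and 3 into the second. A third GPR holds auxiliary data, here assumed to be four per-channel weights for a 1×1 convolution. The names `pack_tile`, `unpack` and `conv1x1` are hypothetical.

```python
from typing import List, Sequence, Tuple

MASK16 = 0xFFFF  # assumes unsigned 16-bit channel values


def pack_tile(tile: Sequence[Tuple[int, int, int, int]]) -> Tuple[List[int], List[int]]:
    """Pack a 4-channel tile (one pixel per lane) into two 32-bit GPR vectors."""
    gpr0: List[int] = []
    gpr1: List[int] = []
    for c0, c1, c2, c3 in tile:
        gpr0.append((c0 & MASK16) | ((c1 & MASK16) << 16))  # channels 0 and 1
        gpr1.append((c2 & MASK16) | ((c3 & MASK16) << 16))  # channels 2 and 3
    return gpr0, gpr1


def unpack(word: int) -> Tuple[int, int]:
    """Split one 32-bit register word back into its low and high 16-bit channels."""
    return word & MASK16, (word >> 16) & MASK16


def conv1x1(gpr0: List[int], gpr1: List[int], aux: Sequence[int]) -> List[int]:
    """1x1 convolution: per-lane dot product of the 4 channels with aux[0..3]."""
    out: List[int] = []
    for w01, w23 in zip(gpr0, gpr1):
        c0, c1 = unpack(w01)
        c2, c3 = unpack(w23)
        out.append(c0 * aux[0] + c1 * aux[1] + c2 * aux[2] + c3 * aux[3])
    return out


gpr0, gpr1 = pack_tile([(1, 2, 3, 4), (5, 6, 7, 8)])
aux = [1, 0, 0, 2]  # hypothetical contents of the auxiliary GPR
print(conv1x1(gpr0, gpr1, aux))  # [9, 21]
```

    Packing two channels per register halves the number of registers the tile occupies compared with one channel per register. The cost is a mask and shift when the channels are consumed.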

    Loading apparatus and method for convolution with stride or dilation of 2

    Publication number: US11915338B2

    Publication date: 2024-02-27

    Application number: US17319301

    Filing date: 2021-05-13

    CPC classification number: G06T1/20 G06F9/462 G06T1/60

    Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
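    The abstract does not define the four pixel types. A natural reading, assumed here, is the pixel's position inside its 2×2 quad, given by (row parity, column parity). Under that assumption, grouping pays off for dilation 2: a 3×3 kernel with dilation 2 only ever touches pixels of one parity. The dilated convolution on the full image therefore equals an ordinary dense convolution on each parity group. The helper names below are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]


def split_by_quad_position(img: Image) -> Dict[Tuple[int, int], Image]:
    """Group pixels by their position inside each 2x2 quad (row parity, col parity)."""
    return {(pr, pc): [row[pc::2] for row in img[pr::2]]
            for pr in (0, 1) for pc in (0, 1)}


def conv2d_valid(img: Image, k: Image, dilation: int = 1) -> Image:
    """Plain 'valid' 2-D correlation with optional dilation."""
    kh, kw = len(k), len(k[0])
    span_h = (kh - 1) * dilation + 1
    span_w = (kw - 1) * dilation + 1
    h, w = len(img), len(img[0])
    return [[sum(k[i][j] * img[r + i * dilation][c + j * dilation]
                 for i in range(kh) for j in range(kw))
             for c in range(w - span_w + 1)]
            for r in range(h - span_h + 1)]


img = [[r * 8 + c for c in range(8)] for r in range(8)]
kernel = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
dilated = conv2d_valid(img, kernel, dilation=2)
groups = split_by_quad_position(img)
# Output pixel (2a + pr, 2b + pc) of the dilated convolution equals
# pixel (a, b) of a dense convolution over group (pr, pc).
for (pr, pc), g in groups.items():
    for a, row in enumerate(conv2d_valid(g, kernel)):
        for b, v in enumerate(row):
            assert dilated[2 * a + pr][2 * b + pc] == v
```

    Each of the four groups is then an ordinary dense image, so it can be mapped into registers independently with no gaps between the loaded pixels.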

    Storing Complex Data in Warp GPRs

    Publication number: US20220012053A1

    Publication date: 2022-01-13

    Application number: US17486434

    Filing date: 2021-09-27

    Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items into GPRs, where the tile includes multiple channels. The tile of data items is read from memory. At least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can be used together for performing convolution operations.

    Loading Apparatus and Method for Convolution with Stride or Dilation of 2

    Publication number: US20210264560A1

    Publication date: 2021-08-26

    Application number: US17319301

    Filing date: 2021-05-13

    Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
