-
Publication No.: US11379944B2
Publication Date: 2022-07-05
Application No.: US16910029
Filing Date: 2020-06-23
Applicant: NVIDIA CORPORATION
Inventor: Michael Fetterman , Shirish Gadre , Mark Gebhart , Steven J. Heinrich , Ramesh Jandhyala , William Newhall , Omkar Paranjape , Stefano Pescador , Poorna Rao
IPC: G06T1/20 , G06T1/60 , G06F16/245
Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
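The "assume accelerable unless known otherwise" decision policy described in this abstract can be sketched as follows. This is a hedged, illustrative model only: the stage names, the dictionary-based texture-operation representation, and the specific disqualifier checks (`needs_wide_format`, `crosses_cache_line`) are hypothetical stand-ins, not details from the patent.

```python
# Toy model: each pipeline stage vetoes acceleration only when it has
# specific, known information that disqualifies the operation;
# otherwise the operation stays on the accelerated path.

def can_accelerate(texture_op, stages):
    """Assume the op can be accelerated unless some stage knows otherwise."""
    for stage in stages:
        if stage(texture_op):  # a stage returns True only to veto
            return False
    return True

# Hypothetical disqualifiers a stage might apply.
def needs_wide_format(op):
    return op.get("bits_per_texel", 32) > 64

def crosses_cache_line(op):
    return op.get("unaligned", False)

op = {"bits_per_texel": 32, "unaligned": False}
print(can_accelerate(op, [needs_wide_format, crosses_cache_line]))  # True
```

The key property of this policy is the default: a texture operation is only demoted to the slow path on positive evidence, which maximizes the fraction of operations that run accelerated.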
-
Publication No.: US11080051B2
Publication Date: 2021-08-03
Application No.: US16712083
Filing Date: 2019-12-12
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0808 , G06F12/0888
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
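The data-flow distinction the abstract draws — staging a global-to-shared copy through a register versus steering it directly into shared memory — can be modeled in a toy sketch. This is not the patented hardware mechanism; plain Python lists stand in for the global and shared memory spaces, and the "register" is a local variable, purely to illustrate which path holds a register during the transfer.

```python
# Toy model of the two copy paths for one thread's element.

def copy_via_registers(global_mem, shared, idx):
    """Conventional path: global -> register -> shared."""
    reg = global_mem[idx]   # load occupies a register...
    shared[idx] = reg       # ...until the store completes
    return 1                # one register held during the copy

def copy_direct(global_mem, shared, idx):
    """Patent-style path: data directed into shared memory without
    being staged in a register (modeled as a direct assignment)."""
    shared[idx] = global_mem[idx]
    return 0                # no register held across the transfer

global_mem = list(range(8))
shared = [None] * 8
regs_held = sum(copy_direct(global_mem, shared, i) for i in range(8))
print(shared, regs_held)
```

In a real multiprocessor, freeing registers and cache lines from transfer duty is what reduces the activity and energy overheads the abstract mentions, since those resources stay available to the executing threads.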
-
Publication No.: US20230185570A1
Publication Date: 2023-06-15
Application No.: US18107374
Filing Date: 2023-02-08
Applicant: NVIDIA Corporation
Inventor: Andrew KERR , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54
CPC classification number: G06F9/30043 , G06F12/0808 , G06F12/0888 , G06F9/3009 , G06F9/321 , G06F9/3871 , G06F9/522 , G06F9/542 , G06F9/544 , G06F9/546 , G06F9/3838 , G06F2212/621 , G06F9/3004
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US11347668B2
Publication Date: 2022-05-31
Application No.: US16921795
Filing Date: 2020-07-06
Applicant: NVIDIA Corporation
Inventor: Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu
IPC: G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/084 , G06F12/0895 , G06F12/122 , G11C7/10
Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
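The routing logic described in this abstract can be summarized in a small behavioral sketch. The class and method names here are illustrative assumptions, and a Python `set` and `deque` stand in for the tag store and the miss FIFO; only the three-way routing (shared direct, hit rerouted direct, miss parked in FIFO) mirrors the abstract.

```python
from collections import deque

class UnifiedCacheModel:
    """Toy model of the unified cache subsystem's transaction routing."""

    def __init__(self):
        self.tags = set()        # addresses currently resident in cache
        self.miss_fifo = deque() # misses wait here for external memory

    def access(self, addr, is_shared):
        if is_shared:
            return "direct"          # shared memory bypasses the tag pipeline
        if addr in self.tags:
            return "direct"          # cache hit rerouted to the data memory
        self.miss_fifo.append(addr)  # cache miss parked until data returns
        return "miss-fifo"

    def fill(self):
        """External memory returned data for the oldest outstanding miss."""
        self.tags.add(self.miss_fifo.popleft())

cache = UnifiedCacheModel()
print(cache.access(0x100, is_shared=True))   # direct
print(cache.access(0x200, is_shared=False))  # miss-fifo
cache.fill()
print(cache.access(0x200, is_shared=False))  # direct
```

The design point worth noting is that the FIFO decouples the tag pipeline from external memory latency: the pipeline keeps identifying hits and misses while earlier misses wait for their fill data.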
-
Publication No.: US10459861B2
Publication Date: 2019-10-29
Application No.: US15716461
Filing Date: 2017-09-26
Applicant: NVIDIA Corporation
Inventor: Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu
IPC: G06F12/02 , G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/084
Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
-
Publication No.: US11907717B2
Publication Date: 2024-02-20
Application No.: US18107374
Filing Date: 2023-02-08
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F9/52 , G06F12/0808 , G06F12/0888
CPC classification number: G06F9/30043 , G06F9/3009 , G06F9/522 , G06F12/0808 , G06F12/0888 , G06F9/3004
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US11604649B2
Publication Date: 2023-03-14
Application No.: US17363561
Filing Date: 2021-06-30
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0808 , G06F12/0888 , G06F9/32 , G06F9/38 , G06F9/52 , G06F9/54
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US20210124582A1
Publication Date: 2021-04-29
Application No.: US16712083
Filing Date: 2019-12-12
Applicant: NVIDIA Corporation
Inventor: Andrew Kerr , Jack Choquette , Xiaogang Qiu , Omkar Paranjape , Poornachandra Rao , Shirish Gadre , Steven J. Heinrich , Manan Patel , Olivier Giroux , Alan Kaatz
IPC: G06F9/30 , G06F12/0888 , G06F12/0808
Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
-
Publication No.: US10705994B2
Publication Date: 2020-07-07
Application No.: US15587213
Filing Date: 2017-05-04
Applicant: NVIDIA Corporation
Inventor: Xiaogang Qiu , Ronny Krashinsky , Steven Heinrich , Shirish Gadre , John Edmondson , Jack Choquette , Mark Gebhart , Ramesh Jandhyala , Poornachandra Rao , Omkar Paranjape , Michael Siu
IPC: G06F12/084 , G06F13/28 , G06F12/0891 , G06F12/0811 , G06F12/0895 , G06F12/122 , G11C7/10
Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
-
Publication No.: US11768686B2
Publication Date: 2023-09-26
Application No.: US16940363
Filing Date: 2020-07-27
Applicant: NVIDIA Corporation
Inventor: Michael A Fetterman , Mark Gebhart , Shirish Gadre , Mitchell Hayenga , Steven Heinrich , Ramesh Jandhyala , Raghavan Madhavan , Omkar Paranjape , James Robertson , Jeff Schottmiller
IPC: G06F9/38 , G06F9/30 , G06F12/084 , G06F12/0873 , G06F9/54 , G06F12/0842 , G06F12/0846 , G06F5/06
CPC classification number: G06F9/3836 , G06F5/065 , G06F9/30047 , G06F9/3867 , G06F9/546 , G06F12/084 , G06F12/0842 , G06F12/0846 , G06F12/0873 , G06F2212/1021
Abstract: In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing for example in-order memory returns in some contexts and/or for some traffic and out of order memory returns in other contexts and/or for other traffic.
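The queue-assignment policy this abstract describes — one shared tracking queue when returns must stay in order, separate queues when out-of-order returns are acceptable — can be sketched as follows. The class name, the traffic-class keys, and the `"ordered"` funnel are hypothetical illustrations of one possible policy, not the patent's actual assignment rules.

```python
from collections import deque

class RequestTracker:
    """Toy model: dynamically created tracking queues for pending requests."""

    def __init__(self):
        self.queues = {}  # queues are created on demand per assignment key

    def track(self, request_id, traffic_class, in_order):
        # Policy sketch: in-order traffic funnels into one queue so
        # returns pop FIFO; out-of-order traffic is spread across
        # per-class queues and can retire independently.
        key = "ordered" if in_order else traffic_class
        self.queues.setdefault(key, deque()).append(request_id)
        return key

    def retire(self, key):
        """Memory returned data for the oldest request in this queue."""
        return self.queues[key].popleft()

t = RequestTracker()
t.track(1, "texture", in_order=True)
t.track(2, "texture", in_order=True)
t.track(3, "load", in_order=False)
print(t.retire("ordered"))  # 1 -- in-order traffic returns FIFO
```

Distributing tracking information across multiple queues is what lets out-of-order traffic retire as soon as its data arrives, while ordered traffic still observes FIFO returns within its own queue.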