-
Publication number: US20230185569A1
Publication date: 2023-06-15
Application number: US18064119
Application date: 2022-12-09
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
CPC classification number: G06F9/3004 , G06F13/28 , G06F15/8061 , G06F9/30036
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
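The abstract mentions a "transposed load/store with stride parameter functionality". As a rough software sketch of the idea only (the function name, arguments, and behavior are illustrative assumptions, not the patented design): instead of loading contiguous elements, the load gathers one element every `stride` elements, so a column of a row-major matrix fills a vector register in one operation.

```python
# Software model of a transposed (strided) load: gather `lanes` elements
# starting at `base`, stepping by `stride` elements between lanes.
def transposed_load(mem, base, stride, lanes):
    return [mem[base + i * stride] for i in range(lanes)]

# A 4x4 row-major matrix flattened into memory:
mem = [r * 4 + c for r in range(4) for c in range(4)]

# One "load" reconstructs column 1 of the matrix:
col1 = transposed_load(mem, base=1, stride=4, lanes=4)
# col1 == [1, 5, 9, 13]
```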
-
Publication number: US20230050062A1
Publication date: 2023-02-16
Application number: US17391395
Application date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
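Another feature the abstract names is a "min/max collector". A minimal sketch of the concept, under the assumption that the collector snoops values on the store path and maintains running extremes so the reduction costs no extra vector instructions (class and method names are my own illustration):

```python
# Toy model of a min/max collector that observes stored values.
class MinMaxCollector:
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, value):
        # Called once per stored element; updates both extremes.
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)

mem = {}
c = MinMaxCollector()
for addr, v in enumerate([7, -3, 12, 5]):
    mem[addr] = v   # normal store
    c.observe(v)    # collector snoops the stored value
# c.lo == -3, c.hi == 12
```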
-
Publication number: US20230048836A1
Publication date: 2023-02-16
Application number: US17391875
Application date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
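The "load with permute and zero insertion functionality" can be sketched in software as a single load that both reorders source elements and injects zeros at chosen lane positions (useful, for example, for boundary padding in filters). The sentinel convention and names below are assumptions for illustration only:

```python
# Toy model of a load-with-permute-and-zero-insertion.
ZERO = -1  # sentinel index meaning "insert a zero in this lane"

def load_permute_zero(mem, base, pattern):
    return [0 if idx == ZERO else mem[base + idx] for idx in pattern]

mem = [10, 20, 30, 40]
out = load_permute_zero(mem, base=0, pattern=[ZERO, 2, 0, ZERO])
# out == [0, 30, 10, 0]
```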
-
Publication number: US20230042226A1
Publication date: 2023-02-09
Application number: US17391867
Application date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
IPC: G06F13/28
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
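The "two point" lookup the abstract refers to can be illustrated as a single table access that returns a pair of adjacent entries, which is exactly what linear interpolation needs (a "two by two point" variant would return a 2x2 patch for bilinear cases). This is a sketch of the idea, not the patented memory layout:

```python
# One access returns both endpoints needed for linear interpolation.
def two_point_lookup(table, i):
    return table[i], table[i + 1]

def lerp(table, x):
    i = int(x)
    a, b = two_point_lookup(table, i)  # single two-point access
    return a + (x - i) * (b - a)

table = [0.0, 10.0, 20.0, 30.0]
# lerp(table, 1.5) == 15.0
```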
-
Publication number: US12093539B2
Publication date: 2024-09-17
Application number: US18069722
Application date: 2022-12-21
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
IPC: G06F12/00 , G06F3/06 , G06F12/0802
CPC classification number: G06F3/0625 , G06F3/0655 , G06F3/0679 , G06F12/0802
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
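The "per memory bank load caching capabilities" can be modeled as each bank retaining the last line it served, so repeated loads that hit the same line in a bank avoid re-reading the bank. Bank count, line size, and the replacement policy below are assumptions for illustration:

```python
# Toy model of per-bank load caching over a banked memory.
class BankedMemory:
    def __init__(self, data, banks=4, line=4):
        self.data, self.banks, self.line = data, banks, line
        self.cache = {}      # bank -> (line_addr, cached values)
        self.bank_reads = 0  # counts actual bank accesses

    def load(self, addr):
        bank = (addr // self.line) % self.banks
        line_addr = addr - addr % self.line
        tag, values = self.cache.get(bank, (None, None))
        if tag != line_addr:  # miss: read the line from the bank
            values = self.data[line_addr:line_addr + self.line]
            self.cache[bank] = (line_addr, values)
            self.bank_reads += 1
        return values[addr % self.line]

m = BankedMemory(list(range(32)))
first = [m.load(a) for a in (0, 1, 2, 3)]  # 4 loads, 1 bank read
```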
-
Publication number: US20240134645A1
Publication date: 2024-04-25
Application number: US18402519
Application date: 2024-01-02
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
CPC classification number: G06F9/3004 , G06F9/30036 , G06F13/28 , G06F15/8061
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
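The idea behind the "hardware sequencer ... in a DMA system" is that the processor builds a chain of transfer descriptors once and the sequencer walks the chain autonomously, so software is not involved per transfer. The descriptor fields and copy semantics below are assumptions sketching the concept, not the patented interface:

```python
from dataclasses import dataclass
from typing import Optional

# A linked chain of DMA transfer descriptors.
@dataclass
class Descriptor:
    src: int
    dst: int
    length: int
    next: Optional["Descriptor"] = None

def run_sequencer(mem, head):
    """Walk the descriptor chain, copying `length` elements per descriptor."""
    d = head
    while d is not None:
        mem[d.dst:d.dst + d.length] = mem[d.src:d.src + d.length]
        d = d.next

mem = list(range(16)) + [0] * 16
chain = Descriptor(0, 16, 4, Descriptor(8, 24, 4))
run_sequencer(mem, chain)  # two transfers, no software in the loop
```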
-
Publication number: US11940947B2
Publication date: 2024-03-26
Application number: US18151009
Application date: 2023-01-06
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
CPC classification number: G06F15/8053 , G06F15/7807 , G06F15/8007 , G06N3/063 , G06N3/10
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
-
Publication number: US20230125397A1
Publication date: 2023-04-27
Application number: US18068819
Application date: 2022-12-20
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
-
Publication number: US11636063B2
Publication date: 2023-04-25
Application number: US17391425
Application date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
-
Publication number: US20230111014A1
Publication date: 2023-04-13
Application number: US18064121
Application date: 2022-12-09
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
IPC: G06F13/28
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.