-
Publication Number: US20250103529A1
Publication Date: 2025-03-27
Application Number: US18970570
Filing Date: 2024-12-05
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
IPC: G06F13/28
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
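One of the features listed in this abstract, the transposed load/store with a stride parameter, can be pictured with a small behavioral model. The Python sketch below is a hypothetical illustration (the function name, tile size, and flat-buffer layout are assumptions, not the patent's implementation): it gathers a column of a row-major tile in one "load" by stepping a configurable stride between lanes.

```python
import numpy as np

def transposed_load(memory: np.ndarray, base: int, stride: int, lanes: int) -> np.ndarray:
    """Behavioral model of a transposed vector load.

    Reads `lanes` elements starting at `base`, advancing by `stride`
    elements between lanes, so a column of a row-major tile lands in
    a single vector register.
    """
    idx = base + stride * np.arange(lanes)
    return memory[idx]

# Row-major 4x8 tile stored in a flat buffer.
flat = np.arange(32, dtype=np.int32)
tile = flat.reshape(4, 8)

# Loading with stride == row length returns a column of the tile.
col2 = transposed_load(flat, base=2, stride=8, lanes=4)
assert np.array_equal(col2, tile[:, 2])
print(col2)  # [ 2 10 18 26]
```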
-
Publication Number: US11961243B2
Publication Date: 2024-04-16
Application Number: US17187228
Filing Date: 2021-02-26
Applicant: NVIDIA Corporation
Inventor: Dong Zhang , Sangmin Oh , Junghyun Kwon , Baris Evrim Demiroz , Tae Eun Choe , Minwoo Park , Chethan Ningaraju , Hao Tsui , Eric Viscito , Jagadeesh Sankaran , Yongqing Liang
IPC: G06T7/00 , B60W60/00 , G06F18/214 , G06N3/08 , G06T7/246 , G06V10/25 , G06V10/75 , G06V20/58 , G06V20/56
CPC classification number: G06T7/246 , B60W60/001 , G06F18/2148 , G06N3/08 , G06V10/25 , G06V10/751 , G06V20/58 , G06V20/56
Abstract: A geometric approach may be used to detect objects on a road surface. A set of points within a region of interest is captured and tracked between a first frame and a second frame to determine the difference in location of those points across the two frames. The first frame may be aligned with the second frame, and the first pixel values of the first frame may be compared with the second pixel values of the second frame to generate a disparity image including third pixels. One or more subsets of the third pixels that have a value above a first threshold may be combined, and the third pixels may be scored and associated with disparity values for each pixel of the one or more subsets of the third pixels. A bounding shape may be generated based on the scoring.
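The pipeline this abstract describes (align two frames, compare pixel values into a disparity-like image, keep pixels above a threshold, and box the survivors) can be sketched as a minimal NumPy reference. Everything below is an illustrative assumption rather than the claimed method: the alignment is a simple shift, the threshold is fixed, and a single bounding box stands in for the scored bounding shape.

```python
import numpy as np

def detect_obstacle(frame_a: np.ndarray, frame_b: np.ndarray, shift: int, threshold: float):
    """Toy version of the geometric approach: align, difference, threshold, box.

    `shift` stands in for the frame alignment (e.g. from ego-motion); the
    absolute difference of the aligned frames plays the role of the
    disparity image.
    """
    aligned = np.roll(frame_b, shift, axis=1)        # crude alignment stand-in
    disparity = np.abs(frame_a.astype(np.float32) - aligned.astype(np.float32))
    mask = disparity > threshold                     # keep pixels above the threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    # Bounding shape around all surviving pixels (a single box here).
    return (xs.min(), ys.min(), xs.max(), ys.max())

# Synthetic example: a flat road in frame_b, a small bright obstacle in frame_a.
frame_b = np.zeros((60, 80), dtype=np.uint8)
frame_a = frame_b.copy()
frame_a[30:38, 40:50] = 200
print(detect_obstacle(frame_a, frame_b, shift=0, threshold=50.0))  # (40, 30, 49, 37)
```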
-
Publication Number: US20230185569A1
Publication Date: 2023-06-15
Application Number: US18064119
Filing Date: 2022-12-09
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
CPC classification number: G06F9/3004 , G06F13/28 , G06F15/8061 , G06F9/30036
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
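The min/max collector named in this abstract can be modeled as a small piece of state that updates as vector results stream past, so a kernel does not need a separate reduction pass to learn its output range. The Python sketch below is an assumed behavioral model; the class and method names are illustrative only.

```python
import numpy as np

class MinMaxCollector:
    """Tracks the running minimum and maximum of every vector result observed."""

    def __init__(self):
        self.min = None
        self.max = None

    def observe(self, vec: np.ndarray) -> None:
        lo, hi = int(vec.min()), int(vec.max())
        self.min = lo if self.min is None else min(self.min, lo)
        self.max = hi if self.max is None else max(self.max, hi)

# The collector rides along with ordinary vector stores.
collector = MinMaxCollector()
for chunk in np.array_split(np.array([7, -3, 12, 0, 25, -8, 4, 9]), 2):
    result = chunk * 2                   # some vector computation
    collector.observe(result)            # hardware would update this as a side effect
print(collector.min, collector.max)      # -16 50
```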
-
Publication Number: US11630800B2
Publication Date: 2023-04-18
Application Number: US15141703
Filing Date: 2016-04-28
Applicant: NVIDIA Corporation
Inventor: Ching Y. Hung , Jagadeesh Sankaran , Ravi P. Singh , Stanley Tzeng
Abstract: In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one-dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load/store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
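To make the loop-collapsing idea concrete, the sketch below models a multi-dimensional address generator as an iterator over (count, stride) pairs: the nested loops disappear and a single flat loop walks the generated address pattern. The generator interface and the 2-D example are assumptions for illustration, not the accelerator's actual programming model.

```python
import numpy as np
from itertools import product

def address_pattern(base: int, dims: list[tuple[int, int]]):
    """Yield addresses for nested loops described as (count, stride) pairs,
    outermost dimension first. This plays the role of one address generator."""
    counts = [c for c, _ in dims]
    strides = [s for _, s in dims]
    for idx in product(*(range(c) for c in counts)):
        yield base + sum(i * s for i, s in zip(idx, strides))

# A 3x4 tile inside an 8-element-wide row-major image, starting at offset 10.
image = np.arange(64, dtype=np.int32)
collapsed = [image[addr] for addr in address_pattern(10, [(3, 8), (4, 1)])]

# The single collapsed loop visits the same elements as the two nested loops.
nested = [image[10 + r * 8 + c] for r in range(3) for c in range(4)]
assert collapsed == nested
print(collapsed)
```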
-
Publication Number: US20230050062A1
Publication Date: 2023-02-16
Application Number: US17391395
Filing Date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
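The load with permute and zero insertion named above amounts to a gather whose index vector can also mark lanes that should be filled with zero instead of loaded. The Python model below uses a sentinel index of -1 for zero insertion; the sentinel choice and function name are assumptions for illustration.

```python
import numpy as np

ZERO_LANE = -1  # assumed sentinel meaning "insert zero instead of loading"

def load_permute_zero(memory: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    """Gather memory[pattern[i]] into lane i, writing 0 where pattern[i] is the sentinel."""
    out = np.zeros(len(pattern), dtype=memory.dtype)
    real = pattern != ZERO_LANE
    out[real] = memory[pattern[real]]
    return out

data = np.array([10, 20, 30, 40, 50], dtype=np.int16)
pattern = np.array([4, ZERO_LANE, 0, 2, ZERO_LANE, 1])
print(load_permute_zero(data, pattern))  # [50  0 10 30  0 20]
```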
-
Publication Number: US20230048836A1
Publication Date: 2023-02-16
Application Number: US17391875
Filing Date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
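The two-point and two-by-two point lookups mentioned in the abstract are the table accesses needed for linear and bilinear interpolation: each lookup returns a pair (or a 2x2 block) of neighboring entries in one operation. The sketch below is a behavioral model with assumed names, showing how a single 2x2 lookup feeds a bilinear blend.

```python
import numpy as np

def lookup_2x2(table: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return the 2x2 block of neighboring table entries anchored at (row, col)."""
    return table[row:row + 2, col:col + 2]

def bilinear(table: np.ndarray, y: float, x: float) -> float:
    """Bilinear interpolation built on a single 2x2 lookup."""
    r, c = int(y), int(x)
    fy, fx = y - r, x - c
    block = lookup_2x2(table, r, c).astype(np.float64)
    top = block[0, 0] * (1 - fx) + block[0, 1] * fx
    bot = block[1, 0] * (1 - fx) + block[1, 1] * fx
    return top * (1 - fy) + bot * fy

table = np.arange(16, dtype=np.float32).reshape(4, 4)
print(bilinear(table, 1.5, 2.5))  # halfway between rows 1-2 and cols 2-3 -> 8.5
```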
-
Publication Number: US20230042226A1
Publication Date: 2023-02-09
Application Number: US17391867
Filing Date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung
IPC: G06F13/28
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
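The per-memory-bank load caching listed here can be pictured as a small line cache sitting in front of each bank, so repeated reads near a previous access are served without another bank fetch. The Python model below is an illustrative sketch; the bank count, line size, and counter names are assumptions.

```python
class BankedMemory:
    """Word-addressed memory split across banks, with a one-line cache per bank."""

    def __init__(self, data, num_banks=4, line_words=8):
        self.data = list(data)
        self.num_banks = num_banks
        self.line_words = line_words
        self.cached_line = [None] * num_banks   # tag of the line each bank currently holds
        self.bank_accesses = 0

    def load(self, addr: int) -> int:
        bank = addr % self.num_banks
        line = (addr // self.num_banks) // self.line_words   # bank-local line index
        if self.cached_line[bank] != line:      # miss: fetch the line from the bank
            self.bank_accesses += 1
            self.cached_line[bank] = line
        return self.data[addr]

mem = BankedMemory(range(64))
for addr in [0, 4, 1, 5, 0, 4]:                 # neighboring loads reuse the cached lines
    mem.load(addr)
print(mem.bank_accesses)                         # 2 bank fetches instead of 6
```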
-
Publication Number: US12204475B2
Publication Date: 2025-01-21
Application Number: US18064121
Filing Date: 2022-12-09
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung
IPC: G06F13/28
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
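Automatic store predication, another feature in this family of applications, means lanes that fall past the end of a buffer are masked off so a full-width vector store never writes them. The sketch below is an assumed behavioral model showing a loop tail handled without a scalar cleanup loop.

```python
import numpy as np

def predicated_store(dst: np.ndarray, offset: int, values: np.ndarray) -> None:
    """Store a full vector, but drop any lanes that fall past the end of dst."""
    lanes = np.arange(len(values))
    keep = (offset + lanes) < len(dst)           # predicate derived automatically
    dst[offset + lanes[keep]] = values[keep]

out = np.zeros(10, dtype=np.int32)
vec_width = 4
src = np.arange(10, dtype=np.int32) * 3

for off in range(0, len(src), vec_width):        # last iteration covers only 2 lanes
    chunk = np.resize(src[off:off + vec_width], vec_width)
    predicated_store(out, off, chunk)

assert np.array_equal(out, src)
print(out)
```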
-
Publication Number: US12118353B2
Publication Date: 2024-10-15
Application Number: US17391491
Filing Date: 2021-08-02
Applicant: NVIDIA Corporation
Inventor: Ching-Yu Hung , Ravi P Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani
CPC classification number: G06F9/30036 , G06F9/30101 , G06F9/3887
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
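The SIMD data path organization that allows inter-lane sharing can be modeled as each lane reading its neighbors' operands, which is exactly what sliding-window kernels such as a 1-D filter need. The NumPy sketch below is an illustrative assumption, not the register-file design.

```python
import numpy as np

def neighbor_sum3(lanes: np.ndarray) -> np.ndarray:
    """Each lane adds its left and right neighbors' values to its own
    (edge lanes borrow a zero), standing in for inter-lane operand sharing."""
    left = np.concatenate(([0], lanes[:-1]))
    right = np.concatenate((lanes[1:], [0]))
    return left + lanes + right

vec = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print(neighbor_sum3(vec))  # [ 3  6  9 12  9]
```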
-
Publication Number: US12050548B2
Publication Date: 2024-07-30
Application Number: US18068819
Filing Date: 2022-12-20
Applicant: NVIDIA Corporation
Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung
CPC classification number: G06F15/8053 , G06F9/30101 , G06F9/3887 , G06F13/28 , G06F21/64 , H03M13/09 , G06T1/20 , H03M13/091
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
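The hardware sequencer mentioned throughout these abstracts steps a DMA engine through a pre-built list of transfer descriptors, so the VPU does not reprogram the DMA for every tile move. The sketch below is a hypothetical software model of that idea; the descriptor fields and names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    """One DMA transfer: copy `length` words from a source offset to a destination offset."""
    src: int
    dst: int
    length: int

def run_sequencer(descriptors, src_mem, dst_mem):
    """Walk the descriptor list in order, as a hardware sequencer would,
    so no controller intervention is needed between transfers."""
    for d in descriptors:
        dst_mem[d.dst:d.dst + d.length] = src_mem[d.src:d.src + d.length]

# Move three tiles of a frame into VMEM-like scratch space with one kick-off.
frame = list(range(24))
vmem = [0] * 24
tiles = [Descriptor(src=0, dst=0, length=8),
         Descriptor(src=8, dst=8, length=8),
         Descriptor(src=16, dst=16, length=8)]
run_sequencer(tiles, frame, vmem)
assert vmem == frame
```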