Patent search ap:("Nvidia Corporation") AND inv:"Jagadeesh Sankaran" Page 1

1.

Invention application
USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP (Active)

Publication number: US20250103529A1

Publication date: 2025-03-27

Application number: US18970570

Filing date: 2024-12-05

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung

IPC: G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

2.

Invention grant
Object detection using image alignment for autonomous machine applications (Active)

Publication number: US11961243B2

Publication date: 2024-04-16

Application number: US17187228

Filing date: 2021-02-26

Applicant: NVIDIA Corporation

Inventor: Dong Zhang , Sangmin Oh , Junghyun Kwon , Baris Evrim Demiroz , Tae Eun Choe , Minwoo Park , Chethan Ningaraju , Hao Tsui , Eric Viscito , Jagadeesh Sankaran , Yongqing Liang

IPC: G06T7/00 , B60W60/00 , G06F18/214 , G06N3/08 , G06T7/246 , G06V10/25 , G06V10/75 , G06V20/58 , G06V20/56

CPC classification number: G06T7/246 , B60W60/001 , G06F18/2148 , G06N3/08 , G06V10/25 , G06V10/751 , G06V20/58 , G06V20/56

Abstract: A geometric approach may be used to detect objects on a road surface. A set of points within a region of interest between a first frame and a second frame are captured and tracked to determine a difference in location between the set of points in two frames. The first frame may be aligned with the second frame and the first pixel values of the first frame may be compared with the second pixel values of the second frame to generate a disparity image including third pixels. One or more subsets of the third pixels that have a value above a first threshold may be combined, and the third pixels may be scored and associated with disparity values for each pixel of the one or more subsets of the third pixels. A bounding shape may be generated based on the scoring.
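The comparison and thresholding steps in this abstract can be illustrated with a minimal sketch. It assumes the first frame has already been aligned to the second, uses an absolute per-pixel difference as a stand-in for the disparity image, and replaces the scoring step with a plain bounding box around every above-threshold pixel. The function names and the threshold value are hypothetical and do not come from the patent.

```python
def disparity_image(aligned_first, second):
    """Per-pixel absolute difference between two equally sized grayscale frames.

    Assumption: `aligned_first` is already warped into the second frame's
    coordinates; the alignment step itself is not shown here.
    """
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(aligned_first, second)]


def bounding_box(disparity, threshold):
    """Return (x_min, y_min, x_max, y_max) around pixels above `threshold`.

    Simplification: one box for all above-threshold pixels, with no grouping
    into subsets and no scoring. Returns None when nothing exceeds the threshold.
    """
    hits = [(x, y)
            for y, row in enumerate(disparity)
            for x, value in enumerate(row)
            if value > threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return min(xs), min(ys), max(xs), max(ys)


# Two 6x8 frames that differ only in a 2x3 patch (rows 2-3, columns 3-5).
first = [[0] * 8 for _ in range(6)]
second = [row[:] for row in first]
for y in range(2, 4):
    for x in range(3, 6):
        second[y][x] = 200

box = bounding_box(disparity_image(first, second), threshold=50)
print(box)  # (3, 2, 5, 3)
```

A fuller version would group above-threshold pixels into separate subsets and score each one before fitting a shape, as the abstract describes.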

3.

Invention publication
USING A VECTOR PROCESSOR TO CONFIGURE A DIRECT MEMORY ACCESS SYSTEM FOR FEATURE TRACKING OPERATIONS IN A SYSTEM ON A CHIP (Under examination, published)

Publication number: US20230185569A1

Publication date: 2023-06-15

Application number: US18064119

Filing date: 2022-12-09

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung

IPC: G06F9/30 , G06F13/28 , G06F15/80

CPC classification number: G06F9/3004 , G06F13/28 , G06F15/8061 , G06F9/30036

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

4.

Invention grant
Programmable vision accelerator (Active)

Publication number: US11630800B2

Publication date: 2023-04-18

Application number: US15141703

Filing date: 2016-04-28

Applicant: NVIDIA Corporation

Inventor: Ching Y. Hung , Jagadeesh Sankaran , Ravi P. Singh , Stanley Tzeng

IPC: G06F9/30 , G06F15/82 , G06F9/32 , G06F9/38 , G06F9/345 , G06F9/34 , G06F12/02 , G06F15/80

Abstract: In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
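The loop-collapsing idea in this abstract can be shown in software with a small sketch: a generator stands in for a multi-dimensional address generator, so a two-level nested loop becomes a single loop over a precomputed address pattern. This is only a Python analogy under stated assumptions, not the patented hardware design. The names `address_pattern`, `sum_nested`, and `sum_collapsed` are hypothetical.

```python
from itertools import product


def address_pattern(base, extents, strides):
    """Yield the flat addresses of a multi-dimensional access, outermost first.

    `extents` and `strides` are listed from the outermost to the innermost
    dimension. This plays the role of a multi-dimensional address generator.
    """
    for index in product(*(range(n) for n in extents)):
        yield base + sum(i * s for i, s in zip(index, strides))


def sum_nested(mem, base, rows, cols, row_stride):
    """Reference version: an explicit two-dimensional loop nest."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += mem[base + r * row_stride + c]
    return total


def sum_collapsed(mem, base, rows, cols, row_stride):
    """Collapsed version: one loop driven by the address pattern."""
    total = 0
    for addr in address_pattern(base, (rows, cols), (row_stride, 1)):
        total += mem[addr]
    return total


# A 4x8 row-major buffer; read a 3x3 tile whose top-left element is at index 9.
mem = list(range(32))
tile_addresses = list(address_pattern(9, (3, 3), (8, 1)))
print(tile_addresses)                  # [9, 10, 11, 17, 18, 19, 25, 26, 27]
print(sum_collapsed(mem, 9, 3, 3, 8))  # 162
```

Both versions visit the same addresses in the same order. The collapsed form moves the index arithmetic out of the loop body, which is the overhead the abstract says the accelerator removes.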

5.

Invention application
SIMD DATA PATH ORGANIZATION TO INCREASE PROCESSING THROUGHPUT IN A SYSTEM ON A CHIP (Active)

Publication number: US20230050062A1

Publication date: 2023-02-16

Application number: US17391395

Filing date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC: G06F9/30 , G06F9/38

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

6.

Invention application
USING A VECTOR PROCESSOR TO CONFIGURE A DIRECT MEMORY ACCESS SYSTEM FOR FEATURE TRACKING OPERATIONS IN A SYSTEM ON A CHIP (Active)

Publication number: US20230048836A1

Publication date: 2023-02-16

Application number: US17391875

Filing date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung

IPC: G06F9/30 , G06F15/80 , G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

7.

Invention application
USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP (Active)

Publication number: US20230042226A1

Publication date: 2023-02-09

Application number: US17391867

Filing date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung

IPC: G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

8.

Invention grant
Using a hardware sequencer in a direct memory access system of a system on a chip (Active)

Publication number: US12204475B2

Publication date: 2025-01-21

Application number: US18064121

Filing date: 2022-12-09

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung

IPC: G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

9.

Invention grant
Performing load and permute with a single instruction in a system on a chip (Active)

Publication number: US12118353B2

Publication date: 2024-10-15

Application number: US17391491

Filing date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ching-Yu Hung , Ravi P Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC: G06F9/30 , G06F9/38

CPC classification number: G06F9/30036 , G06F9/30101 , G06F9/3887

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

10.

Invention grant
Built-in self-test for a programmable vision accelerator of a system on a chip (Active)

Publication number: US12050548B2

Publication date: 2024-07-30

Application number: US18068819

Filing date: 2022-12-20

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung

IPC: H03M13/00 , G06F9/30 , G06F9/38 , G06F13/28 , G06F15/80 , G06F21/64 , H03M13/09 , G06T1/20

CPC classification number: G06F15/8053 , G06F9/30101 , G06F9/3887 , G06F13/28 , G06F21/64 , H03M13/09 , G06T1/20 , H03M13/091

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
