专利检索 ap:("NVIDIA Corporation") AND inv:"Jagadeesh Sankaran" 第 1 页

1.

发明授权
Programmable vision accelerator 有权

公开(公告)号：US11630800B2

公开(公告)日：2023-04-18

申请号：US15141703

申请日：2016-04-28

申请人： NVIDIA Corporation

发明人： Ching Y. Hung , Jagadeesh Sankaran , Ravi P. Singh , Stanley Tzeng

IPC分类号： G06F9/30 , G06F15/82 , G06F9/32 , G06F9/38 , G06F9/345 , G06F9/34 , G06F12/02 , G06F15/80

摘要： In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.

2.

发明申请
SIMD DATA PATH ORGANIZATION TO INCREASE PROCESSING THROUGHPUT IN A SYSTEM ON A CHIP 有权

公开(公告)号：US20230050062A1

公开(公告)日：2023-02-16

申请号：US17391395

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC分类号： G06F9/30 , G06F9/38

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

3.

发明申请
USING A VECTOR PROCESSOR TO CONFIGURE A DIRECT MEMORY ACCESS SYSTEM FOR FEATURE TRACKING OPERATIONS IN A SYSTEM ON A CHIP 有权

公开(公告)号：US20230048836A1

公开(公告)日：2023-02-16

申请号：US17391875

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung

IPC分类号： G06F9/30 , G06F15/80 , G06F13/28

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

4.

发明申请
USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP 有权

公开(公告)号：US20230042226A1

公开(公告)日：2023-02-09

申请号：US17391867

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung

IPC分类号： G06F13/28

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

5.

发明申请
USING PER MEMORY BANK LOAD CACHES FOR REDUCING POWER USE IN A SYSTEM ON A CHIP 有权

公开(公告)号：US20230124604A1

公开(公告)日：2023-04-20

申请号：US18069722

申请日：2022-12-21

申请人： NVIDIA Corporation

发明人： Ching-Yu Hung , Ravi P Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC分类号： G06F3/06 , G06F12/0802

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

6.

发明申请
REDUCED MEMORY WRITE REQUIREMENTS IN A SYSTEM ON A CHIP USING AUTOMATIC STORE PREDICATION 有权

公开(公告)号：US20230049442A1

公开(公告)日：2023-02-16

申请号：US17391374

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ching-Yu Hung , Ravi P. Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC分类号： G06F9/38 , G06F9/30 , G06F9/455

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

7.

发明申请
BUILT-IN SELF-TEST FOR A PROGRAMMABLE VISION ACCELERATOR OF A SYSTEM ON A CHIP 有权

公开(公告)号：US20230037738A1

公开(公告)日：2023-02-09

申请号：US17391891

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P. Singh , Ching-Yu Hung

IPC分类号： G06F15/80 , G06F13/28 , G06F9/38 , G06F9/30

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

8.

发明授权
Using a vector processor to configure a direct memory access system for feature tracking operations in a system on a chip 有权

公开(公告)号：US11573795B1

公开(公告)日：2023-02-07

申请号：US17391875

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ahmad Itani , Yen-Te Shih , Jagadeesh Sankaran , Ravi P Singh , Ching-Yu Hung

IPC分类号： G06F17/16 , G06F3/042 , G06F9/30 , G06F13/28 , G06F15/80

摘要： In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

9.

发明申请
HYBRID SOLUTION FOR STEREO IMAGING 有权

公开(公告)号：US20230130478A1

公开(公告)日：2023-04-27

申请号：US17798232

申请日：2020-06-22

申请人： Nvidia Corporation

发明人： Dong Zhang , Eric Viscito , Frans Sijstermans , Jagadeesh Sankaran , Ching Hung , Yen-Te Shih , Ravi Singh

IPC分类号： G06T7/593

摘要： A hybrid matching approach can be used for computer vision that balances accuracy with speed and resource consumption. Stereoscopic image data can be rectified and downsampled, then analyzed using a semi-global matching (SGM) process. The use of downsampled images greatly reduces time and bandwidth requirements, while providing high accuracy disparity results. These disparity results can be provided as external hints to a fast module that can perform a robust matching process in the time needed for applications such as real time navigation. The external hints can be used, along with potentially other hints, to define a search space for use by the fast module, which can result in higher quality disparity results obtained within specified timing constraints and with limited resources. The disparity results can be used to determine distances to various objects, as may be important for vehicle navigation or robotic task performance.

10.

发明授权
Using per memory bank load caches for reducing power use in a system on a chip 有权

公开(公告)号：US11593001B1

公开(公告)日：2023-02-28

申请号：US17391861

申请日：2021-08-02

申请人： NVIDIA Corporation

发明人： Ching-Yu Hung , Ravi P Singh , Jagadeesh Sankaran , Yen-Te Shih , Ahmad Itani

IPC分类号： G06F12/00 , G06F3/06 , G06F12/0802

摘要： A VPU and associated components include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators are used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer is included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU executes a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

搜索结果

国家/区域

专利有效性

申请日

公布(公告)日

申请人

申请人所在国/区域

发明人

IPC

IPC部

IPC大类

IPC小类

IPC大组

IPC小组

外观分类