Patent search ap:("Nvidia Corporation") AND inv:"Jagadeesh Sankaran" Page 4

31.

Invention application
PERFORMING LOAD AND PERMUTE WITH A SINGLE INSTRUCTION IN A SYSTEM ON A CHIP (In force)

Publication (announcement) No.: US20230076599A1

Publication (announcement) date: 2023-03-09

Application No.: US17391491

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani

IPC: G06F9/30, G06F9/38

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

32.

Invention grant
Built-in self-test for a programmable vision accelerator of a system on a chip (In force)

Publication (announcement) No.: US11573921B1

Publication (announcement) date: 2023-02-07

Application No.: US17391891

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung

IPC: G06F15/80, G06F9/30, G06F9/38, G06F13/28, G06F21/64, H03M13/09, G06T1/20

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

33.

Invention grant
Using a vector processor to configure a direct memory access system for feature tracking operations in a system on a chip (In force)

Publication (announcement) No.: US11934829B2

Publication (announcement) date: 2024-03-19

Application No.: US18064119

Application date: 2022-12-09

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung

IPC: G06F16/435, G06F9/30, G06F13/28, G06F15/80, G06F16/9537

CPC classification number: G06F9/3004, G06F9/30036, G06F13/28, G06F15/8061

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

34.

Invention publication
ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP (Under examination, published)

Publication (announcement) No.: US20240045722A1

Publication (announcement) date: 2024-02-08

Application No.: US18488674

Application date: 2023-10-17

Applicant: NVIDIA Corporation

Inventor: Ravi P. Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih

IPC: G06F9/50, G06F7/76, G06F1/03

CPC classification number: G06F9/5027, G06F7/76, G06F1/03, G06F9/5077

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

35.

Invention grant
Using a hardware sequencer in a direct memory access system of a system on a chip (In force)

Publication (announcement) No.: US11593290B1

Publication (announcement) date: 2023-02-28

Application No.: US17391867

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung

IPC: G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

36.

Invention application
ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP (In force)

Publication (announcement) No.: US20230050902A1

Publication (announcement) date: 2023-02-16

Application No.: US17391369

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ravi P Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih

IPC: G06F9/50, G06F1/03, G06F7/76

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

37.

Invention application
HARDWARE ACCELERATED ANOMALY DETECTION IN A SYSTEM ON A CHIP (In force)

Publication (announcement) No.: US20230046642A1

Publication (announcement) date: 2023-02-16

Application No.: US17391425

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani

IPC: G06F15/80, G06F15/78, G06N3/10

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

38.

Invention application
PERFORMING LOAD AND STORE OPERATIONS OF 2D ARRAYS IN A SINGLE CYCLE IN A SYSTEM ON A CHIP (In force)

Publication (announcement) No.: US20230045443A1

Publication (announcement) date: 2023-02-09

Application No.: US17391468

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani

IPC: G06F12/02, G06F12/1081, G06F9/30, G06F13/28

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

39.

Invention application
OFFLOADING PROCESSING TASKS TO DECOUPLED ACCELERATORS FOR INCREASING PERFORMANCE IN A SYSTEM ON A CHIP (In force)

Publication (announcement) No.: US20230042858A1

Publication (announcement) date: 2023-02-09

Application No.: US17391320

Application date: 2021-08-02

Applicant: NVIDIA Corporation

Inventor: Ravi P. Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih

IPC: G06F9/48, G06F15/80, G06F9/30

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
