-
Publication No.: US10180916B2
Publication Date: 2019-01-15
Application No.: US14958714
Filing Date: 2015-12-03
Applicant: Nvidia Corporation
Inventor: M. Wasiur Rashid , Gary Ward , Wei-Je Robert Huang , Philip Browning Johnson
Abstract: A copy subsystem within a processor includes a set of logical copy engines and a set of physical copy engines. Each logical copy engine corresponds to a different command stream implemented by a device driver, and each logical copy engine is configured to receive copy commands via the corresponding command stream. When a logical copy engine receives a copy command, the logical copy engine distributes the command, or one or more subcommands derived from the command, to one or more of the physical copy engines. The physical copy engines can perform multiple copy operations in parallel with one another, thereby allowing the bandwidth of the communication link(s) to be saturated.
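The fan-out described above can be illustrated with a minimal sketch: a logical copy engine splits an incoming copy command into fixed-size subcommands and round-robins them across physical copy engines that run in parallel. The names (`CHUNK`, `PhysicalCopyEngine`, `LogicalCopyEngine`) and the chunking policy are illustrative assumptions, not details taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

CHUNK = 4  # subcommand size in elements (illustrative assumption)

class PhysicalCopyEngine:
    def copy(self, src, dst, offset, length):
        # Execute one subcommand: copy a contiguous range.
        dst[offset:offset + length] = src[offset:offset + length]

class LogicalCopyEngine:
    def __init__(self, physical_engines):
        self._engines = cycle(physical_engines)  # simple round-robin dispatch
        self._pool = ThreadPoolExecutor(max_workers=len(physical_engines))

    def submit_copy(self, src, dst):
        # Split the copy command into subcommands and fan them out
        # so the physical engines can run in parallel.
        futures = []
        for offset in range(0, len(src), CHUNK):
            engine = next(self._engines)
            length = min(CHUNK, len(src) - offset)
            futures.append(self._pool.submit(engine.copy, src, dst, offset, length))
        for f in futures:
            f.result()  # wait for all subcommands to complete

src = list(range(10))
dst = [0] * 10
lce = LogicalCopyEngine([PhysicalCopyEngine(), PhysicalCopyEngine()])
lce.submit_copy(src, dst)
print(dst == src)  # → True
```

In hardware the motivation is that no single engine can saturate the link, so spreading subcommands across engines raises aggregate transfer bandwidth; the sketch only models the dispatch structure, not the bandwidth effect.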
-
Publication No.: US10430356B2
Publication Date: 2019-10-01
Application No.: US15582459
Filing Date: 2017-04-28
Applicant: NVIDIA Corporation
Inventor: M. Wasiur Rashid , Jonathon Evans , Gary Ward , Philip Browning Johnson
IPC: G06F3/06 , G06F13/28 , G06F12/109 , G06F13/40
Abstract: Embodiments of the present invention set forth techniques for resolving page faults associated with a copy engine. A copy engine within a parallel processor receives a copy operation that includes a set of copy commands. The copy engine executes a first copy command included in the set of copy commands that results in a page fault. The copy engine stores the set of copy commands to the memory. At least one advantage of the disclosed techniques is that the copy engine can perform copy operations that involve source and destination memory pages that are not pinned, leading to reduced memory demand and greater flexibility.
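The save-and-replay idea in the abstract can be sketched as follows: when a copy command touches a page that is not resident, the engine stashes its pending command set, lets a fault handler map the page, and then resumes from the faulting command. All class and method names here are hypothetical stand-ins, not the patent's actual interfaces.

```python
class PageFault(Exception):
    def __init__(self, page):
        self.page = page

class Memory:
    def __init__(self):
        self.mapped = set()   # pages currently resident
        self.data = {}

    def write(self, page, value):
        if page not in self.mapped:
            raise PageFault(page)  # unpinned page not resident yet
        self.data[page] = value

class CopyEngine:
    def __init__(self, memory):
        self.memory = memory
        self.saved_commands = None  # command set stored on a fault

    def run(self, commands):
        pending = list(commands)
        while pending:
            page, value = pending[0]
            try:
                self.memory.write(page, value)
                pending.pop(0)  # command completed
            except PageFault as fault:
                # Store the unfinished command set to memory, service the
                # fault (map the page), then replay from the same command.
                self.saved_commands = list(pending)
                self.memory.mapped.add(fault.page)

mem = Memory()
engine = CopyEngine(mem)
engine.run([(1, "a"), (2, "b")])
print(mem.data)  # → {1: 'a', 2: 'b'}
```

Because faults are resolved on demand, source and destination pages need not be pinned up front, which is the reduced-memory-demand advantage the abstract claims.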
-
Publication No.: US10275275B2
Publication Date: 2019-04-30
Application No.: US14958719
Filing Date: 2015-12-03
Applicant: Nvidia Corporation
Inventor: M. Wasiur Rashid , Gary Ward , Wei-Je Robert Huang , Philip Browning Johnson
Abstract: A copy subsystem within a processor includes a set of logical copy engines and a set of physical copy engines. Each logical copy engine corresponds to a different command stream implemented by a device driver, and each logical copy engine is configured to receive copy commands via the corresponding command stream. When a logical copy engine receives a copy command, the logical copy engine distributes the command, or one or more subcommands derived from the command, to one or more of the physical copy engines. The physical copy engines can perform multiple copy operations in parallel with one another, thereby allowing the bandwidth of the communication link(s) to be saturated.
-
Publication No.: US20180314431A1
Publication Date: 2018-11-01
Application No.: US15582459
Filing Date: 2017-04-28
Applicant: NVIDIA Corporation
Inventor: M. Wasiur Rashid , Jonathon Evans , Gary Ward , Philip Browning Johnson
IPC: G06F3/06
CPC classification number: G06F13/28 , G06F12/109 , G06F13/4022 , G06F2212/1041
Abstract: Embodiments of the present invention set forth techniques for resolving page faults associated with a copy engine. A copy engine within a parallel processor receives a copy operation that includes a set of copy commands. The copy engine executes a first copy command included in the set of copy commands that results in a page fault. The copy engine stores the set of copy commands to the memory. At least one advantage of the disclosed techniques is that the copy engine can perform copy operations that involve source and destination memory pages that are not pinned, leading to reduced memory demand and greater flexibility.
-
Publication No.: US10095526B2
Publication Date: 2018-10-09
Application No.: US13651131
Filing Date: 2012-10-12
Applicant: NVIDIA Corporation
Inventor: Samuel H. Duncan , Gary Ward , M. Wasiur Rashid , Lincoln G. Garlick , Wojciech Jan Truty
Abstract: A multi-threaded processing unit includes a hardware pre-processor coupled to one or more processing engines (e.g., copy engines, GPCs, etc.) that implement pre-emption techniques by dividing tasks into smaller subtasks and scheduling subtasks on the processing engines based on the priority of the tasks. By limiting the size of the subtasks, higher priority tasks may be executed quickly without switching the context state of the processing engine. Tasks may be subdivided based on a threshold size or by taking into account other considerations, such as physical boundaries of the memory system.
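The subdivision-based pre-emption can be sketched as a priority queue of bounded subtasks: because no subtask exceeds a threshold size, the scheduler reaches a pre-emption point quickly and a high-priority task's subtasks run ahead of a large low-priority task's remaining ones, with no context switch needed. The threshold value and all names below are illustrative assumptions, not details from the patent.

```python
import heapq

MAX_SUBTASK = 16  # threshold subtask size in work units (illustrative)

def subdivide(task_id, size, priority):
    # Split a task into subtasks no larger than MAX_SUBTASK units.
    return [(priority, task_id, offset, min(MAX_SUBTASK, size - offset))
            for offset in range(0, size, MAX_SUBTASK)]

def schedule(tasks):
    # tasks: list of (task_id, size, priority); lower number = higher priority.
    heap = []
    for task_id, size, priority in tasks:
        for sub in subdivide(task_id, size, priority):
            heapq.heappush(heap, sub)
    order = []
    while heap:
        priority, task_id, offset, length = heapq.heappop(heap)
        order.append((task_id, offset))  # "execute" the subtask
    return order

# A small high-priority task B competes with a large low-priority task A;
# scheduling at subtask granularity lets B run before any part of A.
order = schedule([("A", 48, 5), ("B", 16, 1)])
print(order[0])  # → ('B', 0)
```

This models only the static case; in the actual technique the pre-processor subdivides tasks as they arrive, so a newly submitted high-priority task pre-empts at the next subtask boundary rather than waiting for a long-running task to drain.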
-
Publication No.: US20170161100A1
Publication Date: 2017-06-08
Application No.: US14958719
Filing Date: 2015-12-03
Applicant: Nvidia Corporation
Inventor: M. Wasiur Rashid , Gary Ward , Wei-Je Robert Huang , Philip Browning Johnson
CPC classification number: G06F9/4843 , G06F9/522 , G06F13/12
Abstract: A copy subsystem within a processor includes a set of logical copy engines and a set of physical copy engines. Each logical copy engine corresponds to a different command stream implemented by a device driver, and each logical copy engine is configured to receive copy commands via the corresponding command stream. When a logical copy engine receives a copy command, the logical copy engine distributes the command, or one or more subcommands derived from the command, to one or more of the physical copy engines. The physical copy engines can perform multiple copy operations in parallel with one another, thereby allowing the bandwidth of the communication link(s) to be saturated.
-
Publication No.: US20170161099A1
Publication Date: 2017-06-08
Application No.: US14958714
Filing Date: 2015-12-03
Applicant: Nvidia Corporation
Inventor: M. Wasiur Rashid , Gary Ward , Wei-Je Robert Huang , Philip Browning Johnson
IPC: G06F9/48
CPC classification number: G06F13/12
Abstract: A copy subsystem within a processor includes a set of logical copy engines and a set of physical copy engines. Each logical copy engine corresponds to a different command stream implemented by a device driver, and each logical copy engine is configured to receive copy commands via the corresponding command stream. When a logical copy engine receives a copy command, the logical copy engine distributes the command, or one or more subcommands derived from the command, to one or more of the physical copy engines. The physical copy engines can perform multiple copy operations in parallel with one another, thereby allowing the bandwidth of the communication link(s) to be saturated.