-
Publication No.: US20240061700A1
Publication Date: 2024-02-22
Application No.: US18239489
Filing Date: 2023-08-29
Applicant: INTEL CORPORATION
Inventor: Rajesh SANKARAN, Bret TOLL, William RASH, Subramaniam MAIYURAN, Gang CHEN, Varghese GEORGE
IPC: G06F9/455, G06F12/1009, G06T1/20
CPC classification number: G06F9/45558, G06F12/1009, G06T1/20, G06F2009/4557, G06F2009/45583, G06F2009/45591
Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
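Illustrative sketch (not part of the patent record): the provisioning flow in the abstract above can be modeled roughly as follows. The abstract does not define any data structures or interfaces, so every name here (`Gpu`, `Subcluster`, `provision_subcluster`, the request layout, and the all-to-all routing choice) is a hypothetical assumption made only to show the three steps: receive a request, provision fabric routes within the subcluster, and reserve resources on each apparatus.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical model of one graphics processing apparatus."""
    gpu_id: int
    free_engines: int


@dataclass
class Subcluster:
    """Hypothetical result of a provisioning request."""
    engines: dict  # gpu_id -> number of engines reserved on that GPU
    routes: set    # (src_gpu_id, dst_gpu_id) pairs enabled on the fabric


def provision_subcluster(gpus, request):
    """Reserve engines and fabric routes for a VMM request.

    `request` maps gpu_id -> number of engines wanted (assumed layout).
    """
    by_id = {g.gpu_id: g for g in gpus}

    # Validate the whole request first so a failure leaves no partial state.
    for gpu_id, count in request.items():
        if gpu_id not in by_id:
            raise ValueError(f"unknown GPU {gpu_id}")
        if by_id[gpu_id].free_engines < count:
            raise ValueError(f"GPU {gpu_id} has too few free engines")

    # Reserve the engines on each participating GPU.
    for gpu_id, count in request.items():
        by_id[gpu_id].free_engines -= count

    # Assumption: route data only between members of this subcluster,
    # modeled here as directed all-to-all routes among the member GPUs.
    members = sorted(request)
    routes = {(a, b) for a in members for b in members if a != b}

    return Subcluster(engines=dict(request), routes=routes)
```

Validating before reserving is a design choice of this sketch, not something the abstract states; it simply keeps a rejected request from leaving engines half-allocated.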
-
Publication No.: US20210089316A1
Publication Date: 2021-03-25
Application No.: US16582433
Filing Date: 2019-09-25
Applicant: Intel Corporation
Inventor: William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Bret L. TOLL, Rajesh SANKARAN, Robert S. CHAPPELL, Supratim PAL, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Gang CHEN
Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.
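Illustrative sketch (not part of the patent record): the fused operation described in the abstract above, N convolutions followed by an activation with each source matrix loaded at most once, can be approximated in software as below. This is a plain-Python model, not the claimed instruction or a systolic-array implementation. The function name `fused_conv_activation`, the per-source `kernels` argument, the "valid" (no padding) convolution, and the choice of ReLU as the activation are all assumptions; the abstract specifies none of them.

```python
def relu(x):
    """Assumed activation function; the abstract does not name one."""
    return x if x > 0 else 0


def fused_conv_activation(sources, kernels, activation=relu):
    """Convolve each source with its kernel and activate in one pass.

    Each source matrix is copied once (standing in for a single memory
    load), and the activation is applied to every accumulator value as it
    is produced, so no un-activated feature map is written out and reread.
    """
    feature_maps = []
    for src, kernel in zip(sources, kernels):
        tile = [row[:] for row in src]  # the one "load" of this source
        kh, kw = len(kernel), len(kernel[0])
        out_h = len(tile) - kh + 1
        out_w = len(tile[0]) - kw + 1

        fmap = []
        for i in range(out_h):
            out_row = []
            for j in range(out_w):
                acc = 0
                for di in range(kh):
                    for dj in range(kw):
                        acc += tile[i + di][j + dj] * kernel[di][dj]
                # Fusion point: activate while the result is still local.
                out_row.append(activation(acc))
            fmap.append(out_row)
        feature_maps.append(fmap)
    return feature_maps
```

An unfused version would store each raw feature map and reload it for the activation layer; keeping the accumulator local is what removes that extra memory round trip in this model.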
-