-
Publication No.: US20210089466A1
Publication Date: 2021-03-25
Application No.: US16986169
Filing Date: 2020-08-05
Applicant: INTEL CORPORATION
Inventor: Vedvyas SHANBHOGUE , Ravi SAHITA , Rajesh SANKARAN , Siddhartha CHHABRA , Abhishek BASAK , Krystof ZMUDZINSKI , Rupin VAKHARWALA
Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the first trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.
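The abstract describes two instruction steps: establish a trusted domain bound to a per-domain encryption key, then bind a PASID uniquely to that domain. A minimal software model of those two steps, purely for illustration (the patent covers hardware instructions, and every name and structure below is an assumption, not the patented interface):

```python
# Software model of the two steps: (1) create a trusted domain with its own
# memory-encryption key, (2) bind a PASID to exactly one domain.
import secrets

class TrustDomainModel:
    def __init__(self):
        self.domains = {}          # domain id -> per-domain encryption key
        self.pasid_to_domain = {}  # PASID -> domain id (unique binding)

    def create_domain(self, domain_id):
        """Step 1: establish a trusted domain with a fresh encryption key."""
        self.domains[domain_id] = secrets.token_bytes(16)

    def bind_pasid(self, pasid, domain_id):
        """Step 2: associate a PASID with the domain; a PASID identifies
        exactly one execution context, so rebinding is rejected."""
        if pasid in self.pasid_to_domain:
            raise ValueError(f"PASID {pasid} already bound")
        self.pasid_to_domain[pasid] = domain_id

tdm = TrustDomainModel()
tdm.create_domain("td1")
tdm.bind_pasid(0x42, "td1")
```

The uniqueness check mirrors the claim that a PASID identifies a single execution context within a single trusted domain.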
-
Publication No.: US20230070579A1
Publication Date: 2023-03-09
Application No.: US17878427
Filing Date: 2022-08-01
Applicant: Intel Corporation
Inventor: Elmoustapha OULD-AHMED-VALL , William RASH , Subramaniam MAIYURAN , Varghese GEORGE , Rajesh SANKARAN
IPC: G06F9/30
Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results; scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instruction as per the opcode.
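The skip condition is easy to see in software: a product with a zero multiplicand cannot change the accumulator, so it can be elided. A sketch of the multiply-accumulate described above with that skip logic made explicit (the counter is illustrative only; the patent describes hardware detection, not this loop nest):

```python
# Multiply-accumulate of A (M x K) and B (K x N) into C (M x N) that skips
# any product with a zero multiplicand, since it cannot change the result.
def tmma_skip_zeros(A, B, C):
    M, K, N = len(A), len(B), len(B[0])
    skipped = 0
    for m in range(M):
        for k in range(K):
            if A[m][k] == 0:          # whole row of products is inconsequential
                skipped += N
                continue
            for n in range(N):
                if B[k][n] == 0:
                    skipped += 1
                    continue
                C[m][n] += A[m][k] * B[k][n]
    return C, skipped

A = [[1, 0], [2, 3]]
B = [[4, 0], [5, 6]]
C = [[0, 0], [0, 0]]
result, skipped = tmma_skip_zeros(A, B, C)
# result == [[4, 0], [23, 18]], identical to the full multiply; skipped == 4
```

The destination is bit-identical to the unskipped computation, which is what makes the skipped operations "inconsequential".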
-
Publication No.: US20200012530A1
Publication Date: 2020-01-09
Application No.: US16351396
Filing Date: 2019-03-12
Applicant: Intel Corporation
Inventor: Utkarsh Y. KAKAIYA , Rajesh SANKARAN , Sanjay KUMAR , Kun TIAN , Philip LANTZ
Abstract: Techniques for scalable virtualization of an Input/Output (I/O) device are described. An electronic device composes a virtual device comprising one or more assignable interface (AI) instances of a plurality of AI instances of a hosting function exposed by the I/O device. The electronic device emulates device resources of the I/O device via the virtual device. The electronic device intercepts a request from a guest pertaining to the virtual device, and determines whether the request from the guest is a fast-path operation to be passed directly to one of the one or more AI instances of the I/O device or a slow-path operation that is to be at least partially serviced via software executed by the electronic device. For a slow-path operation, the electronic device services the request at least partially via the software executed by the electronic device.
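The central decision in the abstract is classifying each intercepted guest request as fast-path (passed through to an assignable interface instance) or slow-path (emulated in software). A rough model of that dispatch, assuming a simple register map; the register names and the split between the two sets are invented for illustration:

```python
# Fast-path accesses go straight to an assignable interface (AI) instance;
# slow-path (configuration) accesses are emulated in host software.
FAST_PATH_REGS = {"doorbell", "work_submit"}   # passed through to the AI
SLOW_PATH_REGS = {"config", "reset", "msix"}   # emulated by host software

def dispatch(request, ai_instance, emulated_state):
    reg = request["reg"]
    if reg in FAST_PATH_REGS:
        ai_instance.append(request)            # direct hardware submission
        return "fast"
    # slow path: serviced at least partially in software
    emulated_state[reg] = request.get("value")
    return "slow"

ai, state = [], {}
dispatch({"reg": "doorbell", "value": 1}, ai, state)   # fast path
dispatch({"reg": "config", "value": 0x10}, ai, state)  # slow path, emulated
```

Keeping submissions on the fast path is what makes the virtualization scalable: only rare control operations pay the software-emulation cost.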
-
Publication No.: US20240061700A1
Publication Date: 2024-02-22
Application No.: US18239489
Filing Date: 2023-08-29
Applicant: INTEL CORPORATION
Inventor: Rajesh SANKARAN , Bret TOLL , William RASH , Subramaniam MAIYURAN , Gang CHEN , Varghese GEORGE
IPC: G06F9/455 , G06F12/1009 , G06T1/20
CPC classification number: G06F9/45558 , G06F12/1009 , G06T1/20 , G06F2009/4557 , G06F2009/45583 , G06F2009/45591
Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
-
Publication No.: US20230251912A1
Publication Date: 2023-08-10
Application No.: US18301733
Filing Date: 2023-04-17
Applicant: Intel Corporation
Inventor: Utkarsh Y. KAKAIYA , Rajesh SANKARAN , Sanjay KUMAR , Kun TIAN , Philip LANTZ
IPC: G06F9/50 , G06F15/76 , H04L51/226
CPC classification number: G06F9/5077 , G06F9/5038 , G06F15/76 , H04L51/226 , H04T2001/2093 , G06F15/17
Abstract: Techniques for scalable virtualization of an Input/Output (I/O) device are described. An electronic device composes a virtual device comprising one or more assignable interface (AI) instances of a plurality of AI instances of a hosting function exposed by the I/O device. The electronic device emulates device resources of the I/O device via the virtual device. The electronic device intercepts a request from a guest pertaining to the virtual device, and determines whether the request from the guest is a fast-path operation to be passed directly to one of the one or more AI instances of the I/O device or a slow-path operation that is to be at least partially serviced via software executed by the electronic device. For a slow-path operation, the electronic device services the request at least partially via the software executed by the electronic device.
-
Publication No.: US20230040226A1
Publication Date: 2023-02-09
Application No.: US17559612
Filing Date: 2021-12-22
Applicant: INTEL CORPORATION
Inventor: Saurabh GAYEN , Dhananjay JOSHI , Philip LANTZ , Rajesh SANKARAN , Narayan RANGANATHAN
Abstract: Apparatus and method for managing pipeline depth of a data processing device. For example, one embodiment of an apparatus comprises: an interface to receive a plurality of work requests from a plurality of clients; and a plurality of engines to perform the plurality of work requests; wherein the work requests are to be dispatched to the plurality of engines from a plurality of work queues, the work queues to store a work descriptor per work request, each work descriptor to include information needed to perform a corresponding work request, wherein the plurality of work queues include a first work queue to store work descriptors associated with first latency characteristics and a second work queue to store work descriptors associated with second latency characteristics; engine configuration circuitry to configure a first engine to have a first pipeline depth based on the first latency characteristics and to configure a second engine to have a second pipeline depth based on the second latency characteristics.
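The abstract pairs each work queue's latency characteristics with a matching engine pipeline depth. A toy model of that configuration step; the depth formula and queue names are assumptions made only to show the shape of the idea, not anything specified by the patent:

```python
# Size each engine's pipeline depth from the latency class of the work
# queue it serves: longer per-request latency needs a deeper pipeline to
# keep the engine busy.
def pipeline_depth(latency_ns, cycle_ns=2):
    return max(1, latency_ns // cycle_ns)

queues = {
    "low_latency": {"latency_ns": 8},    # first latency characteristics
    "bulk":        {"latency_ns": 64},   # second latency characteristics
}

# Configure one engine per queue, as the engine configuration circuitry
# in the abstract would.
engines = {name: {"depth": pipeline_depth(q["latency_ns"])}
           for name, q in queues.items()}
```

A shallow pipeline keeps latency-sensitive descriptors from queuing behind in-flight work, while the deep pipeline keeps throughput-oriented bulk work streaming.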
-
Publication No.: US20210089316A1
Publication Date: 2021-03-25
Application No.: US16582433
Filing Date: 2019-09-25
Applicant: Intel Corporation
Inventor: William RASH , Subramaniam MAIYURAN , Varghese GEORGE , Bret L. TOLL , Rajesh SANKARAN , Robert S. CHAPPELL , Supratim PAL , Alexander F. HEINECKE , Elmoustapha OULD-AHMED-VALL , Gang CHEN
Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.
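The fusion property in the abstract is that each source matrix is loaded from memory at most once, with the activation applied to the convolution result before any intermediate store. An illustrative software analogue using a ReLU activation (ReLU is an assumption; the abstract says only "activation layer", and the loop structure here stands in for a systolic array):

```python
# Fused convolution + activation over N source matrices: each source is read
# once, convolved, and activated without storing the intermediate feature map.
def conv2d_valid(src, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(src) - kh + 1, len(src[0]) - kw + 1
    return [[sum(src[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def fused_conv_relu(sources, kernel):
    feature_maps = []
    for src in sources:                      # each source loaded once
        fmap = conv2d_valid(src, kernel)
        # activation applied to the in-register result, no intermediate store
        feature_maps.append([[max(0, v) for v in row] for row in fmap])
    return feature_maps

srcs = [[[2, -1, 0], [1, 3, -2], [0, 1, 4]]]
k = [[1, 0], [0, -1]]
maps = fused_conv_relu(srcs, k)
# maps == [[[0, 1], [0, 0]]]
```

The fusion saves the memory round-trip that separate convolution and activation passes would otherwise pay per feature map.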
-
Publication No.: US20190121658A1
Publication Date: 2019-04-25
Application No.: US16226367
Filing Date: 2018-12-19
Applicant: Intel Corporation
Inventor: Arumugam THIYAGARAJAH , Rajesh SANKARAN , Dharmendra THAKKAR
Abstract: A processor includes a processor core, a processor cache to store reporting data structures including a queue structure, and an interrupt posting circuit coupled to the processor core and the processor cache. The interrupt posting circuit receives an interrupt request directed to a virtual processor (VP) of a virtual machine (VM) executed by the processor core. The VM is managed by a virtual machine monitor (VMM) executed by the processor core. The interrupt posting circuit determines the VP is in an inactive state and records the interrupt request in a first posted data structure allocated by the VMM for the VP in main memory coupled to the processor. The interrupt posting circuit updates location information stored in the reporting data structures based on recording the interrupt request in the first posted data structure to generate updated location information that identifies a location of the interrupt request.
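The flow above is: if the target VP is inactive, record the interrupt in a VMM-allocated posted structure and update the reporting structures with its location. A software sketch of that flow; the structure layouts and field names are invented for illustration:

```python
# Interrupts for an inactive virtual processor are recorded ("posted") in a
# VMM-allocated structure instead of delivered, and the reporting structures
# are updated to say where the pending interrupt lives.
def post_interrupt(vector, vp, posted_structs, reporting_queue):
    if vp["active"]:
        return "delivered"                   # active VP: deliver directly
    posted = posted_structs[vp["id"]]        # VMM-allocated, in main memory
    posted["pending"].add(vector)
    # update reporting structures with the location of the recorded interrupt
    reporting_queue.append({"vp": vp["id"], "location": id(posted)})
    return "posted"

structs = {0: {"pending": set()}}
queue = []
post_interrupt(0x30, {"id": 0, "active": False}, structs, queue)
```

When the VMM later schedules the VP, it can consult the reporting queue to find and inject the pending vectors without scanning every posted structure.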
-
Publication No.: US20240192981A1
Publication Date: 2024-06-13
Application No.: US18285212
Filing Date: 2021-06-25
Applicant: Intel Corporation
Inventor: Wei WANG , Kun TIAN , Gilbert NEIGER , Rajesh SANKARAN , Asit MALLICK , Jr-Shian TSAI , Jacob Jun PAN , Mesut ERGIN
CPC classification number: G06F9/45558 , G06F9/30145 , G06F2009/45579
Abstract: Embodiments of exitless guest to host (G2H) notification are described. In some embodiments, G2H is provided via an instruction. An exemplary processor includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode; and execution processing resources to execute the decoded single instruction according to at least the opcode to cause an exitless guest to host notification from a virtual processor to a physical or virtual processor.
-
Publication No.: US20230042934A1
Publication Date: 2023-02-09
Application No.: US17560170
Filing Date: 2021-12-22
Applicant: Intel Corporation
Inventor: Utkarsh Y. KAKAIYA , Philip LANTZ , Sanjay KUMAR , Rajesh SANKARAN , Narayan RANGANATHAN , Saurabh GAYEN , Dhananjay JOSHI , Nikhil P. RAO
IPC: G06F11/07
Abstract: Apparatus and method for high-performance page fault handling. For example, one embodiment of an apparatus comprises: one or more accelerator engines to process work descriptors submitted by clients to a plurality of work queues; fault processing hardware logic associated with the one or more accelerator engines, the fault processing hardware logic to implement a specified page fault handling mode for each work queue of the plurality of work queues, the page fault handling modes including a first page fault handling mode and a second page fault handling mode.
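The abstract specifies a page fault handling mode per work queue but names the modes only as "first" and "second". A sketch with two plausible mode semantics, both assumptions: a blocking mode where the engine waits for the fault to be resolved, and a reporting mode where the faulting descriptor is handed back to software:

```python
# Two per-queue page fault handling modes (semantics assumed for
# illustration): "block" waits for the OS to resolve the fault, "report"
# completes the descriptor back to the submitter with a fault status.
def handle_fault(descriptor, queue_mode, resolve_fault):
    if queue_mode == "block":
        resolve_fault(descriptor)                # wait for OS to fix the page
        return "completed"
    elif queue_mode == "report":
        descriptor["completion"] = "page_fault"  # hand back to software
        return "reported"
    raise ValueError(queue_mode)

faults_fixed = []
desc = {"addr": 0x1000}
handle_fault(desc, "block", faults_fixed.append)
```

A per-queue mode lets latency-tolerant clients keep the engine simple (block) while latency-sensitive clients avoid stalling shared engine resources on a fault (report).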
-