Abstract:
Multiprocessor clusters in a virtualized environment conventionally fail to provide memory access security, which is frequently a requirement for efficient utilization in multi-client settings. Without adequate access security, a malicious process may access what might be confidential data that belongs to a different client sharing the multiprocessor cluster. Furthermore, an inadvertent programming error in the code for one client process may accidentally corrupt data that belongs to the different client. Neither scenario is acceptable. Embodiments of the present disclosure provide access security by enabling each processing node within a multiprocessor cluster to virtualize and manage local memory access and only process access requests possessing proper access credentials. In this way, different applications executing on a multiprocessor cluster may be isolated from each other while advantageously sharing the hardware resources of the multiprocessor cluster.
Abstract:
Apparatuses, systems, and techniques to generate a trusted execution environment including multiple accelerators. In at least one embodiment, a parallel processing unit (PPU), such as a graphics processing unit (GPU), operates in a secure execution mode including a protect memory region. Furthermore, in an embodiment, a cryptographic key is utilized to protect data during transmission between the accelerators.
Abstract:
Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
Abstract:
A method for handling parallel processing clients associated with a server in a GPU, the method comprising: receiving a failure indication for at least client running a thread in the GPU; determining threads in the GPU associated with the failing client; exiting threads in the GPU associated with the failing client; and continuing to execute remaining threads in the GPU for other clients running threads in the GPU.
Abstract:
Apparatuses, systems, and techniques to generate a trusted execution environment including multiple accelerators. In at least one embodiment, a parallel processing unit (PPU), such as a graphics processing unit (GPU), operates in a secure execution mode including a protect memory region. Furthermore, in an embodiment, a cryptographic key is utilized to protect data during transmission between the accelerators.
Abstract:
Apparatuses, systems, and techniques to generate a trusted execution environment including multiple accelerators. In at least one embodiment, a parallel processing unit (PPU), such as a graphics processing unit (GPU), operates in a secure execution mode including a protect memory region. Furthermore, in an embodiment, a cryptographic key is utilized to protect data during transmission between the accelerators.
Abstract:
A system and method for detecting, filtering, prioritizing and reporting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises initialization information for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a potential conflict by identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. First information associated with a first access and second information associated with the second access to the location is determined. Filter criteria is applied to the first and second information to determine whether the instruction causes a reportable hazard. The instruction is reported when it causes the reportable hazard.
Abstract:
A system and method for detecting shared memory hazards are disclosed. The method includes, for a unit of hardware operating on a block of threads, mapping a plurality of shared memory locations assigned to the unit to a tracking table. The tracking table comprises an initialization bit as well as access type information, collectively called the state tracking bits for each shared memory location. The method also includes, for an instruction of a program within a barrier region, identifying a second access to a location in shared memory within a block of threads executed by the hardware unit. The second access is identified based on a status of the state tracking bits. The method also includes determining a hazard based on a first type of access and a second type of access to the shared memory location. Information related to the first access is provided in the table.
Abstract:
In examples, trusted execution environments (TEE) are provided for an instance of a parallel processing unit (PPU) as PPU TEEs. Different instances of a PPU correspond to different PPU TEEs, and provide accelerated confidential computing to a corresponding TEE. The processors of each PPU instance have separate and isolated paths through the memory system of the PPU which are assigned uniquely to an individual PPU instance. Data in device memory of the PPU may be isolated and access controlled amongst the PPU instances using one or more hardware firewalls. A GPU hypervisor assigns hardware resources to runtimes and performs access control and context switching for the runtimes. A PPU instance uses a cryptographic key to protect data for secure communication. Compute engines of the PPU instance are prevented from writing outside of a protected memory region. Access to a write protected region in PPU memory is blocked from other computing devices and/or device instances.
Abstract:
Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.