Abstract:
In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism, a user-level application may communicate directly with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
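
The abstract does not define the interface's register layout or signaling protocol, so the following C sketch is only an illustrative analogy: the heterogeneous accelerator is simulated by a second thread, and a doorbell/status pair in shared memory stands in for the interface, showing user-level dispatch, polling, and parallel operation with no operating system call on the communication path. The struct accel_iface name and its cmd/status/operand/result fields are assumptions, not taken from the patent.

    /* Hypothetical sketch of a user-level doorbell interface to an accelerator.
     * The accelerator is simulated by a second thread so the protocol can run
     * anywhere; in hardware the struct would be memory-mapped device state. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    struct accel_iface {
        atomic_int cmd;             /* doorbell: the sequencer writes a command id */
        atomic_int status;          /* completion flag: the accelerator writes back */
        int        operand, result; /* in/out data in the shared address space */
    };

    static void *accel(void *p)     /* stands in for the heterogeneous resource */
    {
        struct accel_iface *io = p;
        while (atomic_load(&io->cmd) == 0)   /* spin until the doorbell rings */
            ;
        io->result = io->operand * 2;        /* the offloaded operation */
        atomic_store(&io->status, 1);        /* signal completion, no OS involved */
        return NULL;
    }

    int main(void)
    {
        struct accel_iface io = { .operand = 21 };
        pthread_t t;
        pthread_create(&t, NULL, accel, &io);

        atomic_store(&io.cmd, 1);            /* user-level dispatch to the accelerator */
        /* ... the instruction sequencer is free to do other work here, in parallel ... */
        while (atomic_load(&io.status) == 0) /* user-level poll for completion */
            ;
        printf("accelerator returned %d\n", io.result);
        pthread_join(t, NULL);
        return 0;
    }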
Abstract:
Embodiments of the present disclosure are directed toward techniques and configurations for enhancing the performance of hardware (HW) accelerators. Disclosed embodiments include a static MAC scaling arrangement, which includes architectures and techniques for scaling the performance per unit of power and the performance per area of HW accelerators. Disclosed embodiments also include a dynamic MAC scaling arrangement, which includes architectures and techniques for dynamically scaling the number of active multiply-and-accumulate (MAC) units within an HW accelerator based on activation and weight sparsity. Other embodiments may be described and/or claimed.
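
Dynamic MAC scaling exploits the fact that a multiply-and-accumulate whose activation or weight operand is zero contributes nothing to the sum, so that MAC unit can be gated off. The C sketch below models the gating decision in software and counts how many MACs actually fire for a sparse dot product; it is an assumption-laden analogy of the idea, not the patented circuit, and the sparse_dot name is hypothetical.

    /* Zero-gating model: skip any MAC whose activation or weight is zero,
     * and report how many of the n MACs actually had to fire. */
    #include <stdio.h>

    static float sparse_dot(const float *act, const float *wgt, int n, int *fired)
    {
        float acc = 0.0f;
        *fired = 0;
        for (int i = 0; i < n; i++) {
            if (act[i] == 0.0f || wgt[i] == 0.0f)
                continue;            /* gated: a zero operand, no multiply needed */
            acc += act[i] * wgt[i];  /* active MAC */
            (*fired)++;
        }
        return acc;
    }

    int main(void)
    {
        float act[8] = { 0, 1.5f, 0, 0, 2.0f, 0, 0, 0.5f };  /* sparse activations */
        float wgt[8] = { 3, 0,    1, 0, 4.0f, 2, 0, 1.0f };  /* sparse weights */
        int fired;
        float y = sparse_dot(act, wgt, 8, &fired);
        printf("dot = %.1f, %d of 8 MACs active\n", y, fired); /* 8.5, 2 of 8 */
        return 0;
    }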
Abstract:
Systems, apparatuses and methods may provide for replacing floating point matrix multiplication operations with an approximation algorithm or computation in applications that involve sparse codes and neural networks. The system may replace floating point matrix multiplication operations in sparse code applications and neural network applications with an approximation computation that applies an equivalent number of addition and/or subtraction operations.
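
The abstract does not name its approximation algorithm. One classic way to trade a floating-point multiply for a single integer addition is Mitchell-style logarithmic approximation on the IEEE-754 bit pattern, shown below purely as an illustrative stand-in (valid for positive normal floats, with error up to roughly 8%): adding the two bit patterns adds the exponents and approximately multiplies the mantissas in the log domain, after which one exponent bias (0x3F800000) must be subtracted.

    /* Mitchell-style approximate multiply: one integer add replaces the
     * floating-point multiply. Illustrative only; not the patented method. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static float approx_mul(float a, float b)
    {
        uint32_t ia, ib, ir;
        memcpy(&ia, &a, sizeof ia);     /* reinterpret float bits as integers */
        memcpy(&ib, &b, sizeof ib);
        ir = ia + ib - 0x3F800000u;     /* add bit patterns, remove one bias */
        float r;
        memcpy(&r, &ir, sizeof r);
        return r;
    }

    int main(void)
    {
        float a = 3.0f, b = 7.0f;
        printf("exact %.3f approx %.3f\n", a * b, approx_mul(a, b)); /* 21.000 vs 20.000 */
        return 0;
    }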
Abstract:
In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application-level program. A first user-level thread runs on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction has at least 1) a field that references one or more instruction sequencers or 2) an implicit reference, via a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
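
The patent's instruction encoding is not reproduced in the abstract, so the following C model is hypothetical throughout: a seq_fork "instruction" is represented as a struct whose seq_mask field explicitly references one or more sequencers (simulated here as POSIX threads) and whose code field points to the user-level thread body to run on them.

    /* Software model of a user-level instruction that references sequencers.
     * The seq_fork name, seq_mask field, and thread simulation are assumptions. */
    #include <pthread.h>
    #include <stdio.h>

    typedef void (*ult_fn)(int seq_id);   /* body of a user-level thread */

    struct seq_fork {       /* model of the instruction's operand fields */
        unsigned seq_mask;  /* field explicitly referencing sequencers 0..31 */
        ult_fn   code;      /* pointer to user-level code to run there */
    };

    struct launch { int seq_id; ult_fn code; };

    static void *sequencer(void *p)       /* one simulated instruction sequencer */
    {
        struct launch *l = p;
        l->code(l->seq_id);               /* run the user-level thread */
        return NULL;
    }

    static void seq_fork(struct seq_fork ins) /* "executes" the instruction */
    {
        pthread_t t[32];
        struct launch l[32];
        int n = 0;
        for (int s = 0; s < 32; s++)
            if (ins.seq_mask & (1u << s)) {   /* dispatch to each referenced sequencer */
                l[n] = (struct launch){ s, ins.code };
                pthread_create(&t[n], NULL, sequencer, &l[n]);
                n++;
            }
        for (int i = 0; i < n; i++)
            pthread_join(t[i], NULL);
    }

    static void hello(int seq_id) { printf("user-level thread on sequencer %d\n", seq_id); }

    int main(void)
    {
        seq_fork((struct seq_fork){ .seq_mask = 0x5, .code = hello }); /* sequencers 0 and 2 */
        return 0;
    }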
Abstract:
Embodiments of the invention provide a method of creating, based on an operating-system-scheduled thread running on an operating-system-visible sequencer and using an instruction set extension, a persistent user-level thread that runs on an operating-system-sequestered sequencer independently of context-switch activities on the operating-system-scheduled thread. The operating-system-scheduled thread and the persistent user-level thread may share a common virtual address space. Embodiments of the invention may also provide a method of causing a service thread, running on an additional operating-system-visible sequencer, to provide operating system services to the persistent user-level thread. Embodiments of the invention may further provide an apparatus, a system, and a machine-readable medium therefor.
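
As a rough software analogy of the service-thread mechanism (the names and mailbox protocol below are illustrative, not the patented ISA mechanism): a "persistent" worker thread never invokes the OS itself, while a service thread performs an OS service on its behalf through a request slot in their common address space.

    /* Proxy model: the persistent user-level thread posts a request into a
     * shared mailbox; the OS-visible service thread performs the OS service. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic(const char *) mailbox;  /* request slot in the shared address space */
    static atomic_int done;

    static void *persistent_ult(void *arg) /* never invokes the OS directly */
    {
        (void)arg;
        atomic_store(&mailbox, "service thread, please print this\n");
        while (!atomic_load(&done))        /* wait for the proxy to finish */
            ;
        return NULL;
    }

    static void *service_thread(void *arg) /* OS-visible; proxies OS services */
    {
        (void)arg;
        const char *req;
        while ((req = atomic_load(&mailbox)) == NULL)
            ;
        fputs(req, stdout);                /* the OS service, performed by proxy */
        atomic_store(&done, 1);
        return NULL;
    }

    int main(void)
    {
        pthread_t ult, svc;
        pthread_create(&ult, NULL, persistent_ult, NULL);
        pthread_create(&svc, NULL, service_thread, NULL);
        pthread_join(ult, NULL);
        pthread_join(svc, NULL);
        return 0;
    }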