Abstract:
Methods and apparatus to provide virtualized vector processing are described. In one embodiment, one or more operations corresponding to a virtual vector request are distributed to one or more processor cores for execution.
Abstract:
Methods and apparatus to provide virtualized vector processing are disclosed. In one embodiment, a processor includes a decode unit to decode a first instruction into a decoded first instruction and a second instruction into a decoded second instruction, and an execution unit to: execute the decoded first instruction to cause allocation of a first portion of one or more operations corresponding to a virtual vector request to a first processor core, and generation of a first signal corresponding to a second portion of the one or more operations to cause allocation of the second portion to a second processor core, and execute the decoded second instruction to cause a first computational result corresponding to the first portion of the one or more operations and a second computational result corresponding to the second portion of the one or more operations to be aggregated and stored to a memory location.
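The allocate-then-aggregate flow described above can be illustrated in software. The following is only a minimal sketch, assuming a thread pool stands in for the processor cores and a sum of squares stands in for the vector operations; the names `split_request`, `run_portion`, and `virtual_vector_execute` are hypothetical and do not come from the abstract.

```python
from concurrent.futures import ThreadPoolExecutor


def split_request(elements, num_cores):
    """Split a virtual vector request into contiguous portions, one per core."""
    # Ceiling division; max(1, ...) keeps the range step valid for empty input.
    chunk = max(1, (len(elements) + num_cores - 1) // num_cores)
    return [elements[i:i + chunk] for i in range(0, len(elements), chunk)]


def run_portion(portion):
    """Stand-in for the work one core performs on its portion (assumed: sum of squares)."""
    return sum(x * x for x in portion)


def virtual_vector_execute(elements, num_cores=2):
    """Distribute the portions, then aggregate the partial results into one location."""
    portions = split_request(elements, num_cores)
    # Worker threads play the role of processor cores in this sketch.
    with ThreadPoolExecutor(max_workers=num_cores) as pool:
        partials = list(pool.map(run_portion, portions))
    # A dict entry plays the role of the memory location that receives the aggregate.
    memory = {"result": sum(partials)}
    return partials, memory
```

Calling `virtual_vector_execute([1, 2, 3, 4], num_cores=2)` splits the request into `[1, 2]` and `[3, 4]`, computes one partial result per portion, and stores their sum under `memory["result"]`.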
Abstract:
In one embodiment, the present invention includes a method for performing a first level task of an application in a first processor of a system and dynamically allocating a second level task of the application to one of the first processor and a second processor based on architectural feedback information. In this manner, improved scheduling and application performance can be achieved by better utilizing system resources. Other embodiments are described and claimed.
Abstract:
A technique to perform concurrent updates to a shared data structure is described. At least one embodiment of the invention concurrently stores copies of a data structure within a plurality of local caches, updates the local caches with a partial result of a computation distributed among a plurality of processing elements, and returns the partial results to combining logic in parallel, which combines the partial results into a final result.
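A software analogue of this pattern is a privatized reduction: each worker updates its own copy, and the copies are merged at the end. The sketch below assumes a histogram as the shared data structure, Python threads as the processing elements, and a sequential merge as the combining logic; the function names are hypothetical, and the hardware in the abstract is not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_update(chunk):
    """Build a private copy of the histogram holding one worker's partial result."""
    local_copy = Counter()
    for key in chunk:
        local_copy[key] += 1
    return local_copy


def combine(partials):
    """Merge the partial histograms into the final result."""
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts rather than replacing them
    return final


def concurrent_histogram(data, num_workers=4):
    """Distribute the data, update private copies concurrently, then combine."""
    # Strided slices give each worker a disjoint share of the input.
    chunks = [data[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_update, chunks))
    return combine(partials)
```

Because no worker writes to another worker's copy, the update phase needs no locking; all sharing is deferred to the combine step.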
Abstract:
Some embodiments of the invention provide devices, systems and methods of cache coherence. For example, an apparatus in accordance with an embodiment of the invention includes a memory to store a memory line; and a cache controller logic to assign a first cache coherence state to the memory line in relation to a first component, and to assign a second, different, cache coherence state to the memory line in relation to a second, different, component.
Abstract:
In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
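One way to picture a two-instruction, two-mask atomic update is a retry loop: a masked read of the pending lanes, followed by a masked write that commits only the lanes that do not collide, repeated until every lane is done. The sketch below is a scalar Python simulation under that assumption. The abstract does not specify the instructions' semantics, so the loop structure, the rule that the first lane per address wins, and the name `masked_atomic_increment` are all hypothetical.

```python
def masked_atomic_increment(memory, indices):
    """Increment memory[idx] once per lane, even when lanes share an address."""
    todo = [True] * len(indices)  # first mask: lanes that still need to run
    while any(todo):
        # First masked step (assumed): read the current value for each pending lane.
        loaded = [memory[idx] if pending else None
                  for idx, pending in zip(indices, todo)]

        # Second mask (assumed): of the pending lanes, only the first lane
        # touching each address may commit in this round.
        seen = set()
        commit = []
        for idx, pending in zip(indices, todo):
            ok = pending and idx not in seen
            if ok:
                seen.add(idx)
            commit.append(ok)

        # Second masked step (assumed): write back only for committing lanes.
        for lane, ok in enumerate(commit):
            if ok:
                memory[indices[lane]] = loaded[lane] + 1

        # Lanes that did not commit stay pending and retry with fresh values.
        todo = [pending and not ok for pending, ok in zip(todo, commit)]
    return memory
```

With `memory = [0, 0, 0, 0]` and `indices = [1, 1, 3, 1]`, the three lanes that target address 1 commit in three successive rounds, so no increment is lost.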
Abstract:
A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel.
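For the single-index case, conflict detection amounts to finding iterations that touch the same index as an earlier iteration. The following sketch is illustrative only: it assumes a plain list of indices, reports for each lane which earlier lanes conflict with it, and then greedily groups lanes so that no group repeats an index. The function names are hypothetical, and the ordered-pair and read-after-write cases are not covered.

```python
def conflict_masks(indices):
    """For each lane i, list the earlier lanes j < i that use the same index."""
    return [[j for j in range(i) if indices[j] == indices[i]]
            for i in range(len(indices))]


def conflict_free_groups(indices):
    """Greedily partition lanes into groups in which no index repeats.

    Each group can be processed in parallel. Running the groups in order
    keeps lanes that share an index in their original sequence.
    """
    groups = []
    for lane, idx in enumerate(indices):
        for group in groups:
            if all(indices[other] != idx for other in group):
                group.append(lane)
                break
        else:
            # Every existing group already holds this index: start a new group.
            groups.append([lane])
    return groups
```

For `indices = [3, 5, 3, 7, 5, 3]`, lanes 2, 4, and 5 each conflict with an earlier lane, so they cannot all run alongside lanes 0 and 1 and are pushed into later groups.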