Abstract:
An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.
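The following is a minimal Python sketch of the command-register/result-register handshake this abstract describes. The names (CommandRegister, ResultRegister, Accelerator, invoke_accelerator), the supported opcodes, and the shape of the saved state are illustrative assumptions rather than details taken from the patent.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CommandRegister:
    command: Optional[dict] = None        # command data written by the invocation instruction


@dataclass
class ResultRegister:
    success: bool = False
    payload: object = None                # result of the command, or the reason it failed


@dataclass
class Accelerator:
    saved_state: dict = field(default_factory=dict)   # checkpoint so work can resume later

    def execute(self, cmd_reg: CommandRegister, res_reg: ResultRegister) -> None:
        cmd = cmd_reg.command
        if cmd is None or cmd.get("opcode") not in {"crc32", "copy"}:
            res_reg.success = False                   # cannot execute: report the reason
            res_reg.payload = "unsupported command"
        elif cmd.get("interrupted"):
            self.saved_state = dict(cmd)              # store state so execution can continue later
            res_reg.success = False
            res_reg.payload = "interrupted; state saved"
        else:
            res_reg.success = True                    # success: store the result
            res_reg.payload = "done:" + cmd["opcode"]


def invoke_accelerator(acc: Accelerator, command: dict) -> ResultRegister:
    """Models the invocation instruction: write the command register, then
    stall until the accelerator fills the result register."""
    cmd_reg, res_reg = CommandRegister(command=command), ResultRegister()
    acc.execute(cmd_reg, res_reg)                     # the core "halts" here until completion
    return res_reg


print(invoke_accelerator(Accelerator(), {"opcode": "crc32"}))
print(invoke_accelerator(Accelerator(), {"opcode": "sqrt"}))   # unsupported -> failure reason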
Abstract:
A processor is described that includes a plurality of processing cores. The processor includes an interconnection network coupled to each of said processing cores. The processor includes snoop filter logic circuitry coupled to the interconnection network and associated with coherence plane logic circuitry of the processor. The snoop filter logic circuitry contains circuitry to hold information that identifies not only which of the processing cores are caching specific cache lines, but also where in the respective caches of those processing cores the cache lines are cached.
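Below is a minimal Python sketch of the bookkeeping the snoop filter is said to hold: for each cached line, not only which cores hold it but also where (set and way) in each core's cache it resides. The class and method names and the (set, way) location format are illustrative assumptions.

from collections import defaultdict


class SnoopFilter:
    def __init__(self):
        # line address -> {core_id: (set_index, way)}
        self.presence = defaultdict(dict)

    def record_fill(self, line_addr: int, core_id: int, set_index: int, way: int) -> None:
        """Called when a core caches a line: remember the core and the exact cache location."""
        self.presence[line_addr][core_id] = (set_index, way)

    def record_evict(self, line_addr: int, core_id: int) -> None:
        self.presence[line_addr].pop(core_id, None)
        if not self.presence[line_addr]:
            del self.presence[line_addr]

    def snoop_targets(self, line_addr: int):
        """Return (core_id, set_index, way) tuples so a snoop can be steered directly
        to the holding location instead of being broadcast to all cores."""
        return [(core, s, w) for core, (s, w) in self.presence.get(line_addr, {}).items()]


sf = SnoopFilter()
sf.record_fill(0x1000, core_id=2, set_index=17, way=3)
print(sf.snoop_targets(0x1000))   # [(2, 17, 3)]

Recording the in-cache location lets a snoop be directed at a specific cache entry rather than requiring a lookup in every core's cache.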
Abstract:
A processor is described comprising: an architectural register file implemented as a combination of a register file cache and an architectural register region within a level 1 (L1) data cache, and a data location table (DLT) to store data indicating a location of each architectural register within the register file cache and/or the architectural register region within the L1 data cache.
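A minimal Python sketch of the data location table (DLT) idea follows, assuming a model in which each architectural register is tracked as residing either in a small register file cache or in the architectural register region of the L1 data cache. The names and the placeholder eviction policy are illustrative assumptions.

from enum import Enum


class Location(Enum):
    RF_CACHE = "register_file_cache"
    L1_REGION = "l1_architectural_region"


class DataLocationTable:
    def __init__(self, num_regs: int, rf_cache_slots: int):
        self.rf_cache_slots = rf_cache_slots
        # Initially every architectural register is backed by the L1 region.
        self.location = {r: Location.L1_REGION for r in range(num_regs)}

    def promote(self, reg: int) -> None:
        """Move a register into the register file cache, evicting one if the cache is full."""
        resident = [r for r, loc in self.location.items() if loc is Location.RF_CACHE]
        if reg not in resident and len(resident) >= self.rf_cache_slots:
            # Placeholder policy: evict the first resident register found.
            self.location[resident[0]] = Location.L1_REGION
        self.location[reg] = Location.RF_CACHE

    def lookup(self, reg: int) -> Location:
        """The register read path consults the DLT to find where the register's value lives."""
        return self.location[reg]


dlt = DataLocationTable(num_regs=16, rf_cache_slots=4)
dlt.promote(3)
print(dlt.lookup(3), dlt.lookup(7))   # Location.RF_CACHE Location.L1_REGION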
Abstract:
A method and apparatus to reduce the idle link power in a platform. In one embodiment of the invention, the host and its coupled endpoint(s) in the platform each have a low power idle link state that allows disabling of the high speed link circuitry in both the host and its coupled endpoint(s). This allows the platform to reduce its idle power, as both the host and its coupled endpoint(s) are able to turn off their high speed link circuitry in one embodiment of the invention.
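Below is a minimal Python state-machine sketch of the idea: the link drops to a low power idle state only when both the host and the endpoint are idle, at which point each side disables its high speed link circuitry. The state names and the negotiation function are illustrative assumptions, not actual link-state encodings.

from enum import Enum, auto


class LinkState(Enum):
    ACTIVE = auto()
    LOW_POWER_IDLE = auto()   # high speed circuitry disabled on this side of the link


class LinkPort:
    def __init__(self, name: str):
        self.name = name
        self.state = LinkState.ACTIVE
        self.hs_circuitry_on = True

    def enter_idle(self) -> None:
        self.state = LinkState.LOW_POWER_IDLE
        self.hs_circuitry_on = False          # turn off the high speed link circuitry

    def wake(self) -> None:
        self.state = LinkState.ACTIVE
        self.hs_circuitry_on = True


def negotiate_idle(host: LinkPort, endpoint: LinkPort, host_idle: bool, ep_idle: bool) -> None:
    """Both sides must agree the link is idle before dropping to the low power state."""
    if host_idle and ep_idle:
        host.enter_idle()
        endpoint.enter_idle()


host, ep = LinkPort("host"), LinkPort("endpoint")
negotiate_idle(host, ep, host_idle=True, ep_idle=True)
print(host.state, host.hs_circuitry_on, ep.hs_circuitry_on)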
Abstract:
Method, apparatus, and systems employing dictionary-based high-bandwidth lossless compression. A pair of dictionaries having entries that are synchronized and encoded to support compression and decompression operations are implemented via logic at a compressor and decompressor. The compressor/decompressor logic operates in a cooperative manner, including implementing the same dictionary update schemes, resulting in the data in the respective dictionaries being synchronized. The dictionaries are also configured with replaceable entries, and replacement policies are implemented based on matching bytes of data within sets of data being transferred over the link. Various schemes are disclosed for entry replacement, as well as a delayed dictionary update technique. The techniques support line-speed compression and decompression using parallel operations, resulting in substantially no latency overhead.
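A minimal Python sketch of the synchronized-dictionary scheme: compressor and decompressor keep identically sized dictionaries and apply the same replacement rule on every transferred word, so the two copies stay synchronized without ever being exchanged. The hash-indexed slot layout, word granularity, and newest-wins replacement policy are illustrative assumptions.

MATCH, LITERAL = 0, 1
DICT_SLOTS = 256


def slot(word: bytes) -> int:
    """Deterministic slot index; both ends must use the same function."""
    return sum(word) % DICT_SLOTS


def compress(words, dictionary):
    out = []
    for w in words:
        i = slot(w)
        if dictionary[i] == w:
            out.append((MATCH, i))        # hit: emit only the dictionary index
        else:
            out.append((LITERAL, w))      # miss: emit the word itself
            dictionary[i] = w             # replacement policy: newest word wins the slot
    return out


def decompress(tokens, dictionary):
    out = []
    for kind, val in tokens:
        if kind == MATCH:
            out.append(dictionary[val])
        else:
            out.append(val)
            dictionary[slot(val)] = val   # same update rule keeps the dictionaries in sync
    return out


comp_dict, decomp_dict = [None] * DICT_SLOTS, [None] * DICT_SLOTS
data = [b"cache", b"line", b"cache", b"line"]
tokens = compress(data, comp_dict)
assert decompress(tokens, decomp_dict) == data
assert comp_dict == decomp_dict           # synchronized without exchanging the dictionaries

Because the decompressor applies the same update on every literal it receives, the final assertion holds even though no dictionary contents ever cross the link.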
Abstract:
A method and apparatus for enhancing/extending a serial point-to-point interconnect architecture, such as Peripheral Component Interconnect Express (PCIe), is herein described. Temporal and locality caching hints and prefetching hints are provided to improve system-wide caching and prefetching. Message codes for atomic operations to arbitrate ownership between system devices/resources are included to allow efficient access/ownership of shared data. Loose transaction ordering is provided for while maintaining corresponding transaction priority to memory locations to ensure data integrity and efficient memory access. Active power sub-states and setting thereof are included to allow for more efficient power management. And, caching of device local memory in a host address space, as well as caching of system memory in a device local memory address space, is provided for to improve bandwidth and latency for memory accesses.
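The following is a purely illustrative Python model of the kinds of per-transaction attributes this abstract enumerates: caching/prefetching hints, atomic-operation message codes for ownership arbitration, and a relaxed-ordering flag. The field names and enum values are assumptions for illustration only and are not actual PCIe TLP encodings.

from dataclasses import dataclass
from enum import Enum, auto


class CacheHint(Enum):
    NO_HINT = auto()
    TEMPORAL = auto()        # data likely reused soon: worth caching
    NON_TEMPORAL = auto()    # streaming data: bypass the cache or evict early
    PREFETCH = auto()        # subsequent lines are likely to be requested


class AtomicOp(Enum):
    NONE = auto()
    FETCH_ADD = auto()
    COMPARE_SWAP = auto()    # usable to arbitrate ownership of shared data


@dataclass
class Transaction:
    address: int
    hint: CacheHint = CacheHint.NO_HINT
    atomic: AtomicOp = AtomicOp.NONE
    relaxed_ordering: bool = False   # may pass earlier transactions to other addresses


def ownership_request(lock_addr: int) -> Transaction:
    """Build a compare-and-swap transaction a device could issue to claim a shared lock word."""
    return Transaction(address=lock_addr, atomic=AtomicOp.COMPARE_SWAP)


print(ownership_request(0xFEED0000))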
Abstract:
Method, apparatus, and systems employing novel delayed dictionary update schemes for dictionary-based high-bandwidth lossless compression. A pair of dictionaries having entries that are synchronized and encoded to support compression and decompression operations are implemented via logic at a compressor and decompressor. The compressor/decompressor logic operates in a cooperative manner, including implementing the same dictionary update schemes, resulting in the data in the respective dictionaries being synchronized. The dictionaries are also configured with replaceable entries, and replacement policies are implemented based on matching bytes of data within sets of data being transferred over the link. Various schemes are disclosed for entry replacement, as well as a delayed dictionary update technique. The techniques support line-speed compression and decompression using parallel operations, resulting in substantially no latency overhead.
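A minimal Python sketch of the delayed dictionary update idea: updates produced while processing one block are queued and applied a fixed number of blocks later, identically at the compressor and the decompressor, so lookups for the current block can proceed in parallel with the deferred update. The one-block delay, block/word granularity, and slot hash are illustrative assumptions.

from collections import deque

DICT_SLOTS = 64
DELAY = 1   # number of blocks by which dictionary updates are deferred


def slot(word: bytes) -> int:
    return sum(word) % DICT_SLOTS


def process_blocks(blocks, dictionary, encode=True):
    """Shared skeleton: the identical deferred-update rule runs at both ends of the link."""
    pending = deque()                         # queued dictionary updates, one entry per block
    out_blocks = []
    for block in blocks:
        out, updates = [], []
        for item in block:
            if encode:
                i = slot(item)
                if dictionary[i] == item:
                    out.append(("M", i))      # hit: emit only the index
                else:
                    out.append(("L", item))   # miss: emit the literal word
                    updates.append((i, item))
            else:
                kind, val = item
                word = dictionary[val] if kind == "M" else val
                out.append(word)
                if kind == "L":
                    updates.append((slot(word), word))
        out_blocks.append(out)
        pending.append(updates)
        if len(pending) > DELAY:              # apply updates DELAY blocks after they were produced
            for i, w in pending.popleft():
                dictionary[i] = w
    return out_blocks


enc_dict, dec_dict = [None] * DICT_SLOTS, [None] * DICT_SLOTS
blocks = [[b"aa", b"bb"], [b"aa", b"bb"], [b"aa"]]
tokens = process_blocks(blocks, enc_dict, encode=True)
assert process_blocks(tokens, dec_dict, encode=False) == blocks
assert enc_dict == dec_dict                   # both ends applied identical, equally delayed updates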