摘要:
A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
摘要:
A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
摘要:
Method and apparatus suited for use with a decoder receiving data serially from a network is disclosed providing synchronization throughout reception and decoding of packets of symbols. An appropriately-delayed read pointer initialization strobe used by an elastic buffer portion of the receiver provides the sequence of synchronization signals which avoids deletion of bits of packet preamble.
摘要:
In accordance with the invention, the reference portion of a primitive current switch used in emitter coupled logic or current mode logic is modified by introducing a slow device as the reference element in order to enhance the speed of turn on and turn off of the input elements. In particular, the reference transistor of a conventional ECL inverter gate or conventional CML inverter gate is replaced with a slow transistor or slow diode in order to bypass the emitter dynamic resistance. The emitter time constant of the reference element Q.sub.R is thereby increased so that the voltage on the common current source node (node 3) does not change substantially when the base of the input elements change transiently. As a consequence, the collector output of the input element, such as transistor Q.sub.A is switched on or off significantly faster.
摘要:
Methods and apparatus offload tiered memories management. The method includes obtaining a pointer to a stored memory management structure associated with tiered memories, where the memory management structure includes a plurality of memory management entries and each memory management entry of the plurality of memory management entries includes information for a memory section in one of the tiered memories. In some instances, the method includes scanning at least a part of the plurality of memory management entries. In certain instances, the method includes generating a memory profile list, where the memory profile list includes a plurality of profile entries and each profile entry of the plurality of profile entries corresponding to a scanned memory management entry in the memory management structure.
摘要:
Systems, apparatuses, and methods for implementing a multi-tiered approach to cache compression are disclosed. A cache includes a cache controller, light compressor, and heavy compressor. The decision on which compressor to use for compressing cache lines is made based on certain resource availability such as cache capacity or memory bandwidth. This allows the cache to opportunistically use complex algorithms for compression while limiting the adverse effects of high decompression latency on system performance. To address the above issue, the proposed design takes advantage of the heavy compressors for effectively reducing memory bandwidth in high bandwidth memory (HBM) interfaces as long as they do not sacrifice system performance. Accordingly, the cache combines light and heavy compressors with a decision-making unit to achieve reduced off-chip memory traffic without sacrificing system performance.
摘要:
An apparatus includes a processor, a sleep state duration prediction module, and a system management unit. The sleep state duration prediction module is configured to predict a sleep state duration for component of the processing device. The system management unit is to transition the component into a sleep state selected from a plurality of sleep states based on a comparison of the predicted sleep state duration to at least one duration threshold. Each sleep state of the plurality of sleep states is a lower power state than a previous sleep state of the plurality of sleep states.
摘要:
A method and system for operating in a single display mode operation and a dual pipe mode of operation is disclosed. The method and system includes operating in a dual pipe mode of operation in which each display pipe transmits data from a respective buffer to an associated display. The method and system further includes operating in a single display mode of operation in which one display pipe transmits data from a plurality of buffers to an associated display.
摘要:
Systems and methods for building applications by automatically incorporating application performance data into the application build process are disclosed. By capturing build settings and performance data from prior applications being executed on different computing systems such as bare metal and virtualized cloud instances, a performance database may be maintained and used to predict build settings that improve application performance (e.g., on a specific computing system or computing system configuration).
摘要:
Systems and methods related to controlling clock signals for clocking shader engines modules (SEs) and non-shader-engine modules (nSEs) of a graphics processing unit (GPU) are provided. One or more dividers receive a clock signal CLK and output a clock signal CLKA to the SEs and output a clock signal CLKB to the nSEs. The frequencies of CLKA and CLKB are independently selected based on sets of performance counter data monitored at the SEs and nSEs, respectively. The clock signal frequency for either the SEs or the nSEs is reduced when the corresponding sets of performance counter data indicates a comparatively lower processing workload for the SEs or for the nSEs.