Abstract:
Embodiments are described for a communications interconnect scheme for 3D stacked memory devices. A ring network design is used for networks of memory chips organized as individual devices with multiple dies or wafers. The design comprises a three-tier ring network where each ring serves a different set of memory blocks. One ring or set of rings interconnects memory within a die (inter-bank), a second ring or set of rings interconnects memory across die in a stack (inter-die), and the third ring or set of rings interconnects memory across stacks or chip packages (inter-stack).
Abstract:
The described embodiments include a computing device with one or more entities (processor cores, processors, etc.). In some embodiments, during operation, a thermal power management unit in the computing device uses a linear prediction to compute a predicted duration of a next idle period for an entity based on the duration of one or more previous idle periods for the entity. Based on the predicted duration of the next idle period, the thermal power management unit configures the entity to operate in a corresponding idle state.
Abstract:
A level of cache memory receives modified data from a higher level of cache memory. A set of cache lines with an index associated with the modified data is identified. The modified data is stored in the set in a cache line with an eviction priority that is at least as high as an eviction priority, before the modified data is stored, of an unmodified cache line with a highest eviction priority among unmodified cache lines in the set.
Abstract:
A processing device includes a plurality of components and a system management unit to selectively schedule an application phase to one of the plurality of components based on one or more comparisons of predictions of a plurality of thermal impacts of executing the application phase on each of the plurality of components. The predictions may be generated based on a thermal history associated with the application phase, thermal sensitivities of the plurality of components, or a layout of the plurality of components in the processing device.
Abstract:
Embodiments are described for methods and systems for mapping virtual memory pages to physical memory pages by analyzing a sequence of memory-bound accesses to the virtual memory pages, determining a degree of contiguity between the accessed virtual memory pages, and mapping sets of the accessed virtual memory pages to respective single physical memory pages. Embodiments are also described for a method for increasing locality of memory accesses to DRAM in virtual memory systems by analyzing a pattern of virtual memory accesses to identify contiguity of accessed virtual memory pages, predicting contiguity of the accessed virtual memory pages based on the pattern, and mapping the identified and predicted contiguous virtual memory pages to respective single physical memory pages.
Abstract:
A system may determine that a processor has powered up. The system may determine a first prefetching policy based on determining that the processor has powered up. The system may fetch information, from a main memory and for storage by a cache associated with the processor, using the first prefetching policy. The system may determine, after fetching information using the first prefetching policy, to apply a second prefetching policy that is different than the first prefetching policy. The system may fetch information, from the main memory and for storage by the cache, using the second prefetching policy.
Abstract:
A processor system presented here has a plurality of execution cores and a plurality of stack caches, wherein each of the stack caches is associated with a different one of the execution cores. A method of managing stack data for the processor system is presented here. The method maintains a stack cache manager for the plurality of execution cores. The stack cache manager includes entries for stack data accessed by the plurality of execution cores. The method processes, for a requesting execution core of the plurality of execution cores, a virtual address for requested stack data. The method continues by accessing the stack cache manager to search for an entry of the stack cache manager that includes the virtual address for requested stack data, and using information in the entry to retrieve the requested stack data.
Abstract:
Power gating decisions can be made based on measures of cache dirtiness. Analyzer logic can selectively power gate a component of a processor system based on a cache dirtiness of one or more caches associated with the component. The analyzer logic may power gate the component when the cache dirtiness exceeds a threshold and may maintains the component in an idle state when the cache dirtiness does not exceed the threshold. Idle time prediction logic may be used to predict a duration of an idle time of the component. The analyzer logic may then selectively power gates the component based on the cache dirtiness and the predicted idle time.
Abstract:
A method and apparatus for exiting a low power state based on a prior prediction is disclosed. An integrated circuit (IC) includes a functional unit configured to, during operation, cycle between intervals of an active state and intervals of an idle state. The IC also include a power management unit configured to place the functional unit in a low power state responsive to the functional unit entering the idle state. The power management unit is further configured to preemptively cause the functional unit to exit the low power state at a predetermined time after entering the low power. The predetermined time is based on a prediction of idle state duration made prior to entering the low power state. The prediction may be generated by a prediction unit, based on a history of durations of intervals in which the functional unit was in the idle state.
Abstract:
A method and apparatus for idle phase prediction in integrated circuits is disclosed. In one embodiment, an integrated circuit (IC) includes a functional unit configured to cycle between intervals of an active state and an idle state. The IC further includes a prediction unit configured to record a history of idle state durations for a plurality of intervals of the idle state. Based on the history of idle state durations, the prediction unit is configured to generate a prediction of the duration of the next interval of the idle state. The prediction may be used by a power management unit to, among other uses, determine whether to place the functional unit in a low power (e.g., sleep) state.