Abstract:
A method and an apparatus for modulating the prefetch training of a memory-side prefetch unit (MS-PFU) are described. An MS-PFU trains on memory access requests it receives from processors and their processor-side prefetch units (PS-PFUs). In the method and apparatus, an MS-PFU modulates its training based on one or more of a PS-PFU memory access request, a PS-PFU memory access request type, memory utilization, or the accuracy of MS-PFU prefetch requests.
Abstract:
A data processing device is disclosed that includes multiple processing cores, where each core is associated with a corresponding cache. When a processing core is placed into a first sleep mode, the data processing device initiates a first phase. If any cache probes are received at the processing core during the first phase, the cache probes are serviced. At the end of the first phase, the cache corresponding to the processing core is flushed, and subsequent cache probes are not serviced at the cache. Because it does not service the subsequent cache probes, the processing core can therefore enter another sleep mode, allowing the data processing device to conserve additional power.
Abstract:
A system and method for managing a cache subsystem. A system comprises a plurality of processing entities, a cache shared by the plurality of processing entities, and circuitry configured to manage allocations of data into the cache. Cache controller circuitry is configured to allocate data in the cache at a less favorable position in the replacement stack in response to determining a processing entity which corresponds to the allocated data has relatively poor cache behavior compared to other processing entities. The circuitry is configured to track a relative hit rate for each processing entity, such as a thread or processor core. A figure of merit may be determined for each processing entity which reflects how well a corresponding processing entity is behaving with respect to the cache. Processing entities which have a relatively low figure of merit may have their data allocated in the shared cache at a lower level in the cache replacement stack.
Abstract:
A method and apparatus for accelerated shared data migration between cores. Using an Always Migrate protocol, when a migratory probe hits a directory entry in either modified or owned state, the entry is transitioned to an owned state, and a source done command is sent without sending cache block ownership or state information to the directory.
Abstract:
A method for determining event based energy weights for digital power estimation includes obtaining a reference energy value corresponding to a power consumed by at least a portion of an integrated circuit (IC) device during operation. The method includes determining and selecting a subset of signals from a set of all signals within the IC that correlates to energy use within the IC. The method includes determining an activity factor of each signal in the subset by monitoring each signal while simulating execution of a particular set of instructions. The method includes determining a weight factor or at least an approximation of a weight factor for each signal in the subset by solving within a predetermined accuracy, a multivariable equation in which the reference energy value equals a weighted sum of the activity of the signals of the selected subset multiplied by their respective weight factors.
Abstract:
A multi-chip module configuration includes two processors, each having two nodes, each node including multiple cores or compute units. Each node is connected to the other nodes by links that are high bandwidth or low bandwidth. Routing of traffic between the nodes is controlled at each node according to a routing table and/or a control register that optimize bandwidth usage and traffic congestion control.
Abstract:
A method and apparatus for utilizing a higher-level cache as a neighbor cache directory in a multi-processor system are provided. In the method and apparatus, when the data field of a portion or all of the cache is unused, a remaining portion of the cache is repurposed for usage as neighbor cache directory. The neighbor cache provides a pointer to another cache in the multi-processor system storing memory data. The neighbor cache directory can be searched in the same manner as a data cache.
Abstract:
A method for determining event based energy weights for digital power estimation includes obtaining a reference energy value corresponding to a power consumed by at least a portion of an integrated circuit (IC) device during operation. The method includes determining and selecting a subset of signals from a set of all signals within the IC that correlates to energy use within the IC. The method includes determining an activity factor of each signal in the subset by monitoring each signal while simulating execution of a particular set of instructions. The method includes determining a weight factor or at least an approximation of a weight factor for each signal in the subset by solving within a predetermined accuracy, a multivariable equation in which the reference energy value equals a weighted sum of the activity of the signals of the selected subset multiplied by their respective weight factors.
Abstract:
A data processing device is disclosed that includes multiple processing cores, where each core is associated with a corresponding cache. When a processing core is placed into a first sleep mode, the data processing device initiates a first phase. If any cache probes are received at the processing core during the first phase, the cache probes are serviced. At the end of the first phase, the cache corresponding to the processing core is flushed, and subsequent cache probes are not serviced at the cache. Because it does not service the subsequent cache probes, the processing core can therefore enter another sleep mode, allowing the data processing device to conserve additional power.
Abstract:
A system and method for managing a memory system. A system includes a plurality of processing entities and a cache which is shared by the processing entities. Responsive to a replacement event, circuitry may identify data entries of the shared cache which are candidates for replacement. Data entries which have been identified as candidates for replacement may be removed as candidates for replacement in response to detecting the data entry corresponds to data which is shared by at least two of the plurality of processing entities. The circuitry may maintain an indication as to which of the processing entities caused an initial allocation of data into the shared cache. When the circuitry detects that a particular data item is accessed by a processing entity other than a processing entity which caused an allocation of the given data item, the data item may be deemed classified as shared data.