Abstract:
In an embodiment, a processor includes multiple tiles, each including a core and a tile cache hierarchy. This tile cache hierarchy includes a first level cache, a mid-level cache (MLC) and a last level cache (LLC), and each of these caches is private to the tile. A controller coupled to the tiles includes a cache power control logic to receive utilization information regarding the core and the tile cache hierarchy of a tile and to cause the LLC of the tile to be independently power gated, based at least in part on this information. Other embodiments are described and claimed.
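The per-tile gating decision described above can be illustrated with a minimal Python sketch. The abstract does not say which utilization metrics or thresholds the cache power control logic uses, so the fields of `TileUtilization`, the function names, and the default thresholds below are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TileUtilization:
    """Hypothetical utilization snapshot for one tile (fields are assumptions)."""
    core_active: bool     # whether the tile's core is currently executing
    mlc_miss_rate: float  # fraction of MLC lookups that miss, 0.0 to 1.0
    llc_hit_rate: float   # fraction of LLC lookups that hit, 0.0 to 1.0


def should_gate_llc(u, min_mlc_miss_rate=0.02, min_llc_hit_rate=0.05):
    """Return True if this tile's private LLC looks idle enough to power gate.

    The thresholds are illustrative defaults, not values from the source.
    """
    if not u.core_active:
        return True  # an idle core generates no LLC traffic
    if u.mlc_miss_rate < min_mlc_miss_rate:
        return True  # almost every access is served by the MLC
    # The LLC is being accessed; gate it only if it rarely hits.
    return u.llc_hit_rate < min_llc_hit_rate


def gate_decisions(tiles):
    """Decide independently for each tile, keyed by a hypothetical tile id."""
    return {tile_id: should_gate_llc(u) for tile_id, u in tiles.items()}
```

Because every LLC is private to its tile, each decision depends only on that tile's own counters, which is what allows the gating to be independent.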
Abstract:
Methods and apparatus for implementing active interconnect link power management using an adaptive low-power link-state entry policy. The power state of an interconnect link or fabric is changed in response to applicable conditions determined by low-power link-state entry policy logic in view of runtime traffic on the interconnect link or fabric. The low-power link-state policy logic may be configured to include consideration of operating system input and Quality of Service (QoS) requirements for applications and devices employing the link or fabric, and device latency tolerance requirements.
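One way to picture an adaptive low-power entry policy is an idle threshold that is tuned from observed traffic and vetoed by device latency tolerance. The Python sketch below is only an illustration under that assumption: the state names, the doubling/halving rule, and all parameters are hypothetical and do not come from the abstract.

```python
def choose_link_state(idle_us, exit_latency_us, latency_tolerance_us,
                      idle_threshold_us):
    """Pick a link state from current idle time and latency constraints.

    Returns "low_power" or "active"; both names are placeholders.
    """
    # Never enter a state whose wake-up time exceeds what devices tolerate.
    if exit_latency_us > latency_tolerance_us:
        return "active"
    return "low_power" if idle_us >= idle_threshold_us else "active"


def adapt_threshold(threshold_us, last_idle_period_us, exit_latency_us,
                    lo=1.0, hi=1000.0):
    """Adjust the entry threshold after each observed idle period.

    Assumed heuristic: if the last idle period was too short to amortize
    the exit latency, become more conservative (double the threshold);
    otherwise become more aggressive (halve it). The result is clamped
    to the range [lo, hi].
    """
    if last_idle_period_us < 2 * exit_latency_us:
        threshold_us *= 2
    else:
        threshold_us /= 2
    return max(lo, min(hi, threshold_us))
```

Operating system input or QoS requirements could be folded in by lowering `latency_tolerance_us` for latency-sensitive applications, which keeps the link active regardless of how long it has been idle.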
Abstract:
Methods and apparatus implementing hardware/software co-optimization to improve performance and energy efficiency for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Abstract:
The present disclosure provides techniques for cache management. A data block may be received from an IO interface. After receiving the data block, the occupancy level of a cache memory may be determined. The data block may be directed to a main memory if the occupancy level exceeds a threshold. The data block may be directed to a cache memory if the occupancy level is below a threshold.
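The routing rule above reduces to a single comparison. In the Python sketch below, the function name, the way occupancy is measured (occupied lines over total lines), and the 0.75 default threshold are assumptions; the abstract also leaves the case of occupancy exactly equal to the threshold unspecified, so sending it to the cache is an arbitrary choice here.

```python
def route_block(occupied_lines, total_lines, threshold=0.75):
    """Decide where a data block arriving from the IO interface should go.

    Returns "main_memory" when cache occupancy exceeds the threshold,
    otherwise "cache". The equal-to-threshold case goes to the cache,
    which is an assumption not stated in the source.
    """
    occupancy = occupied_lines / total_lines
    return "main_memory" if occupancy > threshold else "cache"
```

For example, with a 100-line cache and the default threshold, a block arriving when 80 lines are occupied bypasses the cache, while one arriving at 50 lines is placed in it.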
Abstract:
A method and apparatus for selectively parking routers used for routing traffic in mesh interconnects. Various router parking (RP) algorithms are disclosed, including an aggressive RP algorithm where a minimum number of routers are kept active to ensure adequate network connectivity between active nodes and/or intercommunicating nodes, leading to a maximum reduction in static power consumption, and a conservative RP algorithm that favors network latency considerations over static power consumption while also reducing power. An adaptive RP algorithm is also disclosed that implements aspects of the aggressive and conservative RP algorithms to balance power consumption and latency considerations in response to ongoing node utilization and associated traffic. The techniques may be implemented in internal network structures, such as for single chip computers, as well as external network structures, such as computing clusters and massively parallel computer architectures. Performance modeling has demonstrated that substantial power reduction may be obtained using the router parking techniques while maintaining Quality of Service performance objectives.
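Any router parking scheme has to preserve connectivity between the nodes that remain active. The Python sketch below is not one of the disclosed RP algorithms; it only shows a connectivity check that such an algorithm could plausibly run before parking a candidate set of routers. The 2D mesh model, the `(x, y)` coordinates, and the function name are assumptions.

```python
from collections import deque


def stays_connected(width, height, active_nodes, parked):
    """Check whether all active nodes can still reach one another.

    Models a width x height 2D mesh in which each node (x, y) has one
    router linked to its four neighbours. Routers in `parked` are
    powered off and cannot forward traffic.
    """
    active = list(active_nodes)
    if any(node in parked for node in active):
        return False  # an active node needs its own router powered on
    if len(active) <= 1:
        return True
    # Breadth-first search over unparked routers from one active node.
    start = active[0]
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nxt = (nx, ny)
            in_mesh = 0 <= nx < width and 0 <= ny < height
            if in_mesh and nxt not in parked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return all(node in seen for node in active)
```

An aggressive policy would try to park as many routers as this check allows, while a conservative one would additionally reject candidates that lengthen paths between communicating nodes.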
Abstract:
A mechanism is described for facilitating dynamic and remote memory collaboration at computing devices according to one embodiment of the invention. A method of embodiments of the invention includes dynamically classifying a computing device of a plurality of computing devices as a memory server, where the plurality of computing devices are coupled to each other over a network. The method may further include offering, by the memory server, of memory to be used by one or more of the plurality of computing devices classified as one or more memory clients, and remotely granting, by the memory server, of the memory to the one or more memory clients.
Abstract:
The present invention may provide a computer system including a plurality of tiles divided into multiple virtual domains. Each tile may include a router to communicate with others of said tiles, a private cache to store data, and a spill table to record pointers for data evicted from the private cache to a remote host, wherein the remote host and the respective tile are provided in the same virtual domain. The spill tables may allow for faster retrieval of previously evicted data because the home registry does not need to be referenced if requested data is listed in the spill table. Therefore, embodiments of the present invention may provide a distance-aware cache collaboration architecture without incurring extraneous overhead expenses.
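The lookup order implied above (private cache, then spill table, then home registry) can be sketched in Python as follows. The class and attribute names, the dictionary-based cache, the oldest-first eviction choice, and the representation of the home registry as a plain mapping are all assumptions made for illustration.

```python
class Tile:
    """Toy model of a tile with a private cache and a spill table."""

    def __init__(self, tile_id, domain, capacity):
        self.tile_id = tile_id
        self.domain = domain      # virtual domain this tile belongs to
        self.capacity = capacity  # maximum number of lines in the private cache
        self.cache = {}           # address -> data, kept in insertion order
        self.spill_table = {}     # address -> host Tile holding the evicted line
        self.guest = {}           # lines hosted on behalf of other tiles

    def store(self, addr, data, host):
        """Store a line, spilling the oldest line to `host` if the cache is full."""
        if addr not in self.cache and len(self.cache) >= self.capacity:
            if host.domain != self.domain:
                raise ValueError("spill host must be in the same virtual domain")
            victim_addr = next(iter(self.cache))  # oldest inserted line
            host.guest[victim_addr] = self.cache.pop(victim_addr)
            self.spill_table[victim_addr] = host  # record pointer to the host
        self.cache[addr] = data

    def load(self, addr, home_registry):
        """Return (data, source); the home registry is consulted only as a last resort."""
        if addr in self.cache:
            return self.cache[addr], "local"
        host = self.spill_table.get(addr)
        if host is not None and addr in host.guest:
            return host.guest[addr], "spill_table"
        return home_registry[addr], "home_registry"
```

A short usage example: a one-line tile spills its first line to a neighbour in the same domain and later retrieves it without touching the registry.

```python
a = Tile(0, "d0", capacity=1)
b = Tile(1, "d0", capacity=4)
a.store(0x10, "x", host=b)
a.store(0x20, "y", host=b)            # evicts 0x10 to tile b
data, source = a.load(0x10, home_registry={})
```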
Abstract:
In one embodiment, the present invention includes a method for obtaining file information regarding a file to be downloaded from a remote location to a computing device, creating at least one empty file in a destination storage based on the file information and communicating block information regarding the empty file to a network interface, and receiving a data packet of the file in the network interface and directly sending a payload of the data packet from the network interface to the destination storage according to the block information, while a host processor of the computing device is in a low power state. Other embodiments are described and claimed.