Abstract:
The present invention may provide a computer system including a plurality of tiles divided into multiple virtual domains. Each tile may include a router to communicate with others of said tiles, a private cache to store data, and a spill table to record pointers for data evicted from the private cache to a remote host, wherein the remote host and the respective tile are provided in the same virtual domain. The spill tables may allow for faster retrieval of previously evicted data because the home registry does not need to be referenced if requested data is listed in the spill table. Therefore, embodiments of the present invention may provide a distance-aware cache collaboration architecture without incurring extraneous overhead expenses.
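The lookup order described above (private cache, then spill table, then home registry) can be sketched as follows. This is a minimal illustration, not the patented design: the `Tile` class, the oldest-first eviction policy, the dictionary-based home registry, and all method names are assumptions made for the example, and the peer tile's own capacity is ignored for brevity.

```python
class Tile:
    """Hypothetical model of one tile with a private cache and a spill table."""

    def __init__(self, tile_id, capacity):
        self.tile_id = tile_id
        self.capacity = capacity
        self.cache = {}        # address -> data; dict order doubles as age
        self.spill_table = {}  # address -> tile that holds the evicted line

    def store(self, address, data, domain_peer):
        """Store a line; on overflow, evict the oldest line to a peer tile
        in the same virtual domain and record a pointer to it."""
        if address not in self.cache and len(self.cache) >= self.capacity:
            victim_addr = next(iter(self.cache))
            victim_data = self.cache.pop(victim_addr)
            domain_peer.cache[victim_addr] = victim_data
            self.spill_table[victim_addr] = domain_peer
        self.cache[address] = data

    def load(self, address, home_registry):
        """Return (data, source). The home registry is consulted only when
        neither the private cache nor the spill table can serve the request."""
        if address in self.cache:
            return self.cache[address], "local"
        host = self.spill_table.get(address)
        if host is not None and address in host.cache:
            return host.cache[address], "spill"
        owner = home_registry.get(address)
        if owner is not None and address in owner.cache:
            return owner.cache[address], "home"
        return None, "miss"


tile_a = Tile(0, capacity=2)
tile_b = Tile(1, capacity=4)
for addr, value in [(0x10, "x"), (0x20, "y"), (0x30, "z")]:
    tile_a.store(addr, value, tile_b)
print(tile_a.load(0x10, home_registry={}))
```

In this sketch the third store evicts address `0x10` to `tile_b`, so the later load is served through the spill table without touching the home registry.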
Abstract:
An embodiment may include circuitry to determine whether to issue at least one credit to at least one sender of at least one packet. The credit(s) may be to grant permission to the at least one sender to issue the at least one packet to at least one receiver of the at least one packet. The determination of whether to issue the credit(s) may be based, at least in part, upon whether a time in which the at least one receiver is in a relatively lower power state prior to issuance of the credit(s) is at least sufficient to provide at least a predetermined amount of reduction in power consumption. The relatively lower power state may be relative to a relatively higher power state of the at least one receiver that prevails at the issuance of the credit(s). Additionally or alternatively, the circuitry may be to receive such credit(s).
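One way to read the credit decision above is as a break-even test: a credit is withheld until the receiver has stayed in the lower power state long enough to save a target amount of energy. The sketch below assumes a simple model that the abstract does not specify, in which the saving equals the power difference multiplied by the residency time, minus a fixed transition cost. The function names and parameters are hypothetical.

```python
def min_low_power_time(p_high_w, p_low_w, transition_energy_j, target_saving_j):
    """Shortest residency (seconds) in the lower power state that yields the
    target saving, under the assumed linear energy model."""
    if p_high_w <= p_low_w:
        raise ValueError("higher power state must draw more power")
    return (transition_energy_j + target_saving_j) / (p_high_w - p_low_w)


def should_issue_credit(elapsed_low_power_s, required_s):
    """Grant the sender permission only once the receiver has been in the
    lower power state for at least the required time."""
    return elapsed_low_power_s >= required_s


required = min_low_power_time(p_high_w=2.0, p_low_w=1.0,
                              transition_energy_j=0.5, target_saving_j=0.5)
print(required, should_issue_credit(0.4, required), should_issue_credit(1.2, required))
```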
Abstract:
Methods and apparatus implementing hardware/software co-optimization to improve performance and energy efficiency for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in a multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
Abstract:
Methods and apparatus to support multiple-writer/multiple-reader concurrency for software flow/packet classification on general purpose multi-core systems. A flow table with rows mapped to respective hash buckets with multiple entry slots is implemented in memory of a host platform with multiple cores, with each bucket being associated with a version counter. Multiple writer and reader threads are run on the cores, with writers providing updates to the flow table data. In connection with inserting new key data, a determination is made as to which buckets will be changed, and access rights to those buckets are acquired prior to making any changes. For example, under a flow table employing cuckoo hashing, access rights are acquired to buckets along a full cuckoo path. Once the access rights are obtained, a writer is enabled to update data in the applicable buckets to effect entry of the new key data, while other writer threads are prevented from changing any of these buckets, but may concurrently insert or modify key data in other buckets.
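The idea of acquiring access rights to every affected bucket before changing any of them can be sketched as below. This is a simplified illustration under several assumptions: each key has only two candidate buckets, displacement along a longer cuckoo path is omitted, access rights are modeled as per-bucket locks taken in ascending index order to avoid deadlock, and readers use the version counters for an optimistic (seqlock-style) read. The class and method names are hypothetical.

```python
import threading


class FlowTable:
    """Hypothetical two-choice hash table with per-bucket locks and versions."""

    def __init__(self, num_buckets=8, slots_per_bucket=4):
        self.num_buckets = num_buckets
        self.slots = slots_per_bucket
        self.buckets = [dict() for _ in range(num_buckets)]
        self.versions = [0] * num_buckets
        self.locks = [threading.Lock() for _ in range(num_buckets)]

    def _candidates(self, key):
        return (hash(key) % self.num_buckets,
                hash((key, "alt")) % self.num_buckets)

    def insert(self, key, value):
        b1, b2 = self._candidates(key)
        to_lock = sorted({b1, b2})       # fixed order prevents deadlock
        for i in to_lock:
            self.locks[i].acquire()      # rights acquired before any change
        try:
            # Prefer the bucket that already holds the key, else one with room.
            target = next((b for b in (b1, b2) if key in self.buckets[b]), None)
            if target is None:
                target = next((b for b in (b1, b2)
                               if len(self.buckets[b]) < self.slots), None)
            if target is None:
                return False             # a full design would displace entries
            self.versions[target] += 1   # odd: update in progress
            self.buckets[target][key] = value
            self.versions[target] += 1   # even: update complete
            return True
        finally:
            for i in reversed(to_lock):
                self.locks[i].release()

    def lookup(self, key):
        for b in self._candidates(key):
            while True:                  # retry if a writer raced with the read
                before = self.versions[b]
                value = self.buckets[b].get(key)
                if before % 2 == 0 and before == self.versions[b]:
                    break
            if value is not None:
                return value
        return None


table = FlowTable()
table.insert(("10.0.0.1", 80), "flow-a")
print(table.lookup(("10.0.0.1", 80)))
```

Writers that touch disjoint sets of buckets hold disjoint locks, so they can proceed concurrently, which mirrors the behavior the abstract describes.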
Abstract:
Methods and apparatus for facilitating efficient Quality of Service (QoS) support for software-based packet processing by offloading QoS rate-limiting to NIC hardware. Software-based packet processing is performed on packet flows received at a compute platform, such as a general purpose server, and/or packet flows generated by local applications running on the compute platform. The packet processing includes packet classification that associates packets with packet flows using flow IDs, and identifying a QoS class for the packet and packet flow. NIC Tx queues are dynamically configured or pre-configured to effect rate limiting for forwarding packets enqueued in the NIC Tx queues. New packet flows are detected, and mapping data is created to map flow IDs associated with flows to the NIC Tx queues used to forward the packets associated with the flows.
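The flow-to-queue mapping step might look like the following sketch. It assumes each QoS class has one pre-configured, rate-limited NIC Tx queue; the `TxQueueMapper` class, the class names, and the queue numbers are hypothetical and do not correspond to any real NIC driver API.

```python
class TxQueueMapper:
    """Hypothetical mapping from flow IDs to rate-limited NIC Tx queues."""

    def __init__(self, class_to_queue):
        self.class_to_queue = dict(class_to_queue)  # QoS class -> Tx queue index
        self.flow_to_queue = {}                     # flow ID -> Tx queue index

    def queue_for(self, flow_id, qos_class):
        """Return the Tx queue for a packet's flow, creating the mapping
        the first time a new flow is detected."""
        queue = self.flow_to_queue.get(flow_id)
        if queue is None:
            queue = self.class_to_queue[qos_class]
            self.flow_to_queue[flow_id] = queue
        return queue


mapper = TxQueueMapper({"gold": 0, "best-effort": 3})
print(mapper.queue_for(flow_id=42, qos_class="gold"))
```

Once a flow is mapped, later packets of that flow reuse the stored queue, so software only classifies the packet and the queue's hardware rate limiter enforces the QoS class.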
Abstract:
Technologies for modular forwarding table scalability of a software cluster switch include a plurality of computing nodes. Each of the plurality of computing nodes includes a global partition table (GPT) to determine an egress computing node for a network packet received at an ingress computing node of the software cluster switch based on a flow identifier of the network packet. The GPT includes a set mapping index that corresponds to a result of a hash function applied to the flow identifier and a hash function index that identifies a hash function of a hash function family whose output results in a node identifier that corresponds to the egress computing node to which the ingress computing node forwards the network packet. Other embodiments are described herein and claimed.
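The two-step GPT lookup can be sketched as follows: a first hash selects a set, and the table stores, per set, the index of a hash function from a family that maps every flow in that set to its intended node. The sketch is illustrative only. The salted BLAKE2b family, the table size, the brute-force search in `build_gpt`, and all names are assumptions rather than details from the abstract.

```python
import hashlib

NUM_SETS = 64          # assumed number of GPT entries
MAX_HASH_INDEX = 1024  # assumed size of the hash function family


def family_hash(index, flow_id):
    """Member `index` of the assumed hash family (BLAKE2b salted by index)."""
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8,
                             salt=index.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def set_index(flow_id):
    """Set mapping index: a hash of the flow identifier reduced to a set."""
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % NUM_SETS


def build_gpt(flow_to_node, num_nodes):
    """For each set, search for a hash function index that sends every flow
    in the set to its intended egress node."""
    sets = {}
    for flow_id, node in flow_to_node.items():
        sets.setdefault(set_index(flow_id), []).append((flow_id, node))
    gpt = [0] * NUM_SETS
    for s, members in sets.items():
        for index in range(MAX_HASH_INDEX):
            if all(family_hash(index, f) % num_nodes == n for f, n in members):
                gpt[s] = index
                break
        else:
            raise ValueError("no suitable hash function index found")
    return gpt


def egress_node(gpt, flow_id, num_nodes):
    """Lookup at the ingress node: set index, then the stored hash function."""
    return family_hash(gpt[set_index(flow_id)], flow_id) % num_nodes


flows = {"10.0.0.1:80": 0, "10.0.0.2:443": 1, "10.0.0.3:22": 2}
gpt = build_gpt(flows, num_nodes=3)
print([egress_node(gpt, f, 3) for f in flows])
```

The table stores only one small index per set rather than one entry per flow, which is the source of the space saving in this style of design.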
Abstract:
A computer system may comprise a platform in which a processing block may be provisioned. The processing block may determine an optimal compression ratio such that the optimal compression ratio may cause a minimum of a total power to be consumed by the computer platform. The total power may comprise total compression power consumption and total transmission power consumption. The processing block may generate compressed frames from a plurality of frames generated by an application. The compressed frames may be generated by encoding the plurality of frames using the optimal compression ratio. The processing block may select a network interface from multiple network interfaces supported by the computer system to transmit the compressed frames.
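Selecting the compression ratio that minimizes total power can be sketched as a search over candidate ratios. The power models below are placeholders chosen for illustration (compression power growing with the ratio, transmission power shrinking with it); the abstract does not give the actual models, and the function names are hypothetical.

```python
def optimal_compression_ratio(candidates, compression_power, transmission_power):
    """Return the candidate ratio with the lowest total power, where total
    power is compression power plus transmission power."""
    return min(candidates,
               key=lambda r: compression_power(r) + transmission_power(r))


# Assumed toy models: compressing harder costs more, sending fewer bits costs less.
best = optimal_compression_ratio(
    candidates=[1, 2, 4, 8],
    compression_power=lambda r: float(r),
    transmission_power=lambda r: 16.0 / r,
)
print(best)
```

With these toy models the totals are 17, 10, 8, and 10 for ratios 1, 2, 4, and 8, so ratio 4 is selected.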
Abstract:
Methods, apparatus and systems for improved performance and energy efficiency of software-based routers. A software router running on a host computer system employing multiple Network Interface Controllers (NICs) maintains a routing table wherein packet flows are classified as managed flows (MFs), under which packets are received at and forwarded from the same NIC, and unmanaged flows (UFs), under which packets are received at and forwarded from different NICs. Forwarding table data is employed by a NIC to facilitate packet identification and flow classification operations under which the NIC determines whether a received packet belongs to an MF, a UF, or an unclassified flow. Under various schemes, packet forwarding for MFs is handled by the software router architecture such that either only the packet header is copied into memory in the host or the entire packet forwarding is handled by the NIC.
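The classification rule above reduces to comparing the ingress NIC with the egress NIC recorded for the flow. A minimal sketch, assuming the forwarding table is a dictionary from flow ID to egress NIC (the function name and data layout are hypothetical):

```python
def classify_flow(flow_id, ingress_nic, forwarding_table):
    """Classify a received packet's flow as managed (MF), unmanaged (UF),
    or unclassified, based on the NIC's forwarding table data."""
    egress_nic = forwarding_table.get(flow_id)
    if egress_nic is None:
        return "unclassified"   # no entry yet; left to the software router
    if egress_nic == ingress_nic:
        return "MF"             # received at and forwarded from the same NIC
    return "UF"                 # forwarded from a different NIC


table = {"flow-1": "nic0", "flow-2": "nic1"}
print(classify_flow("flow-1", "nic0", table),
      classify_flow("flow-2", "nic0", table),
      classify_flow("flow-3", "nic0", table))
```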
Abstract:
Systems and methods of managing break events may provide for detecting a first break event from a first event source and detecting a second break event from a second event source. In one example, the event sources can include devices coupled to a platform as well as active applications on the platform. Issuance of the first and second break events to the platform can be coordinated based at least in part on runtime information associated with the platform.