摘要:
An apparatus has processing circuitry (18), and memory access circuitry (35) to control access to a memory system based on memory attribute data identifying each memory region as one of a plurality of region types. A speculation-restricted region type is supported, for which: at least when a first read request is non-speculatively issued to a region of the speculation-restricted type, a subsequent read request is permitted to be serviced using data obtained in response to the first read request; and for a speculatively issued read request to the region of the speculation-restricted type, at least when caching the read data would require allocation of a new entry in the cache, at least one response action, which is permitted for non-speculatively issued read requests specifying a target memory region of the speculation-restricted region type, may be prohibited from being performed before the first read request has been resolved as correct.
摘要:
A data processing system includes first and second processing nodes and response logic coupled by an interconnect fabric. A first coherence participant in the first processing node is configured to issue a memory access request specifying a target memory block, and a second coherence participant in the second processing node is configured to issue a probe request regarding a memory region tracked in a memory coherence directory. The first coherence participant is configured to, responsive to receiving the probe request after the memory access request and before receiving a systemwide coherence response for the memory access request, detect an address collision between the probe request and the memory access request and, responsive thereto, transmit a speculative coherence response. The response logic is configured to, responsive to the speculative coherence response, provide a systemwide coherence response for the probe request that prevents the probe request from succeeding.
摘要:
Systems and methods for pre-fetching address translations in a memory management unit (MMU) are disclosed. The MMU detects a triggering condition related to one or more translation caches associated with the MMU, the triggering condition associated with a trigger address, generates a sequence descriptor describing a sequence of address translations to pre-fetch into the one or more translation caches, the sequence of address translations comprising a plurality of address translations corresponding to a plurality of address ranges adjacent to an address range containing the trigger address, and issues an address translation request to the one or more translation caches for each of the plurality of address translations, wherein the one or more translation caches pre-fetch at least one address translation of the plurality of address translations into the one or more translation caches when the at least one address translation is not present in the one or more translation caches.
摘要:
A cache apparatus is provided comprising a data storage structure providing N cache ways that each store data as a plurality of cache blocks. The data storage structure is organised as a plurality of sets, where each set comprises a cache block from each way, and further the data storage structure comprises a first data array and a second data array, where at least the second data array is set associative. A set associative tag storage structure stores a tag value for each cache block, with that set associative tag storage structure being shared by the first and second data arrays. Control circuitry applies an access likelihood policy to determine, for each set, a subset of the cache blocks of that set to be stored within the first data array. Access circuitry is then responsive to an access request to perform a lookup operation within an identified set of the set associative tag storage structure overlapped with an access operation to access within the first data array the subset of the cache blocks for the identified set. In the event of a hit condition being detected that identifies a cache block present in the first data array, that access request is then processed using the cache block accessed within the first data array. If instead a hit condition is detected that identifies a cache block absent in the first data array, then a further access operation is performed to access the identified cache block within a selected way of the second data array. Such a cache structure provides a high performance and energy efficient mechanism for storing cached data.
摘要:
A distributed memory system including a plurality of chips, a plurality of nodes that are distributed across the plurality of chips such that each node is comprised within a chip, each node includes a dedicated local memory and a processor core, and each local memory is configured to be accessible over network communication, a network interface for each node, the network interface configured such that a corresponding network interface of each node is integrated in a coherence domain of the chip of the corresponding node, wherein each of the network interfaces are configured to support a one-sided operation, the network interface directly reading or writing in the dedicated local memory of the corresponding node without involving a processor core, and wherein the one-sided operation is configured such that the processor core of a corresponding node uses a protocol to directly inject a remote memory access for read or write request to the network interface of the node, the remote memory access request allowing to read or write an arbitrarily long region of a memory of a remote node,
摘要:
In one embodiment, a method of false sharing detection includes performing, by a device, a plurality of optimization passes on source code, to produce optimized source code and receiving, by the device, selection criteria, The method also includes adding instrumentation to the optimized source code, by the device, after performing the plurality of optimization passes, to produce an instrumented code, where the instrumentation is configured to track memory access addresses and access types of global variables and heap variables in accordance with the selection criteria.
摘要:
Disclosed are various embodiments for client-side predictive caching of content to facilitate instantaneous use of the content. If a user is likely to commence use of a content item through a client, the client is configured to predictively cache the content item before the user commences use. In doing so, the client may obtain metadata for the content item and an initial portion of the content item from another computing device. The client may then initialize various resources to facilitate instantaneous use of the content item by the client based at least in part on the metadata and the initial portion. The client-side cache may be divided into multiple segments with different content selection criteria.
摘要:
An apparatus for processing coherency transactions in a computing system is disclosed. The apparatus may include a request queue circuit, a duplicate tag circuit, and a memory interface unit. The request queue circuit may be configured to generate a speculative read request dependent upon a received read transaction. The duplicate tag circuit may be configured to store copies of tag from one or more cache memories, and to generate a kill message in response to a determination that data requested in the received read transaction is stored in a cache memory. The memory interface unit may be configured to store the generated speculative read request dependent upon a stall condition. The stored speculative read request may be sent to a memory controller dependent upon the stall condition. The memory interface unit may be further configured to delete the speculative read request in response to the kill message.
摘要:
Throttling instruction execution in a transaction operating in a processor configured to execute memory instructions out-of-order in a pipelined processor, wherein memory instructions are instructions for accessing operands in memory is provided. Included is executing, by the processor, instructions of a transaction comprising determining whether the transaction is in throttling mode and based on the transaction being in throttling mode, executing memory instructions in-program-order. Also included is based on the transaction not-being in throttling mode, executing memory instructions out-of-program order.
摘要:
Embodiments of the invention are generally directed to systems, methods, and apparatuses for linear to physical address translation with support for page attributes. In some embodiments, a system receives an instruction to translate a memory pointer to a physical memory address for a memory location. The system may return the physical memory address and one or more page attributes. Other embodiments are described and claimed.