摘要:
A synchronization method is executed by a multi-core processor system. The synchronization method includes registering based on a synchronous command issued from a first CPU, CPUs to be synchronized and a count of the CPUs into a specific table; counting by each of the CPUs and based on a synchronous signal from the first CPU, an arrival count for a synchronous point, and creating by each of the CPUs, a second shared memory area that is a duplication of a first shared memory area accessed by processes executed by the CPUs; and comparing the first shared memory area and the second shared memory area when the arrival count becomes equal to the count of the CPUs, and based on a result of the comparison, judging the processes executed by the CPUs.
摘要:
Some of the embodiments of the present disclosure provide a method for predicting, for a first virtual address, a first descriptor based at least in part on the one or more past descriptors associated with one or more past virtual addresses; and determining, for the first virtual address, a first physical address based at least in part on the predicted first descriptor. Other embodiments are also described and claimed.
摘要:
A method and apparatus for cache coherency sequencing implementation and an adaptive LLC access priority control is disclosed. One embodiment provides mechanisms to resolve last level cache access priority among multiple internal CMP cores, internal snoops and external snoops. Another embodiment provides mechanisms for implementing cache coherency in multi-core CMP system.
摘要:
In one embodiment, a processor comprises a core configured to execute a data cache block write instruction and an interface unit coupled to the core and to an interconnect on which the processor is configured to communicate. The core is configured to transmit a request to the interface unit in response to the data cache block write instruction. If the request is speculative, the interface unit is configured to issue a first transaction on the interconnect. On the other hand, if the request is non-speculative, the interface unit is configured to issue a second transaction on the interconnect. The second transaction is different from the first transaction. For example, the second transaction may be an invalidate transaction and the first transaction may be a probe transaction. In some embodiments, the processor may be in a system including the interconnect and one or more caching agents.
摘要:
A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having one or more local cache memories associated and operatively connected therewith. The method comprises providing a snoop filter device associated with each processing unit, each snoop filter device having a plurality of dedicated input ports for receiving snoop requests from dedicated memory writing sources in the multiprocessor computing environment. Each snoop filter device includes a plurality of parallel operating port snoop filters in correspondence with the plurality of dedicated input ports, each port snoop filter implementing one or more parallel operating sub-filter elements that are adapted to concurrently filter snoop requests received from respective dedicated memory writing sources and forward a subset of those requests to its associated processing unit.
摘要:
A cache control apparatus determines whether to adopt or not data acquired by a speculative fetch by monitoring a status of the speculative fetch which is a memory fetch request output before it becomes clear whether data requested by a CPU is stored in a cache of the CPU and time period obtained by adding up the time period from when the speculative fetch is output to when the speculative fetch reaches a memory controller and time period from completion of writing of data to a memory which is specified by a data write command that has been issued, before issuance of the speculative fetch, for the same address as that for which the speculative fetch is issued to when a response of the data write command is returned.
摘要:
An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value.
摘要:
A method and apparatus for supporting cache coherency in a multiprocessor computing environment having multiple processing units, each processing unit having a local cache memory associated therewith. A snoop filter device is associated with each processing unit and includes at least one snoop filter primitive implementing filtering method based on usage of stream registers sets and associated stream register comparison logic. From the plurality of stream registers sets, at least one stream register set is active, and at least one stream register set is labeled historic at any point in time. In addition, the snoop filter block is operatively coupled with cache wrap detection logic whereby the content of the active stream register set is switched into a historic stream register set upon the cache wrap condition detection, and the content of at least one active stream register set is reset. Each filter primitive implements stream register comparison logic that determines whether a received snoop request is to be forwarded to the processor or discarded.
摘要:
Disclosed are a method, a system, and a program product for processing a data stream by accessing one or more hardware registers of a processor. In one or more embodiments, a first program instruction or subroutine can associate a hardware register of the processor with a data stream. With this association, the hardware register can be used as a stream head which can be used by multiple program instructions to access the data stream. In one or more embodiments, data from the data stream can be fetched automatically as needed and with one or more patterns which may include one or more start positions, one or more lengths, one or more strides, etc. to allow the cache to be populated with sufficient amounts of data to reduce memory latency and/or external memory bandwidth when executing an application which accesses the data stream through the one or more registers.
摘要:
A system and computer implemented method for storing of data in the memory of a computer system in order at a fast rate is provided. The method includes launching a first store to memory. A wait counter is initiated. A second store to memory is speculatively launched when the wait counter expires. The second store to memory is cancelled when the second store achieves coherency prior to the first store to memory.