Abstract:
A circuit arrangement and method utilize texture data prefetching to prefetch texture data used by an anisotropic filtering algorithm. In particular, stride-based prefetching may be used to prefetch texture data for use in anisotropic filtering, where the value of the stride, or difference between successive accesses, is based upon a distance in a memory address space between sample points taken along the line of anisotropy used in an anisotropic filtering algorithm.
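A minimal sketch of the stride computation follows. It assumes a linear, row-major texel layout and sample points snapped to integer texel offsets (du, dv) along the line of anisotropy. Real GPUs often tile or swizzle textures, and then the stride is not constant. The names Texture, anisotropic_stride and prefetch_addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Texture:
    """Hypothetical texture descriptor: a linear, row-major texel layout."""
    base: int             # base address of the texture in memory
    width: int            # texels per row
    bytes_per_texel: int

    def address(self, u: int, v: int) -> int:
        """Memory address of texel (u, v)."""
        return self.base + (v * self.width + u) * self.bytes_per_texel


def anisotropic_stride(tex: Texture, du: int, dv: int) -> int:
    """Address distance between successive sample points on the line of
    anisotropy, where each sample is offset (du, dv) texels from the previous one."""
    return (dv * tex.width + du) * tex.bytes_per_texel


def prefetch_addresses(tex: Texture, u0: int, v0: int, du: int, dv: int,
                       num_samples: int) -> list[int]:
    """Stride-based prefetch stream for one anisotropic footprint: start at the
    first sample point and advance by the anisotropic stride once per sample."""
    start = tex.address(u0, v0)
    stride = anisotropic_stride(tex, du, dv)
    return [start + k * stride for k in range(num_samples)]
```

For a texture 256 texels wide with 4 bytes per texel and a sample step of (2, 1), the stride is (1 * 256 + 2) * 4 = 1032 bytes. Every prefetched address then coincides with the address of the next sample point on the line.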
Abstract:
Ring packet built-in self-test (PBIST) circuitry configured to detect errors in the wires connecting a ring of superconducting chips includes circuitry that makes the PBIST immune to interchip latency while still allowing it to test a stop-to-stop connection. Because the PBIST is independent of latency, an entire ring can be characterized for latency and bit error rate (BER) before any functional test is run. Such systems and associated methods can be scaled to larger platforms having any number of ring stops. The PBIST circuitry can function as a transmitter, a receiver, or both, to test an entire ring. The PBIST can also be used to tune the clocks in the ring to achieve the lowest overall BER in the ring.
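One way to make a link test latency-independent is shown in the sketch below, though it is not necessarily the mechanism of this PBIST. The transmitter sends a known PRBS7 pattern (polynomial x^7 + x^6 + 1). The receiver, which does not know the interchip latency, searches candidate offsets for the best alignment, reports that offset as the measured latency, and then counts bit errors. The functions prbs7 and measure_link and the window size are assumptions.

```python
def prbs7(n: int, seed: int = 0x7F) -> list[int]:
    """Generate n bits of a PRBS7 sequence (x^7 + x^6 + 1, period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("PRBS7 seed must be non-zero")
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def measure_link(received: list[int], expected: list[int],
                 max_latency: int, window: int = 64) -> tuple[int, float]:
    """Return (latency in bit times, bit error rate) for one stop-to-stop link.

    The receiver tries every offset up to max_latency and keeps the one whose
    first `window` received bits best match the expected pattern, so the test
    does not need to know the interchip latency in advance."""
    best_offset, best_mismatches = None, None
    for offset in range(max_latency + 1):
        segment = received[offset:offset + window]
        if len(segment) < window:
            break
        mismatches = sum(a != b for a, b in zip(segment, expected))
        if best_mismatches is None or mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    if best_offset is None:
        raise ValueError("received stream is shorter than the alignment window")
    # Count errors over the whole aligned overlap to estimate the BER.
    aligned = received[best_offset:]
    n = min(len(aligned), len(expected))
    errors = sum(a != b for a, b in zip(aligned[:n], expected[:n]))
    return best_offset, errors / n
```

Running the same measurement at several clock settings and keeping the one with the lowest BER is one plausible way to approach the clock-tuning use mentioned above.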
Abstract:
A method and circuit arrangement utilize scan logic disposed on a multi-core processor integrated circuit device, or chip, to perform internal voting-based built-in self-test (BIST) of the chip. Test patterns are generated internally on the chip and communicated to the scan chains within multiple processing cores on the chip. Test results output by the scan chains are compared with one another on the chip, and majority voting is used to identify outlier test results that are indicative of a faulty processing core. A bit position in a faulty test result may be used to identify a faulty latch in a scan chain and/or a faulty functional unit in the faulty processing core, and a faulty processing core and/or a faulty functional unit may be automatically disabled in response to the testing.
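The voting step can be sketched as a per-bit majority over the scan-chain outputs of all cores. Any core whose output differs from the majority is treated as an outlier, and its mismatching bit positions are mapped to latches. The function names and the latch_names table are hypothetical. This sketch also assumes at least three cores; with an even count, a per-bit tie resolves to whichever value Counter encountered first.

```python
from collections import Counter


def majority_vote(results: dict[str, str]) -> tuple[str, dict[str, list[int]]]:
    """Compare scan-chain outputs from several cores.

    results maps a core id to the bit string its scan chain unloaded for the
    same internally generated test pattern. Returns the per-bit majority
    ("golden") result and, for each outlier core, the bit positions where it
    disagrees with the majority."""
    if len(results) < 3:
        raise ValueError("majority voting needs at least three cores")
    outputs = list(results.values())
    length = len(outputs[0])
    if any(len(r) != length for r in outputs):
        raise ValueError("all scan-chain outputs must have the same length")
    golden = "".join(
        Counter(r[i] for r in outputs).most_common(1)[0][0] for i in range(length)
    )
    outliers = {
        core: [i for i, (bit, ref) in enumerate(zip(r, golden)) if bit != ref]
        for core, r in results.items()
        if r != golden
    }
    return golden, outliers


def faulty_latches(outliers: dict[str, list[int]],
                   latch_names: list[str]) -> dict[str, list[str]]:
    """Map failing bit positions to latch names using a hypothetical
    scan-chain position -> latch table."""
    return {core: [latch_names[i] for i in bits] for core, bits in outliers.items()}
```

The keys of the returned outlier map are the candidate cores to disable.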
Abstract:
A method and apparatus for transferring architected state bypasses system memory by directly transmitting architected state between processor cores over a dedicated interconnect. The transfer may be performed by state transfer interface circuitry with or without software interaction. The architected state for a thread may be transferred from a first processing core to a second processing core when the state transfer interface circuitry detects an error that prevents proper execution of the thread corresponding to the architected state. A program instruction may be used to initiate the transfer of the architected state for the thread to one or more other threads in order to parallelize execution of the thread or perform load balancing between multiple processor cores by distributing processing of multiple threads.
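The sketch below is a software model of the transfer and is not the patented circuitry. ArchState stands for a small, hypothetical subset of architected state. transfer_state copies it core to core without staging it through any memory model, which mirrors the dedicated interconnect. Two helpers, both hypothetical, cover the error-triggered migration and the instruction-initiated replication.

```python
from dataclasses import dataclass


@dataclass
class ArchState:
    """Hypothetical subset of a thread's architected state."""
    pc: int
    gprs: list[int]


class Core:
    def __init__(self, core_id: int):
        self.core_id = core_id
        self.state = None  # ArchState while a thread is resident, else None


def transfer_state(src: Core, dst: Core) -> None:
    """Copy architected state directly from src to dst (models the dedicated
    core-to-core interconnect; nothing is written to a system-memory model)."""
    if src.state is None:
        raise ValueError("source core has no resident thread")
    dst.state = ArchState(pc=src.state.pc, gprs=list(src.state.gprs))


def migrate_on_error(failing: Core, candidates: list[Core]) -> Core:
    """On a detected error, move the thread to the first idle candidate core."""
    for core in candidates:
        if core is not failing and core.state is None:
            transfer_state(failing, core)
            failing.state = None
            return core
    raise RuntimeError("no idle core available for the thread")


def replicate_state(src: Core, targets: list[Core]) -> None:
    """Instruction-initiated variant: copy one thread's state to several cores,
    e.g. to parallelize its execution or balance load."""
    for core in targets:
        transfer_state(src, core)
```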
Abstract:
A network on chip (‘NOC’) includes integrated processor (‘IP’) blocks, routers, memory communications controllers, and network interface controllers. The memory communications controller is configured to execute a memory access instruction and to determine the state of the cache line addressed by that instruction, the state being one of shared, exclusive, or invalid. If the state of the cache line is shared, the memory communications controller broadcasts an invalidate command to a plurality of IP blocks of the NOC. If the state is exclusive, it transmits an invalidate command only to the IP block that controls the cache in which the cache line is stored.
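The controller's targeting rule can be sketched as a single decision function. LineState and invalidate_targets are hypothetical names. The abstract does not say what happens for an invalid line, so sending no invalidate in that case is an assumption.

```python
from enum import Enum
from typing import Optional


class LineState(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"
    INVALID = "invalid"


def invalidate_targets(state: LineState, owner: Optional[str],
                       ip_blocks: list[str]) -> list[str]:
    """Return the IP blocks that should receive an invalidate command."""
    if state is LineState.SHARED:
        # Copies may sit in several caches, so broadcast the invalidate.
        return list(ip_blocks)
    if state is LineState.EXCLUSIVE:
        # Exactly one cache holds the line: target only its controlling IP block.
        if owner is None:
            raise ValueError("an exclusive line must have an owning IP block")
        return [owner]
    # INVALID: no cached copy exists, so nothing is sent (assumption).
    return []
```

The design point is traffic: an exclusive line costs one point-to-point message instead of a broadcast across the network.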
Abstract:
A method and apparatus dynamically allocate and deallocate a portion of a cache for use as a dedicated local storage. Cache lines may be dynamically allocated and deallocated for inclusion in the dedicated local storage. Cache entries that are included in the dedicated local storage may not be evicted or invalidated. Additionally, coherence is not maintained between the cache entries that are included in the dedicated local storage and the backing memory. A load instruction may be configured to allocate (e.g., lock) a portion of the data cache for inclusion in the dedicated local storage and to load data into it. A load instruction may also be configured to read data from the dedicated local storage and to deallocate (e.g., unlock) the portion of the data cache that was included in the dedicated local storage.
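A minimal software model of the idea follows: an LRU cache in which locked lines are skipped by eviction, ignored by coherence invalidations, and never written to backing memory. LockableCache and its methods are hypothetical. Discarding a line's data when it is unlocked, rather than keeping it as an ordinary cached line, is an assumption of this sketch.

```python
from collections import OrderedDict


class LockableCache:
    """LRU cache in which some lines can be locked as dedicated local storage."""

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.backing = backing          # dict standing in for backing memory
        self.lines = OrderedDict()      # addr -> data, least recently used first
        self.locked = set()             # addrs currently used as local storage

    def _make_room(self) -> None:
        """Evict the least recently used line that is not locked."""
        if len(self.lines) < self.capacity:
            return
        victim = next((a for a in self.lines if a not in self.locked), None)
        if victim is None:
            raise RuntimeError("every line is locked as local storage")
        del self.lines[victim]

    def read(self, addr):
        """Ordinary cached load from backing memory."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr]
        self._make_room()
        self.lines[addr] = self.backing.get(addr, 0)
        return self.lines[addr]

    def load_and_lock(self, addr, value) -> None:
        """Allocate (lock) a line as local storage and load data into it.
        The data is never written to backing memory (no coherence)."""
        if addr not in self.lines:
            self._make_room()
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        self.locked.add(addr)

    def read_and_unlock(self, addr):
        """Read local-storage data and deallocate (unlock) its line."""
        value = self.lines.pop(addr)
        self.locked.discard(addr)
        return value

    def invalidate(self, addr) -> None:
        """Coherence invalidation; lines held as local storage are exempt."""
        if addr not in self.locked:
            self.lines.pop(addr, None)
```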
Abstract:
A technique for managing hard failures in a memory system by locking out failed memory units is disclosed. An error count is maintained for units of memory within the memory system. When the error count indicates a hard failure, the unit of memory is locked out from further use. An arbitrary set of error counters is assigned to record errors resulting from accesses to the units of memory. Embodiments of the present invention advantageously enable a system to continue reliable operation even after one or more internal hard memory failures. Other embodiments advantageously enable manufacturers to salvage partially failed devices and deploy them with a lower-performance specification rather than discarding them, as conventional practice would otherwise indicate.
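The counting-and-lockout scheme can be sketched with a small pool of error counters. Each counter is assigned to a memory unit on that unit's first error and freed once the unit is locked out. HardFailureTracker, num_counters and threshold are hypothetical names. Dropping errors when the pool is exhausted is an assumption of this sketch; a real design might instead reclaim the counter with the lowest count or decay counts so that soft errors are not mistaken for hard failures.

```python
class HardFailureTracker:
    """Per-unit error counting with lockout of units that appear hard-failed."""

    def __init__(self, num_counters: int, threshold: int):
        self.num_counters = num_counters  # size of the shared counter pool
        self.threshold = threshold        # error count treated as a hard failure
        self.counters = {}                # unit -> error count (assigned counters)
        self.locked_out = set()           # units removed from further use

    def record_error(self, unit) -> None:
        """Record one error on `unit`; lock it out once the threshold is reached."""
        if unit in self.locked_out:
            return
        if unit not in self.counters:
            if len(self.counters) >= self.num_counters:
                # Pool exhausted: this sketch simply drops the event (assumption).
                return
            self.counters[unit] = 0       # assign a free counter to this unit
        self.counters[unit] += 1
        if self.counters[unit] >= self.threshold:
            self.locked_out.add(unit)
            del self.counters[unit]       # return the counter to the pool

    def is_usable(self, unit) -> bool:
        return unit not in self.locked_out
```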
Abstract:
A circuit arrangement and method bypass the storage of requested data in a higher level cache of a multi-level memory architecture during the return of the requested data to a requester, while caching the requested data in a lower level cache. For certain types of data, e.g., data that is only used once and/or that is rarely modified or written back to memory, bypassing storage in a higher level cache reduces the likelihood of the requested data casting out frequently used data from the higher level cache. By caching the data in a lower level cache, however, the lower level cache can still snoop data requests and return requested data in the event the data is already cached in the lower level cache.
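The return path can be modelled with two dictionaries standing in for the two cache levels, keeping the abstract's "higher" and "lower" terminology. TwoLevelCache and the bypass_higher flag are hypothetical, and capacities and replacement policy are omitted. The flag models a request for data that would otherwise cast out frequently used lines, such as data used only once.

```python
class TwoLevelCache:
    """Toy two-level hierarchy with an optional higher-level-cache bypass."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.higher = {}   # higher level cache (capacity limits omitted)
        self.lower = {}    # lower level cache (capacity limits omitted)

    def load(self, addr, bypass_higher: bool = False):
        if addr in self.higher:
            return self.higher[addr]
        if addr in self.lower:
            # The lower level cache snoops the request and supplies the data.
            data = self.lower[addr]
        else:
            data = self.memory[addr]
            self.lower[addr] = data   # always cached in the lower level
        if not bypass_higher:
            self.higher[addr] = data  # normal fill of the higher level
        return data
```

A bypassed line never displaces anything in the higher level cache, yet a repeat request is still served from the lower level rather than from memory.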