Optimized cache allocation algorithm for multiple speculative requests
    51.
    Granted Invention Patent
    Optimized cache allocation algorithm for multiple speculative requests (Expired)

    Publication No.: US06393528B1

    Publication Date: 2002-05-21

    Application No.: US09345714

    Filing Date: 1999-06-30

    IPC Class: G06F12/00

    CPC Class: G06F12/0862 G06F12/127

    Abstract: A method of operating a computer system is disclosed in which an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of a processing unit. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value.

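The allocation rule in the abstract — when the prefetch limit is reached, victimize a line holding an earlier prefetch value rather than demand-fetched data — can be sketched as follows. This is a minimal toy model, not the patent's implementation; all names (`PrefetchCache`, `prefetch_limit`, and so on) are invented for illustration.

```python
from collections import OrderedDict

class PrefetchCache:
    """Toy fully associative cache that caps how many lines prefetches may occupy.

    Each prefetched line is tagged with the stream ID (or processor ID) of the
    request that fetched it, so a later prefetch can victimize an earlier
    prefetched line instead of evicting demand-fetched data.
    """

    def __init__(self, num_lines, prefetch_limit):
        self.num_lines = num_lines
        self.prefetch_limit = prefetch_limit
        self.lines = OrderedDict()  # addr -> (value, stream_id or None)

    def _prefetched_addrs(self):
        return [a for a, (_, sid) in self.lines.items() if sid is not None]

    def prefetch(self, addr, value, stream_id):
        prefetched = self._prefetched_addrs()
        if len(prefetched) >= self.prefetch_limit:
            # Prefetch limit met: the line containing the oldest earlier
            # prefetch value is allocated for the new prefetch value.
            self.lines.pop(prefetched[0])
        elif len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # plain LRU fallback
        self.lines[addr] = (value, stream_id)

    def demand_load(self, addr, value):
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)
        self.lines[addr] = (value, None)  # untagged: not a prefetch victim first
```

With a prefetch limit of two, a third prefetch displaces the oldest prefetched line while demand-fetched lines are untouched.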

    Extended cache state with prefetched stream ID information
    52.
    Granted Invention Patent
    Extended cache state with prefetched stream ID information (Expired)

    Publication No.: US06360299B1

    Publication Date: 2002-03-19

    Application No.: US09345644

    Filing Date: 1999-06-30

    IPC Class: G06F12/00

    Abstract: A method of operating a computer system is disclosed in which an instruction having an explicit prefetch request is issued directly from an instruction sequence unit to a prefetch unit of a processing unit. In a preferred embodiment, two prefetch units are used, the first prefetch unit being hardware independent and dynamically monitoring one or more active streams associated with operations carried out by a core of the processing unit, and the second prefetch unit being aware of the lower level storage subsystem and sending with the prefetch request an indication that a prefetch value is to be loaded into a lower level cache of the processing unit. The invention may advantageously associate each prefetch request with a stream ID of an associated processor stream, or a processor ID of the requesting processing unit (the latter feature is particularly useful for caches which are shared by a processing unit cluster). If another prefetch value is requested from the memory hierarchy, and it is determined that a prefetch limit of cache usage has been met by the cache, then a cache line in the cache containing one of the earlier prefetch values is allocated for receiving the other prefetch value.


    Integrated purge store mechanism to flush L2/L3 cache structure for improved reliability and serviceability
    53.
    Granted Invention Patent
    Integrated purge store mechanism to flush L2/L3 cache structure for improved reliability and serviceability (In Force)

    Publication No.: US07055002B2

    Publication Date: 2006-05-30

    Application No.: US10424486

    Filing Date: 2003-04-25

    IPC Class: G06F13/00

    CPC Class: G06F12/0804 G06F12/0897

    Abstract: A method of reducing errors in a cache memory of a computer system (e.g., an L2 cache) by periodically issuing a series of purge commands to the L2 cache, sequentially flushing cache lines from the L2 cache to an L3 cache in response to the purge commands, and correcting single-bit errors in the cache lines as they are flushed to the L3 cache. Purge commands are issued only when the processor cores associated with the L2 cache have an idle cycle available in a store pipe to the cache. The flush rate of the purge commands can be programmably set, and the purge mechanism can be implemented either in software running on the computer system, or in hardware integrated with the L2 cache. In the case of the software, the purge mechanism can be incorporated into the operating system. In the case of hardware, a purge engine can be provided which advantageously utilizes the store pipe that is provided between the L1 and L2 caches. The L2 cache can be forced to victimize cache lines, by setting tag bits for the cache lines to a value that misses in the L2 cache (e.g., cache-inhibited space). With the eviction mechanism of the cache placed in a direct-mapped mode, the address misses will result in eviction of the cache lines, thereby flushing them to the L3 cache.

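The purge-store flow can be sketched as below: purge commands only fire in idle store-pipe cycles, the flush rate is a programmable parameter, and each line is scrubbed for a single-bit error on its way from L2 to L3. This is a hedged sketch; the class, the `flipped_bit` error model (a real design would use ECC syndromes), and the per-tick rate limit are all assumptions made for illustration.

```python
class PurgeEngine:
    """Sequentially flushes L2 lines to L3, correcting single-bit errors in flight.

    `l2` maps address -> (data_word, flipped_bit), where flipped_bit is the
    index of a single-bit error in the word, or None if the word is clean.
    """

    def __init__(self, l2, l3, flush_rate):
        self.l2, self.l3 = l2, l3
        self.flush_rate = flush_rate      # lines purged per idle window (programmable)
        self.pending = list(l2.keys())    # sequential purge order

    def tick(self, store_pipe_idle):
        # Purge commands issue only when the core leaves the store pipe idle.
        if not store_pipe_idle:
            return 0
        flushed = 0
        while self.pending and flushed < self.flush_rate:
            addr = self.pending.pop(0)
            data, flipped_bit = self.l2.pop(addr)
            if flipped_bit is not None:
                data ^= 1 << flipped_bit  # correct the single-bit error
            self.l3[addr] = data          # line lands, corrected, in L3
            flushed += 1
        return flushed
```

Driving the engine with a busy cycle followed by idle cycles shows the rate limiting and the in-flight correction.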

    System and method for reducing contention in a multi-sectored cache
    54.
    Granted Invention Patent
    System and method for reducing contention in a multi-sectored cache (Expired)

    Publication No.: US06950909B2

    Publication Date: 2005-09-27

    Application No.: US10424645

    Filing Date: 2003-04-28

    IPC Class: G06F12/08 G06F12/00

    CPC Class: G06F12/0817

    Abstract: A cache access mechanism/system for reducing contention in a multi-sectored cache via serialization of overlapping write accesses to different blocks of a cache line to enable accurate cache directory updates. When a first queue issues a write access request for a first block of a cache line, the first queue concurrently asserts a last_in_line signal identifying the first queue as the last sequential queue to request access to that cache line. If there is an active write access request for the cache line, the first queue undertakes a series of operations to enable sequentially correct updates to the cache directory with all previous updates taken into consideration. Included in these operations are tracking the completion of the write access and the corresponding write to the associated cache directory and copying the cache directory state to be updated from the parent queue (rather than from the cache directory) so that the parent queue's update of the directory state is included (and not overwritten) when the first queue writes to the directory. The correct cache directory state is then stored within the associated cache directory.

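The chaining behavior described above can be illustrated with a small model: each new writer asserts last_in_line and records the previous last-in-line queue as its parent, and on completion it copies the directory state to update from that parent rather than from the directory. Everything here is invented for illustration — directory state is reduced to a set of dirty sectors, and completion is assumed to happen in issue order.

```python
class WriteQueue:
    def __init__(self, sector):
        self.sector = sector
        self.parent = None          # previous last-in-line queue for this cache line
        self.pending_state = None   # directory-state snapshot this queue will write

class CacheLineArbiter:
    """Serializes overlapping writes to different sectors of one cache line."""

    def __init__(self, directory_state):
        self.directory_state = set(directory_state)  # e.g. {"sector0"}
        self.last_in_line = None

    def issue(self, queue):
        # The new queue asserts last_in_line; if another write is active,
        # it chains behind that queue instead of snapshotting the directory.
        queue.parent = self.last_in_line
        self.last_in_line = queue

    def complete(self, queue):
        # Copy the state to be updated from the parent queue when one exists,
        # so the parent's update is included rather than overwritten.
        base = queue.parent.pending_state if queue.parent else self.directory_state
        queue.pending_state = set(base) | {queue.sector}
        self.directory_state = queue.pending_state
        if self.last_in_line is queue:
            self.last_in_line = None
```

Two overlapping writers to different sectors both land in the final directory state, which is the property the serialization exists to guarantee.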

    Multi-level multiprocessor speculation mechanism
    55.
    Granted Invention Patent
    Multi-level multiprocessor speculation mechanism (In Force)

    Publication No.: US06748518B1

    Publication Date: 2004-06-08

    Application No.: US09588483

    Filing Date: 2000-06-06

    IPC Class: G06F9/30

    Abstract: Disclosed is a processor, which reduces issuing of unnecessary barrier operations during instruction processing. The processor comprises an instruction sequencing unit and a load store unit (LSU) that issues a group of memory access requests that precede a barrier instruction in an instruction sequence. The processor also includes a controller, which in response to a determination that all of the memory access requests hit in a cache affiliated with the processor, withholds issuing on an interconnect a barrier operation associated with the barrier instruction. The controller further directs the load store unit to ignore the barrier instruction and complete processing of a next group of memory access requests following the barrier instruction in the instruction sequence without receiving an acknowledgment.

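The elision rule can be sketched as: track whether every memory access in the group ahead of a sync hit in the local cache, and if so, withhold the interconnect barrier and proceed without waiting for an acknowledgment. The instruction encoding and cache model below are invented for illustration, not the patent's.

```python
def process(sequence, cache_contents):
    """Run a toy instruction sequence, returning operations issued on the interconnect.

    sequence: list of ("load", addr) or ("sync",) tuples.
    cache_contents: set of addresses that hit in the local cache.
    """
    issued = []
    group_all_hit = True              # did every access since the last sync hit?
    for insn in sequence:
        if insn[0] == "sync":
            if not group_all_hit:
                issued.append("sync")  # barrier must go out and be acknowledged
            # else: withhold the barrier; the next group proceeds with no ack
            group_all_hit = True       # start tracking the next group
        else:
            _, addr = insn
            if addr not in cache_contents:
                issued.append(("load", addr))  # miss goes to the interconnect
                group_all_hit = False
    return issued
```

When every load hits locally, no barrier operation reaches the bus at all; a single miss in the group forces the sync to issue.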

    Multiprocessor speculation mechanism with imprecise recycling of storage operations
    56.
    Granted Invention Patent
    Multiprocessor speculation mechanism with imprecise recycling of storage operations (In Force)

    Publication No.: US06606702B1

    Publication Date: 2003-08-12

    Application No.: US09588606

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a method of operating a processor, by which a speculatively issued load request, which fetches incorrect data, is recycled. An instruction sequence, which includes a barrier instruction and a load instruction that follows the barrier instruction in program order, is received for execution. In response to the barrier instruction, a barrier operation is issued on an interconnect. Subsequently, in response to the load instruction and while the barrier operation is pending, a load request is issued to memory. When a pre-determined type of invalidate, which is affiliated with the load request, is received before the receipt of an acknowledgment for the barrier operation, data that is returned by memory in response to the load request is discarded and the load request is re-issued. The pre-determined type of invalidate includes, for example, a snoop invalidate.

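The recycling protocol can be sketched as a small event loop over a single speculative load: a snoop invalidate that arrives before the barrier acknowledgment poisons the load, so its returned data is discarded and the load is reissued. The event names and single-load model are simplifications invented here.

```python
def run_load(events):
    """Simulate one load issued speculatively while a barrier ack is pending.

    events: ordered list of "data", "snoop_invalidate", "barrier_ack".
    Returns (times_issued, data_kept).
    """
    issues = 1          # the load has already issued speculatively
    ack_seen = False
    poisoned = False
    for ev in events:
        if ev == "barrier_ack":
            ack_seen = True            # later invalidates no longer affect us
        elif ev == "snoop_invalidate" and not ack_seen:
            poisoned = True            # data affiliated with this load is stale
        elif ev == "data":
            if poisoned:
                issues += 1            # discard returned data, re-issue the load
                poisoned = False
            else:
                return issues, True    # data is architecturally safe to keep
    return issues, False
```

An invalidate before the ack costs one extra issue; the same invalidate after the ack is harmless.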

    Speculative execution of instructions and processes before completion of preceding barrier operations
    57.
    Granted Invention Patent
    Speculative execution of instructions and processes before completion of preceding barrier operations (Expired)

    Publication No.: US06880073B2

    Publication Date: 2005-04-12

    Application No.: US09753053

    Filing Date: 2000-12-28

    IPC Class: G06F9/30 G06F9/38 G06F9/00

    Abstract: Described is a data processing system and processor that provides full multiprocessor speculation by which all instructions subsequent to barrier operations in an instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued prior to the return of the sync acknowledgment. Data returned is immediately forwarded to the processor's execution units. The returned data and results of subsequent operations are held temporarily in rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative. When a barrier acknowledge is received by the BOP controller, the flag(s) of the corresponding rename register(s) are reset.


    System and method for providing multiprocessor speculation within a speculative branch path
    58.
    Granted Invention Patent
    System and method for providing multiprocessor speculation within a speculative branch path (Expired)

    Publication No.: US06728873B1

    Publication Date: 2004-04-27

    Application No.: US09588507

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a method of operation within a processor, that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data which is returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions, are discarded when the speculative execution path is determined to be incorrect.


    Mechanism for folding storage barrier operations in a multiprocessor system
    59.
    Granted Invention Patent
    Mechanism for folding storage barrier operations in a multiprocessor system (Expired)

    Publication No.: US06725340B1

    Publication Date: 2004-04-20

    Application No.: US09588509

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a processor that reduces barrier operations during instruction processing. An instruction sequence includes a first barrier instruction and a second barrier instruction with a store instruction in between the first and second barrier instructions. A store request associated with the store instruction is issued prior to a barrier operation associated with the first barrier instruction. When the store request is determined to have completed before the first barrier operation has issued, only a single barrier operation is issued for both the first and second barrier instructions. The single barrier operation is issued after the store request has been issued and at the time the second barrier operation is scheduled to be issued.

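The folding rule can be sketched as: hold the first sync rather than issuing it, and if the intervening store has already completed by then, let the second sync's single barrier operation cover both. The sequence encoding and names below are invented for illustration.

```python
def fold_barriers(sequence, completed_stores):
    """Return the barrier operations actually issued on the bus.

    sequence: ordered ops, e.g. ["sync1", "store_a", "sync2"].
    completed_stores: stores known complete before the held sync would issue.
    """
    issued = []
    deferred_sync = None
    for op in sequence:
        if op.startswith("sync"):
            if deferred_sync is None:
                deferred_sync = op     # hold the first sync; it may fold away
            else:
                issued.append(op)      # one barrier, at the second sync's slot,
                deferred_sync = None   # orders both barrier instructions
        elif op not in completed_stores:
            # The store is still outstanding, so the held sync must really issue.
            if deferred_sync is not None:
                issued.append(deferred_sync)
                deferred_sync = None
    if deferred_sync is not None:
        issued.append(deferred_sync)   # trailing sync with nothing to fold into
    return issued
```

With the store already complete, only the second barrier operation reaches the bus; with it outstanding, both must issue.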

    Dynamic hardware and software performance optimizations for super-coherent SMP systems
    60.
    Granted Invention Patent
    Dynamic hardware and software performance optimizations for super-coherent SMP systems (Expired)

    Publication No.: US06704844B2

    Publication Date: 2004-03-09

    Application No.: US09978361

    Filing Date: 2001-10-16

    IPC Class: G06F12/10

    CPC Class: G06F12/0831

    Abstract: A method for increasing performance optimization in a multiprocessor data processing system. A number of predetermined thresholds are provided within a system controller logic and utilized to trigger specific bandwidth utilization responses. Both address bus and data bus bandwidth utilization are monitored. Responsive to the percentage of data bus bandwidth utilization falling below a first predetermined threshold value, the system controller provides a particular response to a request for a cache line at a snooping processor having the cache line, where the response indicates to a requesting processor that the cache line will be provided. Conversely, if the percentage of data bus bandwidth utilization rises above a second predetermined threshold value, the system controller provides a next response to the request that indicates to any requesting processors that the requesting processor should utilize super-coherent data which is currently within its local cache. Similar operation on the address bus permits the system controller to trigger the issuing of Z1 Read requests for modified data in a shared cache line by processors which still have super-coherent data. The method also comprises enabling a load instruction with a plurality of bits that (1) indicate whether a resulting load request may receive super-coherent data and (2) override a coherency state indicating utilization of super-coherent data when said plurality of bits indicates that said load request may not utilize said super-coherent data. Specialized store instructions with appended bits and related functionality are also provided.

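The threshold-driven response selection can be sketched as a single decision function: low data-bus utilization produces the "line will be provided" snoop response, while saturation tells the requester to keep using the super-coherent data already in its local cache. The threshold values, function name, and response strings are illustrative assumptions, not the patent's encoding.

```python
def snoop_response(data_bus_utilization, low_threshold, high_threshold):
    """Pick the system controller's snoop response from data-bus utilization.

    data_bus_utilization: fraction of bus cycles in use (0.0 to 1.0).
    """
    if data_bus_utilization < low_threshold:
        # Bus is underused: the snooper holding the line promises to supply it.
        return "will_provide_line"
    if data_bus_utilization > high_threshold:
        # Bus is saturated: tell the requester to use the super-coherent data
        # already in its local cache instead of moving the line.
        return "use_local_super_coherent_data"
    return "normal"
```

The two thresholds split utilization into three regimes, with ordinary coherence traffic in the middle band.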