System and Method for Completing Full Updates to Entire Cache Lines Stores with Address-Only Bus Operations
    1.
    Invention Application
    System and Method for Completing Full Updates to Entire Cache Lines Stores with Address-Only Bus Operations (Active)

    Publication No.: US20080140943A1

    Publication Date: 2008-06-12

    Application No.: US12034769

    Filing Date: 2008-02-21

    IPC Class: G06F12/00

    CPC Class: G06F12/0897 G06F12/0804

    Abstract: A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting the individual byte-enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When a full entry is selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when the data goes stale before write permission is obtained by the RC machine.

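The full-entry condition described in this abstract reduces to an AND over an entry's byte-enable bits. A minimal Python sketch of that behavior (the class and method names, and the 128-byte line size, are illustrative assumptions, not from the patent text):

```python
CACHE_LINE_BYTES = 128  # assumed cache line size

class StoreQueueEntry:
    """Behavioral model of one store queue entry with byte enables."""

    def __init__(self):
        # One byte-enable bit per byte of the line; a bit is set once
        # a gathered store has written that byte.
        self.byte_enables = [0] * CACHE_LINE_BYTES

    def write(self, offset, data):
        # A store writing len(data) bytes at `offset` sets those enables.
        for i in range(len(data)):
            self.byte_enables[offset + i] = 1

    def is_full_line(self):
        # In hardware: a tree of AND gates over all byte-enable bits,
        # fed to the STQ controller. Full means the RC machine may
        # overwrite the whole line without fetching its old data.
        return all(self.byte_enables)

entry = StoreQueueEntry()
entry.write(0, b"\x00" * 64)
assert not entry.is_full_line()   # only half the line written so far
entry.write(64, b"\x00" * 64)
assert entry.is_full_line()       # full line: data fetch can be skipped
```

When the full-line signal is asserted, the dispatch to the RC machine can use an address-only bus operation, since no data needs to be transferred in.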

    System and method for completing updates to entire cache lines with address-only bus operations
    2.
    Granted Patent
    System and method for completing updates to entire cache lines with address-only bus operations (Active)

    Publication No.: US07360021B2

    Publication Date: 2008-04-15

    Application No.: US10825189

    Filing Date: 2004-04-15

    IPC Class: G06F12/12

    CPC Class: G06F12/0897 G06F12/0804

    Abstract: A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting the individual byte-enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When a full entry is selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when the data goes stale before write permission is obtained by the RC machine.


    System and method for completing full updates to entire cache lines stores with address-only bus operations
    3.
    Granted Patent
    System and method for completing full updates to entire cache lines stores with address-only bus operations (Active)

    Publication No.: US07493446B2

    Publication Date: 2009-02-17

    Application No.: US12034769

    Filing Date: 2008-02-21

    IPC Class: G06F12/12

    CPC Class: G06F12/0897 G06F12/0804

    Abstract: A method and processor system that substantially eliminates data bus operations when completing updates of an entire cache line with a full store queue entry. The store queue within a processor chip is designed with a series of AND gates connecting the individual byte-enable bits of a corresponding entry. The AND output is fed to the STQ controller and signals when the entry is full. When a full entry is selected for dispatch to the RC machines, the RC machine is signaled that the entry updates the entire cache line. The RC machine obtains write permission to the line, and then the RC machine overwrites the entire cache line. Because the entire cache line is overwritten, the data of the cache line is not retrieved when the request for the cache line misses at the cache or when the data goes stale before write permission is obtained by the RC machine.


    System and method for enabling weak consistent storage advantage to a firmly consistent storage architecture
    4.
    Granted Patent
    System and method for enabling weak consistent storage advantage to a firmly consistent storage architecture (Expired)

    Publication No.: US06963967B1

    Publication Date: 2005-11-08

    Application No.: US09588508

    Filing Date: 2000-06-06

    IPC Class: G06F9/00 G06F9/30 G06F9/38

    Abstract: Disclosed is a method of processing instructions in a data processing system. An instruction sequence that includes a memory access instruction is received at a processor in program order. In response to receipt of the memory access instruction, a memory access request and a barrier operation are created. The barrier operation is placed on an interconnect after the memory access request is issued to the memory system. After the barrier operation has completed, the memory access request is completed in program order. When the memory access request is a load request, the load request is speculatively issued if a barrier operation is pending. Data returned by the speculatively issued load request is only returned to a register or execution unit of the processor when an acknowledgment is received for the barrier operation.

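The flow in this abstract, in which a speculatively issued load's data is held back until the barrier acknowledgment arrives, can be sketched as a small behavioral model (all class and method names here are illustrative, not the patented hardware):

```python
class LoadStoreUnit:
    """Toy model: loads issued past a pending barrier are buffered,
    and their data is released only on the barrier acknowledgment."""

    def __init__(self):
        self.barrier_pending = False
        self.held_data = {}      # register -> data awaiting barrier ack

    def issue_barrier(self):
        # The barrier operation is placed on the interconnect.
        self.barrier_pending = True

    def load(self, reg, data_from_memory):
        if self.barrier_pending:
            # Speculatively issued: buffer the returned data rather
            # than forwarding it to the register file.
            self.held_data[reg] = data_from_memory
            return None
        return data_from_memory  # no barrier pending: forward at once

    def barrier_ack(self):
        # Acknowledgment received: release all held load data.
        self.barrier_pending = False
        released, self.held_data = self.held_data, {}
        return released

lsu = LoadStoreUnit()
lsu.issue_barrier()
assert lsu.load("r1", 42) is None          # held while barrier pending
assert lsu.barrier_ack() == {"r1": 42}     # forwarded after the ack
```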

    High performance symmetric multiprocessing systems via super-coherent data mechanisms
    5.
    Granted Patent
    High performance symmetric multiprocessing systems via super-coherent data mechanisms (Expired)

    Publication No.: US06785774B2

    Publication Date: 2004-08-31

    Application No.: US09978362

    Filing Date: 2001-10-16

    IPC Class: G06F12/00

    CPC Class: G06F12/0831

    Abstract: A multiprocessor data processing system comprising a plurality of processing units, a plurality of caches each affiliated with one of the processing units, and processing logic that, responsive to receipt of a first system bus response to a coherency operation, causes the requesting processor to execute operations utilizing super-coherent data. The data processing system further includes logic that eventually returns to coherent operations with the other processing units responsive to the occurrence of a predetermined condition. The coherency protocol of the data processing system includes a first coherency state indicating that modification of data within a shared cache line of a second cache of a second processor has been snooped on the system bus of the data processing system. When the cache line is in the first coherency state, subsequent requests for the cache line are issued as a Z1 read on the system bus, and one of two responses is received. If the response to the Z1 read indicates that the first processor should utilize local data currently available within the cache line, the first coherency state is changed to a second coherency state, which indicates to the first processor that subsequent requests for the cache line should utilize the data within the local cache and not be issued to the system interconnect. Coherency state transitions to the second coherency state are completed via the coherency protocol of the data processing system. Super-coherent data is provided to the processor from the cache line of the local cache whenever the second coherency state is set for the cache line and a request is received.

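The two coherency states in this abstract can be modeled as a small state machine. A hedged sketch follows; only "Z1" appears in the abstract, while "ZL" is a name invented here for the second state, and the response string is an assumption:

```python
class CacheLine:
    """Toy state machine for the super-coherent line states."""

    def __init__(self, data):
        self.data = data
        self.state = "S"            # ordinary shared state

    def snoop_remote_modify(self):
        # A remote processor's modification of this shared line was
        # snooped on the system bus: enter the first coherency state.
        self.state = "Z1"

    def read(self, bus_response=None):
        if self.state == "Z1":
            # A request in Z1 goes out as a Z1 read on the bus; the
            # response decides what happens next.
            if bus_response == "use_local":
                self.state = "ZL"   # future reads stay fully local
            else:
                self.state = "S"    # refetch path (not modeled here)
        # In ZL, super-coherent (possibly stale) local data is returned
        # with no bus operation at all.
        return self.data

line = CacheLine(data=7)
line.snoop_remote_modify()
assert line.read(bus_response="use_local") == 7
assert line.state == "ZL"
assert line.read() == 7             # served locally, no bus traffic
```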

    System and method for asynchronously overlapping storage barrier operations with old and new storage operations
    6.
    Granted Patent
    System and method for asynchronously overlapping storage barrier operations with old and new storage operations (Active)

    Publication No.: US06609192B1

    Publication Date: 2003-08-19

    Application No.: US09588607

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a multiprocessor data processing system that executes load transactions out of order with respect to a barrier operation. The data processing system includes a memory and a plurality of processors coupled to an interconnect. At least one of the processors includes an instruction sequencing unit for fetching an instruction sequence in program order for execution. The instruction sequence includes a first and a second load instruction and a barrier instruction, which lies between the first and second load instructions in the instruction sequence. Also included in the processor is a load/store unit (LSU), which has a load request queue (LRQ) that temporarily buffers load requests associated with the first and second load instructions. The LRQ is coupled to a load request arbitration unit, which selects the order in which load requests are issued from the LRQ. A controller then issues a load request associated with the second load instruction to memory before completion of a barrier operation associated with the barrier instruction. Alternatively, load requests are issued out of order with respect to the program order before or after the barrier instruction. The load request arbitration unit selects the request associated with the second load instruction before the request associated with the first load instruction, and the controller issues the request associated with the second load instruction before the request associated with the first load instruction and before issuing the barrier operation.


    Speculative execution of instructions and processes before completion of preceding barrier operations
    7.
    Granted Patent
    Speculative execution of instructions and processes before completion of preceding barrier operations (Expired)

    Publication No.: US06880073B2

    Publication Date: 2005-04-12

    Application No.: US09753053

    Filing Date: 2000-12-28

    IPC Class: G06F9/30 G06F9/38 G06F9/00

    Abstract: Described is a data processing system and processor that provide full multiprocessor speculation, by which all instructions subsequent to barrier operations in an instruction sequence are speculatively executed before the barrier operation completes on the system bus. The processor comprises a load/store unit (LSU) with a barrier operation (BOP) controller that permits load instructions subsequent to syncs in an instruction sequence to be speculatively issued prior to the return of the sync acknowledgment. Returned data is immediately forwarded to the processor's execution units. The returned data and the results of subsequent operations are held temporarily in rename registers. A multiprocessor speculation flag is set in the corresponding rename registers to indicate that the value is “barrier” speculative. When a barrier acknowledgment is received by the BOP controller, the flags of the corresponding rename registers are reset.


    System and method for providing multiprocessor speculation within a speculative branch path
    8.
    Granted Patent
    System and method for providing multiprocessor speculation within a speculative branch path (Expired)

    Publication No.: US06728873B1

    Publication Date: 2004-04-27

    Application No.: US09588507

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a method of operation within a processor that enhances speculative branch processing. A speculative execution path contains an instruction sequence that includes a barrier instruction followed by a load instruction. While a barrier operation associated with the barrier instruction is pending, a load request associated with the load instruction is speculatively issued to memory. A flag is set for the load request when it is speculatively issued and reset when an acknowledgment is received for the barrier operation. Data returned by the speculatively issued load request is temporarily held and forwarded to a register or execution unit of the data processing system after the acknowledgment is received. All process results, including data returned by the speculatively issued load instructions, are discarded when the speculative execution path is determined to be incorrect.


    Mechanism for folding storage barrier operations in a multiprocessor system
    9.
    Granted Patent
    Mechanism for folding storage barrier operations in a multiprocessor system (Expired)

    Publication No.: US06725340B1

    Publication Date: 2004-04-20

    Application No.: US09588509

    Filing Date: 2000-06-06

    IPC Class: G06F9/312

    Abstract: Disclosed is a processor that reduces barrier operations during instruction processing. An instruction sequence includes a first barrier instruction and a second barrier instruction with a store instruction between them. A store request associated with the store instruction is issued prior to a barrier operation associated with the first barrier instruction. A determination is made whether the store request completes before the first barrier instruction has issued. If so, only a single barrier operation is issued for both the first and second barrier instructions. The single barrier operation is issued after the store request has been issued, at the time the second barrier operation is scheduled to be issued.

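The folding condition in this abstract boils down to a single check at barrier-issue time. An illustrative sketch (the function name and its parameter are assumptions made here for clarity):

```python
def barrier_ops_to_issue(store_completed_before_first_barrier_issued: bool) -> int:
    """For a sequence [SYNC1, ST, SYNC2]: if the store request has
    already completed before SYNC1's barrier operation goes out, the
    two barriers can be folded into one operation, issued in SYNC2's
    scheduled slot. Otherwise both barrier operations are issued."""
    if store_completed_before_first_barrier_issued:
        return 1   # fold: a single barrier covers both sync points
    return 2       # no fold: each barrier goes to the bus

assert barrier_ops_to_issue(True) == 1
assert barrier_ops_to_issue(False) == 2
```

The saving comes from the fact that, with no outstanding store between them, the two barriers order nothing with respect to each other, so one bus operation suffices.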

    Dynamic hardware and software performance optimizations for super-coherent SMP systems
    10.
    Granted Patent
    Dynamic hardware and software performance optimizations for super-coherent SMP systems (Expired)

    Publication No.: US06704844B2

    Publication Date: 2004-03-09

    Application No.: US09978361

    Filing Date: 2001-10-16

    IPC Class: G06F12/10

    CPC Class: G06F12/0831

    Abstract: A method for increasing performance optimization in a multiprocessor data processing system. A number of predetermined thresholds are provided within the system controller logic and utilized to trigger specific bandwidth utilization responses. Both address bus and data bus bandwidth utilization are monitored. Responsive to the percentage of data bus bandwidth utilization falling below a first predetermined threshold value, the system controller provides a particular response to a request for a cache line at a snooping processor having the cache line, where the response indicates to the requesting processor that the cache line will be provided. Conversely, if the percentage of data bus bandwidth utilization rises above a second predetermined threshold value, the system controller provides a next response to the request, which indicates to any requesting processor that it should utilize super-coherent data currently within its local cache. Similar operation on the address bus permits the system controller to trigger the issuing of Z1 read requests, by processors that still have super-coherent data, for modified data in a shared cache line. The method also comprises enabling a load instruction with a plurality of bits that (1) indicate whether a resulting load request may receive super-coherent data and (2) override a coherency state indicating utilization of super-coherent data when said plurality of bits indicates that said load request may not utilize said super-coherent data. Specialized store instructions with appended bits and related functionality are also provided.

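The threshold-driven snoop responses in this abstract can be sketched as a simple decision function (the threshold values and response names are illustrative assumptions, not from the patent text):

```python
LOW_THRESHOLD = 0.30    # assumed: below this, the data bus is idle enough
HIGH_THRESHOLD = 0.80   # assumed: above this, requesters keep local data

def snoop_response(data_bus_utilization: float) -> str:
    """Choose the snooper's response to a request for a cache line it
    holds, based on current data bus bandwidth utilization (0.0-1.0)."""
    if data_bus_utilization < LOW_THRESHOLD:
        # Bus is lightly loaded: the snooper will supply the line.
        return "will_provide_data"
    if data_bus_utilization > HIGH_THRESHOLD:
        # Bus is saturated: tell the requester to use the
        # super-coherent copy already in its local cache.
        return "use_super_coherent"
    return "normal"     # default coherence behavior in between

assert snoop_response(0.10) == "will_provide_data"
assert snoop_response(0.90) == "use_super_coherent"
assert snoop_response(0.50) == "normal"
```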