1. Pipelined computer with operand context queue to simplify context-dependent execution flow
    Granted Patent (Expired)

    Publication No.: US5542058A

    Publication Date: 1996-07-30

    Application No.: US317427

    Filing Date: 1994-10-04

    IPC Classes: F02B75/02 G06F9/38 G06F9/30

    Abstract: A macropipelined microprocessor chip adheres to strict read and write ordering by sequentially buffering operands in queues during instruction decode, then removing the operands in order during instruction execution. Any instruction that requires additional access to memory inserts the requests into the queued sequence (in a specifier queue) such that read and write ordering is preserved. A specifier queue synchronization counter captures synchronization points to coordinate memory request operations among the autonomous instruction decode unit, instruction execution unit, and memory sub-system. The synchronization method does not restrict the benefit of overlapped execution in the pipeline. Another feature is treatment of a variable bit field operand type that does not restrict the location of operand data. Instruction execution flows in a pipelined processor having such an operand type are vastly different depending on whether operand data resides in registers or memory. Thus, an operand context queue (field queue) is used to simplify context-dependent execution flow and increase overlap. The field queue allows the instruction decode unit to issue instructions with variable bit field operands normally, sequentially identifying and fetching operands, and communicating the operand context that specifies register or memory residence across the pipeline boundaries to the autonomous execution unit. The mechanism creates the opportunity for increasing the overlap of pipelined functions and greatly simplifies the splitting of execution flows.

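The field-queue mechanism can be sketched in a few lines of Python. This is a toy model, not the patented implementation: the class name, method names, and string labels are illustrative. The decode stage pushes each operand's context (register or memory residence) across the pipeline boundary in program order, and the execution stage pops contexts in the same order to select the matching execution flow.

```python
from collections import deque

class FieldQueue:
    """Toy operand context queue: decode records, in order, whether each
    variable bit field operand lives in a register or in memory; execute
    pops contexts in the same order and picks the matching flow."""

    def __init__(self):
        self._q = deque()

    def decode(self, operand_in_register: bool):
        # Decode side: push the operand's context across the pipeline boundary.
        self._q.append("register" if operand_in_register else "memory")

    def execute(self) -> str:
        # Execute side: pop the oldest context and split the execution flow on it.
        context = self._q.popleft()
        return "register-flow" if context == "register" else "memory-flow"

fq = FieldQueue()
fq.decode(True)   # operand data resides in a register
fq.decode(False)  # operand data resides in memory
```

Because contexts are consumed strictly in decode order, the execution unit never needs to re-derive where an operand lives, which is what lets the two units run autonomously.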

2. RESOURCE MANAGEMENT SUBSYSTEM THAT MAINTAINS FAIRNESS AND ORDER
    Patent Application (In Force)

    Publication No.: US20130311999A1

    Publication Date: 2013-11-21

    Application No.: US13476791

    Filing Date: 2012-05-21

    IPC Classes: G06F9/50

    CPC Classes: G06F9/5011 G06F2209/507

    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. An access request is allowed to make forward progress once the common resources it needs have been allocated to it. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.

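The oldest-first fairness property can be illustrated with a minimal Python sketch. This is an assumption-laden toy, not the TOQ hardware: the heap-based queue, the unit-counting resource model, and all names are invented for illustration. The key behavior it demonstrates is that a younger request that would fit is still held back until every older request ahead of it has been served.

```python
import heapq

class TotalOrderQueue:
    """Toy TOQ model: pending common-resource requests are granted strictly
    oldest-first, so newer requests cannot starve older ones."""

    def __init__(self, units_available: int):
        self.units = units_available
        self._waiting = []     # min-heap keyed by age (issue order)
        self._next_age = 0

    def request(self, name: str, units_needed: int):
        # Tag each request with its issue order so age defines priority.
        heapq.heappush(self._waiting, (self._next_age, name, units_needed))
        self._next_age += 1

    def schedule(self):
        """Grant forward progress oldest-first; stop at the first request
        that does not fit, even if a younger one behind it would."""
        granted = []
        while self._waiting and self._waiting[0][2] <= self.units:
            _, name, need = heapq.heappop(self._waiting)
            self.units -= need
            granted.append(name)
        return granted

toq = TotalOrderQueue(units_available=4)
toq.request("A", 3)   # oldest
toq.request("B", 1)
toq.request("C", 1)   # youngest; will not fit after A and B are granted
```

Blocking the whole queue behind the oldest unsatisfiable request is the simplest way to get strict order; the patented design adds sleep states and resource stealing on top of the same age-priority idea.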

3. System and method for cleaning dirty data in a cache via frame buffer logic
    Granted Patent (In Force)

    Publication No.: US08341358B1

    Publication Date: 2012-12-25

    Application No.: US12562989

    Filing Date: 2009-09-18

    IPC Classes: G06F13/00

    CPC Classes: G06F12/0846 G06F12/0804

    Abstract: One embodiment of the invention sets forth a mechanism for efficiently writing dirty data from the L2 cache to a DRAM. A dirty data notification, including a memory address of the dirty data, is transmitted by the L2 cache to a frame buffer logic when dirty data is stored in the L2 cache. The frame buffer logic uses a page-stream sorter to organize dirty data notifications based on the bank page associated with the memory addresses included in the dirty data notifications. The page-stream sorter includes multiple sets with entries that may be associated with different bank pages in the DRAM. The frame buffer logic transmits to the DRAM the dirty data associated with an entry that has reached a maximum threshold of dirty data notifications. The frame buffer logic also transmits dirty data associated with the oldest entry when the number of entries in a set reaches a maximum threshold.

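The page-stream sorter's grouping-and-flush behavior can be sketched as follows. This is a simplified single-set model under stated assumptions: the 4 KB page size, the thresholds, and the class/method names are illustrative, and the real design uses multiple sets rather than one ordered table. Notifications accumulate per bank page; a page flushes when its notification count hits a threshold, and the oldest tracked page flushes when the sorter runs out of entries.

```python
from collections import OrderedDict

class PageStreamSorter:
    """Toy page-stream sorter: dirty-data notifications are grouped by DRAM
    bank page so writebacks to the same page can be batched together."""

    def __init__(self, notif_threshold: int, max_entries: int):
        self.notif_threshold = notif_threshold
        self.max_entries = max_entries
        # bank page -> list of dirty addresses, in insertion (age) order
        self.entries = OrderedDict()

    def notify(self, address: int, page_size: int = 4096):
        page = address // page_size
        flushed = []
        self.entries.setdefault(page, []).append(address)
        if len(self.entries[page]) >= self.notif_threshold:
            # Enough dirty lines on one bank page: write them back together.
            flushed.append((page, self.entries.pop(page)))
        elif len(self.entries) > self.max_entries:
            # Sorter is full: write back the oldest tracked page.
            oldest_page, addrs = self.entries.popitem(last=False)
            flushed.append((oldest_page, addrs))
        return flushed

sorter = PageStreamSorter(notif_threshold=2, max_entries=2)
```

Batching dirty lines by bank page is what makes this efficient: the DRAM pays the row-activate cost once per flushed page instead of once per line.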

4. Configurable cache occupancy policy
    Granted Patent (In Force)

    Publication No.: US08131931B1

    Publication Date: 2012-03-06

    Application No.: US12256378

    Filing Date: 2008-10-22

    IPC Classes: G06F12/00

    CPC Classes: G06F12/121

    Abstract: One embodiment of the invention is a method for evicting data from an intermediary cache that includes the steps of receiving a command from a client, determining that there is a cache miss relative to the intermediary cache, identifying one or more cache lines within the intermediary cache to store data associated with the command, determining whether any of the data residing in the one or more cache lines includes raster operations data or normal data, and causing the data residing in the one or more cache lines to be evicted, or stalling the command, based at least in part on whether the data includes raster operations data or normal data. Advantageously, the method allows a series of cache eviction policies based on how cached data is categorized and the eviction classes of the data. Consequently, more optimized eviction decisions may be made, leading to fewer command stalls and improved performance.

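The evict-or-stall decision can be reduced to a small Python sketch. The policy encoded here is one plausible reading of the abstract, not the patented policy itself: the function name, the `category` field, and the rule "prefer to evict raster-operations data, otherwise stall" are illustrative assumptions.

```python
def resolve_miss(victim_lines):
    """Toy occupancy decision on a cache miss (categories and policy are
    illustrative): raster-operations (ROP) data in a candidate victim line
    is evicted to make room; if only normal data would be displaced, the
    incoming command is stalled instead of evicting it."""
    for line in victim_lines:
        if line["category"] == "raster":
            return ("evict", line)       # ROP data: cheap to evict, reclaim the line
        # normal data: skip it and consider the next candidate line
    return ("stall", None)               # only normal data present: hold the command
```

Categorizing lines at fill time is what makes the later eviction decision cheap: the miss path only inspects a tag-side category bit rather than guessing reuse behavior.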

5. System, method and frame buffer logic for evicting dirty data from a cache using counters and data types
    Granted Patent (In Force)

    Publication No.: US08060700B1

    Publication Date: 2011-11-15

    Application No.: US12330469

    Filing Date: 2008-12-08

    IPC Classes: G06F13/00 G06F12/12

    Abstract: A system and method for cleaning dirty data in an intermediate cache are disclosed. A dirty data notification, including a memory address and a data class, is transmitted by a level 2 (L2) cache to frame buffer logic when dirty data is stored in the L2 cache. The data classes include evict first, evict normal, and evict last. In one embodiment, data belonging to the evict first data class is raster operations data with little reuse potential. The frame buffer logic uses a notification sorter to organize dirty data notifications, where an entry in the notification sorter stores the DRAM bank page number, a first count of cache lines that have resident dirty data and a second count of cache lines that have resident evict_first dirty data associated with that DRAM bank. The frame buffer logic transmits dirty data associated with an entry when the first count reaches a threshold.

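The per-entry counter pair described above can be sketched as a toy Python model. The class name, the threshold value, and the string class labels are illustrative assumptions; only the structure (one entry per bank page, a total-dirty count plus an evict_first count, clean when the first count reaches a threshold) follows the abstract.

```python
class NotificationSorter:
    """Toy notification sorter: one entry per DRAM bank page, tracking a
    first count of all resident dirty lines and a second count of
    evict_first dirty lines; the page is cleaned when the first count
    reaches the threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.entries = {}   # bank page -> [total_dirty, evict_first_dirty]

    def notify(self, bank_page: int, data_class: str):
        entry = self.entries.setdefault(bank_page, [0, 0])
        entry[0] += 1                      # first count: all dirty lines
        if data_class == "evict_first":
            entry[1] += 1                  # second count: evict_first lines only
        if entry[0] >= self.threshold:
            self.entries.pop(bank_page)    # hand the page off to be cleaned
            return f"clean page {bank_page}"
        return None

ns = NotificationSorter(threshold=2)
```

Keeping a separate evict_first count lets the frame buffer logic prioritize pages dominated by low-reuse data even before the total count reaches the cleaning threshold.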

6. Memory addressing controlled by PTE fields
    Granted Patent (In Force)

    Publication No.: US07805587B1

    Publication Date: 2010-09-28

    Application No.: US11555628

    Filing Date: 2006-11-01

    IPC Classes: G06F9/34 G06F12/00

    CPC Classes: G06F12/10 G06F12/0607

    Abstract: Embodiments of the present invention enable virtual-to-physical memory address translation using optimized bank and partition interleave patterns to improve memory bandwidth by distributing data accesses over multiple banks and multiple partitions. Each virtual page has a corresponding page table entry that specifies the physical address of the virtual page in linear physical address space. The page table entry also includes a data kind field that is used to guide and optimize the mapping process from the linear physical address space to the DRAM physical address space, which is used to directly access one or more DRAM. The DRAM physical address space includes a row, bank, and column address. The data kind field is also used to optimize the starting partition number and partition interleave pattern that defines the organization of the selected physical page of memory within the DRAM memory system.

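A data-kind-guided mapping from a linear physical address to a DRAM (row, bank, column) triple might look like the following toy function. The bank count, column count, and XOR swizzle are illustrative assumptions, not the patented mapping; the point is only that the PTE's data kind field perturbs the interleave so different data kinds spread their accesses across banks differently.

```python
def map_physical_address(linear_addr: int, data_kind: int,
                         num_banks: int = 8, cols_per_row: int = 256):
    """Toy linear-to-DRAM address mapping (illustrative geometry): the
    data-kind field from the PTE swizzles the bank interleave pattern."""
    column = linear_addr % cols_per_row          # low bits select the column
    block = linear_addr // cols_per_row
    bank = (block ^ data_kind) % num_banks       # data kind perturbs the bank pattern
    row = block // num_banks                     # remaining bits select the row
    return row, bank, column
```

Because two data kinds XOR into different bank sequences, interleaved accesses from two streams with different kinds land on different banks more often, reducing bank conflicts.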

7. Mapping memory partitions to virtual memory pages
    Granted Patent (In Force)

    Publication No.: US07620793B1

    Publication Date: 2009-11-17

    Application No.: US11467679

    Filing Date: 2006-08-28

    Abstract: Systems and methods for addressing memory using non-power-of-two virtual memory page sizes improve graphics memory bandwidth by distributing graphics data for efficient access during rendering. Various partition strides may be selected for each virtual memory page to modify the number of sequential addresses mapped to each physical memory partition and change the interleaving granularity. The addressing scheme allows for modification of a bank interleave pattern for each virtual memory page to reduce bank conflicts and improve memory bandwidth utilization. The addressing scheme also allows for modification of a partition interleave pattern for each virtual memory page to distribute accesses amongst multiple partitions and improve memory bandwidth utilization.

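The per-page partition stride idea reduces to a one-line mapping, sketched below with illustrative names and parameters (the stride, partition count, and starting partition would come from the page's mapping in the real scheme). A stride of N bytes means N sequential bytes go to one partition before the next partition takes over, so changing the stride changes the interleave granularity.

```python
def partition_for_address(addr: int, partition_stride: int,
                          num_partitions: int, start_partition: int = 0):
    """Toy per-page partition interleave (names are illustrative): the
    stride sets how many sequential bytes map to one partition, and the
    starting partition rotates the pattern for each virtual page."""
    return (start_partition + addr // partition_stride) % num_partitions
```

A page holding a linearly streamed buffer might pick a small stride to spread bursts across all partitions, while a page holding tiled texture data might pick a larger stride matched to the tile size.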

8. Digital clock recovery circuit
    Granted Patent (In Force)

    Publication No.: US07257183B2

    Publication Date: 2007-08-14

    Application No.: US10178902

    Filing Date: 2002-06-21

    IPC Classes: H03D3/24 H03D3/18

    Abstract: A clock recovery circuit includes a sampler for sampling a data signal. Logic determines whether a data edge lags or precedes a clock edge which drives the sampler, and provides early and late indications. A filter filters the early and late indications, and a phase controller adjusts the phase of the clock based on the filtered indications. Based on the filtered indications, a frequency estimator estimates the frequency difference between the data and clock, providing an input to the phase controller to further adjust the phase so as to continually correct for the frequency difference.

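The early/late filter plus frequency-estimator loop can be modeled as a small bang-bang control sketch. This is a behavioral toy under stated assumptions: the majority-of-3 filter, the gain values, and the additive phase model are illustrative, not the circuit's actual parameters. Each filtered vote nudges the phase directly (proportional path) and also accumulates into a frequency estimate (integral path) that keeps correcting the phase every step.

```python
class ClockRecovery:
    """Toy bang-bang clock recovery loop: early/late votes are majority
    filtered, the filtered vote steps the sampling phase, and an
    accumulated frequency estimate corrects long-term drift."""

    def __init__(self, phase_step=1.0, freq_gain=0.1, filter_len=3):
        self.phase = 0.0
        self.freq_estimate = 0.0
        self.phase_step = phase_step
        self.freq_gain = freq_gain
        self.filter_len = filter_len
        self._votes = []

    def update(self, early: bool):
        # Collect early/late indications; act only on a majority of filter_len.
        self._votes.append(1 if early else -1)
        if len(self._votes) < self.filter_len:
            return self.phase
        vote = 1 if sum(self._votes) > 0 else -1
        self._votes.clear()
        self.freq_estimate += self.freq_gain * vote      # integral: frequency offset
        self.phase += self.phase_step * vote + self.freq_estimate  # proportional + drift
        return self.phase

cr = ClockRecovery()
```

The frequency estimator is what distinguishes this from a plain bang-bang loop: with a constant data/clock frequency offset, the accumulated estimate supplies a steady phase ramp so the proportional votes no longer have to fight the drift every cycle.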

9. Method and apparatus for predicting memory dependence using store sets
    Granted Patent (Expired)

    Publication No.: US06108770A

    Publication Date: 2000-08-22

    Application No.: US103984

    Filing Date: 1998-06-24

    IPC Classes: G06F9/312 G06F9/38

    Abstract: A method of scheduling program instructions for execution in a computer processor comprises fetching and holding instructions from an instruction memory and executing the fetched instructions out of program order. When load/store order violations are detected, the effects of the load operation and its dependent instructions are erased and they are re-executed. The load is associated with all stores on whose data the load depends. This collection of stores is called a store set. On a subsequent issuance of the load, its execution is delayed until any store in the load's store set has issued. Two loads may share a store set, and separate store sets are merged when a load from one store set is found to depend on a store from another store set. A preferred embodiment employs two tables. The first is a store set ID table (SSIT) which is indexed by part of, or a hash of, an instruction PC. Entries in the SSIT provide a store set ID which is used to index into the second table, which for each store set, contains a pointer to the last fetched, unexecuted store instruction.

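The two-table structure (SSIT plus a last-fetched-store table) can be sketched as a toy Python predictor. The dictionary-backed tables, table size, and method names are illustrative simplifications of the hardware, and merging of store sets is omitted; the sketch covers only the core prediction path: a violation puts a load and a store in the same set, and the load thereafter waits while that set has a fetched but unissued store.

```python
class StoreSetPredictor:
    """Toy store-set predictor: the SSIT maps a hashed instruction PC to a
    store set ID; a second table maps each store set ID to the last
    fetched, unexecuted store in that set, which a predicted-dependent
    load must wait for."""

    def __init__(self, ssit_size=64):
        self.ssit_size = ssit_size
        self.ssit = {}          # hashed PC -> store set ID
        self.last_store = {}    # store set ID -> pending store PC (or None)
        self._next_id = 0

    def _index(self, pc):
        return pc % self.ssit_size   # stand-in for the PC hash

    def record_violation(self, load_pc, store_pc):
        # A load/store order violation: place both PCs in the same store set.
        ssid = self.ssit.get(self._index(store_pc))
        if ssid is None:
            ssid = self._next_id
            self._next_id += 1
        self.ssit[self._index(load_pc)] = ssid
        self.ssit[self._index(store_pc)] = ssid

    def fetch_store(self, store_pc):
        ssid = self.ssit.get(self._index(store_pc))
        if ssid is not None:
            self.last_store[ssid] = store_pc   # last fetched, unexecuted store

    def issue_store(self, store_pc):
        ssid = self.ssit.get(self._index(store_pc))
        if ssid is not None and self.last_store.get(ssid) == store_pc:
            self.last_store[ssid] = None       # store issued: release waiting loads

    def load_must_wait(self, load_pc):
        ssid = self.ssit.get(self._index(load_pc))
        return ssid is not None and self.last_store.get(ssid) is not None

p = StoreSetPredictor()
p.record_violation(load_pc=0x40, store_pc=0x10)   # misspeculation observed once
```

After the single recorded violation, the load at 0x40 is delayed whenever the store at 0x10 is in flight, and runs freely otherwise, which is exactly the selective serialization the store-set scheme aims for.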