High speed intelligent distributed control memory system
    1.
    Granted invention patent
    High speed intelligent distributed control memory system (Expired)

    Publication No.: US4731737A

    Publication date: 1988-03-15

    Application No.: US860608

    Filing date: 1986-05-07

    Abstract: A high-speed, intelligent, distributed control memory system is comprised of an array of modular, cascadable, integrated circuit devices, hereinafter referred to as "memory elements." Each memory element is further comprised of storage means, programmable on-board processing ("distributed control") means and means for interfacing with both the host system and the other memory elements in the array utilizing a single shared bus. Each memory element of the array is capable of transferring (reading or writing) data between adjacent memory elements once per clock cycle. In addition, each memory element is capable of broadcasting data to all memory elements of the array once per clock cycle. This ability to asynchronously transfer data between the memory elements at the clock rate, using the distributed control, facilitates unburdening host system hardware and software from tasks more efficiently performed by the distributed control. As a result, the memory itself can, for example, perform such tasks as sorting and searching, even across memory element boundaries, in a manner that conserves host system resources and is faster and more efficient than using them.
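
    As a rough illustration of sorting across memory element boundaries, the sketch below models each memory element as a small Python list and lets only adjacent elements exchange data, one exchange step per simulated clock phase. The abstract does not say which algorithm the distributed control runs; the odd-even merge-split scheme, the function name `sort_across_elements`, and the equal-sized blocks are assumptions made for this example.

```python
def sort_across_elements(elements):
    """Sort values spread over a chain of equal-sized memory elements.

    Only neighbouring elements ever exchange data, mimicking the
    adjacent-element transfers described in the abstract.
    """
    # Each element first sorts its own local storage.
    blocks = [sorted(block) for block in elements]
    n = len(blocks)
    # n alternating phases are enough for n equal-sized blocks.
    for phase in range(n):
        start = phase % 2  # even phases pair (0,1),(2,3)...; odd phases pair (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            merged = sorted(blocks[i] + blocks[i + 1])
            k = len(blocks[i])
            # Left neighbour keeps the smaller half, right neighbour the larger half.
            blocks[i], blocks[i + 1] = merged[:k], merged[k:]
    return blocks


example = sort_across_elements([[9, 3], [7, 1], [8, 2], [6, 0]])
```

    The host is not involved in any step here; it would only load the data and read back the result, which is the kind of offloading the abstract describes.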

    Self-regulating clock generator
    2.
    Granted invention patent
    Self-regulating clock generator (Expired)

    Publication No.: US5059818A

    Publication date: 1991-10-22

    Application No.: US532311

    Filing date: 1990-06-01

    IPC classification: G06F1/06 G06F1/08

    CPC classification: G06F1/08

    Abstract: There is disclosed a self-regulating clock generator for providing an output clock signal to clock a CMOS microprocessor. The output clock signal has first and second phases of sufficient length to accommodate microprocessor speed paths and is provided in response to an input clock signal having a frequency and a duty cycle within a wide range of frequencies and duty cycles. The clock generator includes a latch arranged to be set and reset by the input clock signal and having an output for providing the output clock signal. A delay circuit is coupled to the latch output and enables the setting and resetting of the latch to establish the phase lengths. Also disclosed is a second clock generator which includes a pair of latches and a pair of delay circuits for providing an output clock signal having first and second phases of different lengths.

    Prefetch instruction specifying destination functional unit and read/write access mode
    4.
    Granted invention patent
    Prefetch instruction specifying destination functional unit and read/write access mode (In force)

    Publication No.: US06321326B1

    Publication date: 2001-11-20

    Application No.: US09569102

    Filing date: 2000-05-10

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F15/00

    Abstract: A microprocessor is configured to execute a prefetch instruction specifying a cache line to be transferred into the microprocessor, as well as an access mode for the cache line. The microprocessor includes caches optimized for the access modes. In one embodiment, the microprocessor includes functional units configured to operate upon various data types. Each different type of functional unit may be connected to different caches which are optimized for the various access modes. The prefetch instruction may include a functional unit specification in addition to the access mode. In this manner, data of a particular type may be prefetched into a cache local to a particular functional unit.
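
    The routing idea can be pictured with a small model in which a prefetch carries a line address, an access mode, and a functional unit, and the processor keeps one cache per (unit, mode) pair. This is only a sketch: the mode names, unit names, and the classes `Prefetch` and `Processor` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class AccessMode(Enum):  # hypothetical mode names
    READ_ONLY = "read-only"
    READ_WRITE = "read/write"


class Unit(Enum):  # hypothetical functional unit types
    INTEGER = "integer"
    FLOATING_POINT = "floating-point"


@dataclass(frozen=True)
class Prefetch:
    line_addr: int
    mode: AccessMode
    unit: Unit


class Processor:
    def __init__(self):
        # One cache (modelled as a set of line addresses) per unit and access mode.
        self.caches = {(u, m): set() for u in Unit for m in AccessMode}

    def execute(self, pf):
        # The line lands only in the cache local to the named unit and mode.
        self.caches[(pf.unit, pf.mode)].add(pf.line_addr)


cpu = Processor()
cpu.execute(Prefetch(0x40, AccessMode.READ_ONLY, Unit.FLOATING_POINT))
```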

    Register renaming in which moves are accomplished by swapping tags
    5.
    Granted invention patent
    Register renaming in which moves are accomplished by swapping tags (In force)

    Publication No.: US06256721B1

    Publication date: 2001-07-03

    Application No.: US09595726

    Filing date: 2000-06-16

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F9/38

    Abstract: An apparatus for accelerating move operations includes a lookahead unit which detects move instructions prior to the execution of the move instructions (e.g. upon selection of the move operations for dispatch within a processor). Upon detecting a move instruction, the lookahead unit signals a register rename unit, which reassigns the rename register associated with the source register to the destination register. In one particular embodiment, the lookahead unit attempts to accelerate moves from a base pointer register to a stack pointer register (and vice versa). An embodiment of the lookahead unit generates lookahead values for the stack pointer register by maintaining cumulative effects of the increments and decrements of previously dispatched instructions. The cumulative effects of the increments and decrements prior to a particular instruction may be added to a previously generated value of the stack pointer register to generate a lookahead value for that particular instruction. For such an embodiment, reassigning the rename register as described above may thereby provide a valid value for the stack pointer register, and hence may allow for the generation of lookahead stack pointer values for instructions subsequent to the move instruction to proceed prior to execution of the move instruction. The present embodiment of the register rename unit may also assign the destination rename register selected for the move instruction to the source register of the move instruction (i.e. the rename tags for the source and destination are “swapped”).
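
    The tag swap can be sketched with a toy rename map. At dispatch, the destination takes over the source's rename register, so it is valid at once, and the source takes the rename register that was selected for the move, which receives its value only when the move executes. The class `RenameMap`, its method names, the register labels `bp` and `sp`, and the absence of any tag-freeing logic are simplifications assumed for this example.

```python
class RenameMap:
    """Toy register rename map; names and sizes are illustrative only."""

    def __init__(self, arch_regs, num_physical):
        # Initially architectural register i maps to rename tag i.
        self.tag_of = {reg: tag for tag, reg in enumerate(arch_regs)}
        self.free_tags = list(range(len(arch_regs), num_physical))
        self.phys = [0] * num_physical  # rename register values

    def read(self, reg):
        return self.phys[self.tag_of[reg]]

    def dispatch_move(self, dst, src):
        """Handle `mov dst, src` at dispatch by swapping tags."""
        new_tag = self.free_tags.pop(0)  # rename register selected for the move
        src_tag = self.tag_of[src]
        self.tag_of[dst] = src_tag       # dst now names the register holding src's value
        self.tag_of[src] = new_tag       # src takes the move's rename register
        return new_tag, src_tag          # the copy is still owed at execution

    def execute_move(self, pending):
        new_tag, src_tag = pending
        self.phys[new_tag] = self.phys[src_tag]


rm = RenameMap(["bp", "sp"], num_physical=4)
rm.phys[rm.tag_of["bp"]] = 0x1000
pending = rm.dispatch_move("sp", "bp")
sp_before_execute = rm.read("sp")  # valid before the move has executed
rm.execute_move(pending)
bp_after_execute = rm.read("bp")   # source sees the same value again after execution
```

    In this model, a reader of `bp` between dispatch and execution would have to wait for the pending copy, just as any dependent instruction waits on a not-yet-written rename register.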

    Linearly addressable microprocessor cache
    6.
    Granted invention patent
    Linearly addressable microprocessor cache (Expired)

    Publication No.: US06240484B1

    Publication date: 2001-05-29

    Application No.: US08971805

    Filing date: 1997-11-17

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F12/10

    CPC classification: G06F12/1063

    Abstract: A microprocessor conforming to the X86 architecture is disclosed which includes a linearly addressable cache, thus allowing the cache to be quickly accessed by an external bus while allowing fast translation to a logical address for operation with functional units of the microprocessor. Also disclosed is a microprocessor which includes a linear tag array and a physical tag array corresponding to the linear tag array, thus allowing the contents of a microprocessor cache to be advantageously monitored from an external bus without slowing the main instruction and data access processing paths.

    Universal dependency vector/queue entry
    7.
    Granted invention patent
    Universal dependency vector/queue entry (In force)

    Publication No.: US06212623B1

    Publication date: 2001-04-03

    Application No.: US09139178

    Filing date: 1998-08-24

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F15/00

    Abstract: A processor employs an instruction queue and dependency vectors therein which allow a flexible dependency recording structure. The dependency vector includes a dependency indication for each instruction queue entry, which may provide a universal mechanism for scheduling instruction operations. An arbitrary number of dependencies may be recorded for a given instruction operation, up to a dependency upon each other instruction operation. Since the dependency vector is configured to record an arbitrary number of dependencies, a given instruction operation can be ordered with respect to any other instruction operation. Accordingly, any architectural or microarchitectural restrictions upon concurrent execution or upon order of particular instruction operations in execution may be enforced. The instruction queues evaluate the dependency vectors and request scheduling for each instruction operation for which the recorded dependencies have been satisfied.
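
    One way to picture a dependency vector is as a bitmask with one bit per queue entry: bit j of entry i's vector is set when entry i must wait for entry j. The sketch below uses that encoding to decide which entries may request scheduling. The bitmask layout, the class `InstructionQueue`, and its method names are assumptions for illustration; the abstract does not fix an encoding.

```python
class InstructionQueue:
    """Toy instruction queue with one dependency bit per queue entry."""

    def __init__(self, size):
        self.size = size
        self.valid = [False] * size
        self.dep = [0] * size  # bit j of dep[i] set => entry i depends on entry j
        self.done = 0          # bitmask of entries whose results are available

    def insert(self, entry, depends_on):
        self.valid[entry] = True
        self.dep[entry] = sum(1 << j for j in set(depends_on))
        self.done &= ~(1 << entry)

    def complete(self, entry):
        self.done |= 1 << entry

    def ready(self):
        """Entries whose recorded dependencies have all been satisfied."""
        return [
            i for i in range(self.size)
            if self.valid[i]
            and not (self.done >> i) & 1      # not already completed
            and self.dep[i] & ~self.done == 0  # no outstanding dependency bits
        ]


q = InstructionQueue(4)
q.insert(0, [])
q.insert(1, [0])
q.insert(2, [0, 1])
first = q.ready()
q.complete(0)
second = q.ready()
q.complete(1)
third = q.ready()
```

    Because any subset of bits can be set, the same structure can record an ordering constraint between any two operations, not only register dependencies.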

    Pipelined data cache with multiple ports and processor with load/store unit selecting only load or store operations for concurrent processing
    8.
    Granted invention patent
    Pipelined data cache with multiple ports and processor with load/store unit selecting only load or store operations for concurrent processing (Expired)

    Publication No.: US06202139B1

    Publication date: 2001-03-13

    Application No.: US09100291

    Filing date: 1998-06-19

    IPC classification: G06F13/00

    Abstract: A computer system includes a processor having a cache which includes multiple ports, although a storage array included within the cache may employ fewer physical ports than the cache supports. The cache is pipelined and operates at a clock frequency higher than that employed by the remainder of a microprocessor including the cache. In one embodiment, the cache preferably operates at a clock frequency which is at least a multiple of the clock frequency at which the remainder of the microprocessor operates. The multiple is equal to the number of ports provided on the cache (or the ratio of the number of ports provided on the cache to the number of ports provided internally, if more than one port is supported internally). Accordingly, the accesses provided on each port of the cache during a clock cycle of the microprocessor clock can be sequenced into the cache pipeline prior to commencement of the subsequent clock cycle. In one particular embodiment, the load/store unit of the microprocessor is configured to select only load memory operations or only store memory operations for concurrent presentation to the data cache. Accordingly, the data cache may be performing only reads or only writes to its internal array during a clock cycle. The data cache may implement several techniques for accelerating access time based upon this feature. For example, the bit lines within the data cache array may be only balanced between accesses instead of being precharged (and potentially balanced).
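
    The clock relationship is simple arithmetic and can be sketched as follows: the multiple is the number of cache ports divided by the number of internal array ports, and the accesses presented in one processor cycle are spread over that many cache cycles. The function names are hypothetical, and rejecting a non-integer ratio is an assumption of this example; the abstract only says the cache clock is "at least" the stated multiple.

```python
def cache_clock_multiple(external_ports, internal_ports=1):
    """Minimum ratio of cache clock to processor clock for the given port counts."""
    if external_ports % internal_ports:
        # Assumption of this sketch: only whole-number ratios are modelled.
        raise ValueError("port counts must divide evenly in this model")
    return external_ports // internal_ports


def sequence_accesses(port_requests, internal_ports=1):
    """Group the accesses of one processor cycle into successive cache cycles."""
    multiple = cache_clock_multiple(len(port_requests), internal_ports)
    return [
        port_requests[c * internal_ports:(c + 1) * internal_ports]
        for c in range(multiple)
    ]


two_port_single_array = cache_clock_multiple(2)
four_port_dual_array = cache_clock_multiple(4, internal_ports=2)
slots = sequence_accesses(["ld A", "ld B", "ld C", "ld D"], internal_ports=2)
```

    With four cache ports and a two-port array, the cache runs at twice the processor clock and consumes two accesses per cache cycle, so all four are in the pipeline before the next processor cycle begins.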

    Selecting cache to fetch in multi-level cache system based on fetch address source and pre-fetching additional data to the cache for future access
    9.
    Granted invention patent
    Selecting cache to fetch in multi-level cache system based on fetch address source and pre-fetching additional data to the cache for future access (Expired)

    Publication No.: US06199154B1

    Publication date: 2001-03-06

    Application No.: US09099984

    Filing date: 1998-06-19

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F9/06

    Abstract: A processor employs a first instruction cache, a second instruction cache, and a fetch unit employing a fetch/prefetch method among the first and second instruction caches designed to provide high fetch bandwidth. The fetch unit selects a fetch address based upon previously fetched instructions (e.g. the existence or lack thereof of branch instructions within the previously fetched instructions) from a variety of fetch address sources. Depending upon the source of the fetch address, the fetch address is presented to one of the first and second instruction caches for fetching the corresponding instructions. If the first cache is selected to receive the fetch address, the fetch unit may select a prefetch address for presentation to the second cache. The prefetch address is selected from a variety of prefetch address sources and is presented to the second instruction cache. Instructions prefetched in response to the prefetch address are provided to the first instruction cache for storage. In one embodiment, the first instruction cache may be a low latency, relatively small cache while the second instruction cache may be a higher latency, relatively large cache. Fetch addresses from many of the fetch address sources may be likely to hit in the first instruction cache. Other fetch addresses may be less likely to hit in the first instruction cache. Accordingly, these fetch addresses may be immediately fetched from the second instruction cache, instead of first attempting to fetch from the first instruction cache.
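
    The selection rule can be sketched as a small routing function: a fetch address from a source that usually hits goes to the first cache, leaving the second cache free for a prefetch, while any other source goes straight to the second cache. The abstract does not list the fetch address sources or say which ones are likely to hit, so the source names below and the function `route_fetch` are purely hypothetical.

```python
# Hypothetical source names; the abstract does not enumerate the real ones.
LIKELY_TO_HIT_FIRST_CACHE = {"sequential", "predicted_branch_target"}


def route_fetch(source, fetch_addr, prefetch_candidates):
    """Decide which address each instruction cache receives this cycle."""
    if source in LIKELY_TO_HIT_FIRST_CACHE:
        # The second cache is idle for fetching, so it can service a prefetch
        # whose result is later stored into the first cache.
        prefetch_addr = prefetch_candidates[0] if prefetch_candidates else None
        return {"first_cache": fetch_addr, "second_cache": prefetch_addr}
    # Unlikely to hit in the first cache: skip it and fetch from the second.
    return {"first_cache": None, "second_cache": fetch_addr}


likely = route_fetch("sequential", 0x1000, [0x1040])
unlikely = route_fetch("exception_redirect", 0x8000, [0x1040])
```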

Fully associate cache employing LRU groups for cache replacement and mechanism for selecting an LRU group
    10.
    Granted invention patent
    Fully associate cache employing LRU groups for cache replacement and mechanism for selecting an LRU group (Expired)

    Publication No.: US6161167A

    Publication date: 2000-12-12

    Application No.: US884435

    Filing date: 1997-06-27

    Applicant: David B. Witt

    Inventor: David B. Witt

    IPC classification: G06F9/38 G06F12/08 G06F12/12

    Abstract: A microprocessor employs an L0 cache. The L0 cache is located physically near the execute units of the microprocessor and is relatively small in size as compared to a larger L1 data cache included within the microprocessor. The L0 cache is accessed for those memory operations for which an address is being conveyed to a load/store unit within the microprocessor during the clock cycle in which the memory operation is selected for access to the L1 data cache. The address corresponding to the memory operation is received by the L0 cache directly from the execute unit forming the address. If a hit in the L0 cache is detected, the L0 cache either forwards data or stores data corresponding to the memory operation (depending upon the type of the memory operation). The memory operation is conveyed to the L1 data cache in parallel with the memory operation accessing the L0 cache. If the memory operation misses in the L0 cache and hits in the L1 data cache, the cache line corresponding to the memory operation may be conveyed to the L0 cache as a line fill.
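
    The load path can be sketched with two dictionaries standing in for the L0 and L1 caches. Both are probed for the same address, the L0 result is used on a hit, and an L0 miss that hits in L1 triggers a line fill into L0. The 32-byte line size, the class `TwoLevelDataCache`, and the plain first-in-first-out replacement are assumptions of this example; in particular, it does not model the LRU-group replacement named in the title.

```python
LINE_BYTES = 32  # assumed line size, for illustration only


class TwoLevelDataCache:
    def __init__(self, l0_lines=4):
        self.l0 = {}  # line number -> data; dict order doubles as FIFO order
        self.l1 = {}  # line number -> data
        self.l0_lines = l0_lines

    def _fill_l0(self, line, data):
        if len(self.l0) >= self.l0_lines:
            del self.l0[next(iter(self.l0))]  # evict the oldest line (FIFO)
        self.l0[line] = data

    def load(self, addr):
        line = addr // LINE_BYTES
        # In hardware both lookups happen in parallel; here they are sequential.
        l0_hit = line in self.l0
        l1_hit = line in self.l1
        if l0_hit:
            return "L0", self.l0[line]
        if l1_hit:
            self._fill_l0(line, self.l1[line])  # line fill into L0
            return "L1", self.l1[line]
        return "miss", None


cache = TwoLevelDataCache(l0_lines=1)
cache.l1[0] = "line 0 data"
cache.l1[1] = "line 1 data"
trace = [cache.load(0), cache.load(0), cache.load(40), cache.load(0)]
```

    With a one-line L0, the trace shows the first access served from L1, the repeat served from L0, and the original line falling back to L1 after another line has displaced it.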
