Decoupled architecture for data ciphering operations
    51.
    Granted invention patent
    Legal status: In force

    Publication number: US07305567B1

    Publication date: 2007-12-04

    Application number: US10234895

    Filing date: 2002-09-04

    IPC classification: G06F11/30 H04L9/32 H04L9/00

    CPC classification: H04L9/0643 H04L2209/125

    Abstract: In one embodiment, an apparatus comprises a microcontroller unit to store instructions into an execution queue. The apparatus also comprises an execution queue unit to generate a widely decoded functional execution instruction based on at least one instruction stored in the execution queue. Additionally, the apparatus comprises a functional unit to execute the widely decoded functional execution instruction asynchronous to the generation of the widely decoded functional execution instruction.

    Scalable directory based cache coherence protocol
    52.
    Granted invention patent
    Legal status: Expired

    Publication number: US06918015B2

    Publication date: 2005-07-12

    Application number: US10403922

    Filing date: 2003-03-31

    IPC classification: G06F12/08 G06F12/00

    CPC classification: G06F12/0817 G06F12/0828

    Abstract: A system and method are disclosed to maintain the coherence of shared data in cache and memory contained in the nodes of a multiprocessing computer system. The distributed multiprocessing computer system contains a number of processors, each connected to main memory. A processor in the distributed multiprocessing computer system is identified as a Home processor for a memory block if it includes the original memory block and a coherence directory for the memory block in its main memory. An Owner processor is another processor in the multiprocessing computer system that includes a copy of the Home processor memory block in a cache connected to its main memory. Whenever an Owner processor is present for a memory block, it is the only processor in the distributed multiprocessing computer system to contain a copy of the Home processor memory block. Eviction of a memory block copy held by an Owner processor in its cache requires a write of the memory block copy to its Home and an update of the corresponding coherence directory. No reads of the Home processor directory or modification of other processor cache and main memory are required. The coherence controller in each processor is able to send and receive messages out of order to maintain the coherence of the shared data in cache and main memory. If an out-of-order message causes an incorrect next program state, the coherence controller is able to restore the prior correct saved program state and resume execution.

    Efficient translation lookaside buffer miss processing in computer systems with a large range of page sizes
    53.
    Granted invention patent
    Legal status: Expired

    Publication number: US06715057B1

    Publication date: 2004-03-30

    Application number: US09652552

    Filing date: 2000-08-31

    IPC classification: G06F12/10

    Abstract: A system and method are disclosed to efficiently translate virtual-to-physical addresses of large pages of data by eliminating one level of a multilevel page table. A computer system containing a processor includes a translation lookaside buffer (“TLB”) in the processor. The processor is connected to a system memory that contains a page table with multiple levels. The page table translates the virtual address of a page of data stored in system memory into the corresponding physical address of the page of data. If the size of the page is above a certain threshold value, then translation of the page using the multilevel page table occurs by eliminating one or more levels of the page table. The threshold value preferably is 512 Megabytes. The multilevel page table is only used for translation of the virtual address of the page of data stored in system memory into the corresponding physical address of the page of data if a lookup of the TLB for the virtual address of the page of data results in a miss. The TLB also contains entries from the final level of the page table (i.e., physical addresses of pages of data) corresponding to a subfield of bits from corresponding virtual addresses of the page of data. Virtual-to-physical address translation using the multilevel page table is not required if the TLB contains the needed physical address of the page of data corresponding to the subfield of bits from the virtual address of the page of data.

    Method and apparatus for implementing loop compression in a program counter trace
    54.
    Granted invention patent
    Legal status: In force

    Publication number: US06691207B2

    Publication date: 2004-02-10

    Application number: US10034506

    Filing date: 2001-12-28

    IPC classification: G06F12/00

    Abstract: A system is disclosed in which an on-chip logic analyzer (OCLA) includes loop detector logic which receives incoming program counter (PC) data and detects when software loops exist. When a software loop is detected, the loop detector may be configured to store the first loop in memory, while all subsequent iterations are not stored, thus saving space in memory which would otherwise be consumed. The loop detector comprises a content addressable memory (CAM) which is enabled by a user-programmed signal. The CAM may be configured with a programmable mask to determine which bits of the incoming PC data to compare with the CAM entries. The depth of the CAM also is programmable, to permit the CAM to be adjusted to cover the number of instructions in a loop.
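
    The loop compression described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: the function name `compress_pc_trace`, the use of a bounded deque as a stand-in for the CAM, and the oldest-first replacement policy are all hypothetical choices made for illustration.

```python
from collections import deque


def compress_pc_trace(pcs, cam_depth, mask=0xFFFFFFFF):
    """Return the PC values that would be stored in trace memory.

    Hypothetical model: a PC that matches an entry in a small CAM of
    recently stored PCs is treated as a repeated loop iteration and dropped.
    """
    cam = deque(maxlen=cam_depth)  # assumed policy: oldest entry evicted when full
    stored = []
    for pc in pcs:
        key = pc & mask  # programmable mask selects which PC bits are compared
        if key in cam:
            continue  # hit: this PC was already traced, so skip storing it
        cam.append(key)
        stored.append(pc)
    return stored


loop_body = [0x10, 0x14, 0x18]
trace = loop_body * 3 + [0x1C]

# CAM deep enough to hold the whole loop body: only the first iteration is kept.
print([hex(pc) for pc in compress_pc_trace(trace, cam_depth=4)])
# CAM shallower than the loop body: no entry survives long enough to match.
print(len(compress_pc_trace(trace, cam_depth=2)))
```

    In this model the second call stores every PC, which is one way to see why the abstract makes the CAM depth programmable: the depth has to cover the number of instructions in the loop for any compression to occur.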

    Special encoding of known bad data
    55.
    Granted invention patent
    Legal status: In force

    Publication number: US06662319B1

    Publication date: 2003-12-09

    Application number: US09652314

    Filing date: 2000-08-31

    IPC classification: G06F11/10

    CPC classification: G06F11/0763 G06F11/0724

    Abstract: A multi-processor system in which each processor receives a message from another processor in the system. The message may contain data that was corrupted during transmission from the preceding processor. Upon receiving the message, the processor detects that a portion of the message contains corrupted data. The processor then replaces the corrupted portion with a predetermined bit pattern known or otherwise programmed into all other processors in the system. The predetermined bit pattern indicates that the associated portion of data was corrupted. The processor that detects the error in the message preferably alerts the system that an error has been detected. The message now containing the predetermined bit pattern in place of the corrupted data is retransmitted to another processor. The predetermined bit pattern will indicate that an error in the message was detected by the previous processor. In response, the processor detecting the predetermined bit pattern preferably will not alert the system of the existence of an error. The same message with the predetermined bit pattern can be retransmitted to other processors which also will detect the presence of the predetermined bit pattern and in response not alert the system of the presence of an error. As such, because only the first processor to detect an error alerts the system of the error and because messages containing uncorrectable errors still are transmitted through the system, fault isolation is improved and the system is less likely to fall into a deadlock condition.
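
    The relay behaviour in this abstract can be sketched as follows. The names `POISON` and `relay`, the 32-bit marker value, and the use of a set of word indices in place of a real error-detection check are all assumptions made for illustration; the abstract does not specify any of them.

```python
POISON = 0xDEADBEEF  # hypothetical predetermined bit pattern shared by all processors


def relay(message, corrupt_indices=frozenset()):
    """Model one processor hop.

    `message` is a list of data words. `corrupt_indices` stands in for the
    result of an error check on this hop (an assumption of this sketch).
    Returns the message to retransmit and whether this hop raises an alert.
    """
    out = []
    alert = False
    for i, word in enumerate(message):
        if word == POISON:
            out.append(POISON)  # already marked upstream: pass through, no alert
        elif i in corrupt_indices:
            out.append(POISON)  # first detection: mark the word and alert once
            alert = True
        else:
            out.append(word)
    return out, alert


hop1, alert1 = relay([1, 2, 3], corrupt_indices={1})
hop2, alert2 = relay(hop1)
print(alert1, alert2)  # True False
```

    Only the first hop reports the error, while the marked message still travels through the system, which matches the fault-isolation argument at the end of the abstract.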

    Efficient address interleaving with simultaneous multiple locality options
    56.
    Granted invention patent
    Legal status: In force

    Publication number: US06567900B1

    Publication date: 2003-05-20

    Application number: US09652452

    Filing date: 2000-08-31

    IPC classification: G06F12/00

    CPC classification: G06F12/0831

    Abstract: A computer system includes multiple processors, each of which includes an associated memory. Each of the processors is capable of accessing the memory of all other processors. Memory can be stored and accessed using different addressing schemes. For data that will only be used by the local processor, data is stored in memory using processor contiguous addressing, so that data is stored in the local memory. For data that may be accessed by multiple processors, data is stored using striping among a local processor set. A stripe control register in the memory controller of each memory comprises a mask that indicates which memory blocks should be accessed using processor contiguous addressing and which should be accessed by using striped addressing. For both striped and contiguous addressing, the address space includes a processor identification field to identify the processor where the associated memory resides, together with an offset indicating where in memory the address is located. The processor identification field for striped addressing includes two bits located in low order address space identifying a four-processor local stripe set. The other processor identification bits define which four processors comprise each stripe set.
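
    The two addressing schemes in this abstract can be compared with a toy address decoder. Everything numeric here is an assumption for illustration: the 64-byte block size (`BLOCK_BITS`), the 1 MiB of memory per processor (`MEM_BITS`), the exact bit positions, and the function names are not taken from the patent.

```python
BLOCK_BITS = 6   # assumed 64-byte memory blocks
MEM_BITS = 20    # assumed 1 MiB of local memory per processor


def decode_contiguous(addr):
    """Processor ID in the high-order bits, offset in the low-order bits."""
    return addr >> MEM_BITS, addr & ((1 << MEM_BITS) - 1)


def decode_striped(addr):
    """Two low-order bits (just above the block offset) pick one of four
    processors in a stripe set; the high-order bits pick the stripe set."""
    stripe_set = addr >> (MEM_BITS + 2)
    member = (addr >> BLOCK_BITS) & 0b11
    processor = stripe_set * 4 + member
    # Rebuild the local offset by squeezing out the two member-select bits.
    within_block = addr & ((1 << BLOCK_BITS) - 1)
    block_index = (addr & ((1 << (MEM_BITS + 2)) - 1)) >> (BLOCK_BITS + 2)
    return processor, (block_index << BLOCK_BITS) | within_block


for addr in (0, 64, 128, 192, 256):
    print(addr, decode_contiguous(addr), decode_striped(addr))
```

    With these assumed widths, consecutive 64-byte blocks all land on processor 0 under contiguous addressing, but rotate across processors 0 to 3 under striped addressing before wrapping back to processor 0 at the next local offset.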

    Method and apparatus for balancing load vs. store access to a primary data cache
    57.
    Granted invention patent
    Legal status: In force

    Publication number: US6163821A

    Publication date: 2000-12-19

    Application number: US215354

    Filing date: 1998-12-18

    Abstract: A computer method and apparatus causes the load-store instruction grouping in a microprocessor instruction pipeline to be disrupted at appropriate times. The computer method and apparatus employs a memory access member which periodically stalls the issuance of store instructions when there are prior store instructions pending in the store queue. The periodic stalls bias the issue stage to issue load groups and store instruction groups. In the latter case, the store queue is free to update the data cache with the data from previous store instructions. Thus, the invention memory access member biases issuance of store instructions in a manner that prevents the store queue from becoming full, and as such enables the store queue to write to the data cache before the store queue becomes full.

    Stream buffers for high-performance computer memory system
    58.
    Granted invention patent
    Legal status: Expired

    Publication number: US5761706A

    Publication date: 1998-06-02

    Application number: US333133

    Filing date: 1994-11-01

    IPC classification: G06F12/08 G06F12/00

    Abstract: Method and apparatus for a filtered stream buffer coupled to a memory and a processor, and operating to prefetch data from the memory. The filtered stream buffer includes a cache block storage area and a filter controller. The filter controller determines whether a pattern of references has a predetermined relationship, and if so, prefetches stream data into the cache block storage area. Such stream data prefetches are particularly useful in vector processing computers, where once the processor starts to fetch a vector, the addresses of future fetches can be predicted based on the pattern of past fetches. According to various aspects of the present invention, the filtered stream buffer further includes a history table and a validity indicator, which is associated with the cache block storage area and indicates which cache blocks, if any, are valid. According to yet another aspect of the present invention, the filtered stream buffer controls random access memory (RAM) chips to stream the plurality of consecutive cache blocks from the RAM into the cache block storage area. According to yet another aspect of the present invention, the stream data includes data for a plurality of strided cache blocks, wherein each of these strided cache blocks corresponds to an address determined by adding to the first address an integer multiple of the difference between the second address and the first address. According to yet another aspect of the present invention, the processor generates three addresses of data words in the memory, and the filter controller determines whether a predetermined relationship exists among three addresses, and if so, prefetches strided stream data into said cache block storage area.
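
    The strided prefetch rule in this abstract (first address plus an integer multiple of the difference between the second and first addresses) can be sketched as below. The abstract does not say what the "predetermined relationship" among the three addresses is; this sketch assumes it means a constant, nonzero stride. The function name and the `count` parameter are hypothetical.

```python
def predict_strided_prefetches(first, second, third, count=4):
    """Return the next `count` addresses to prefetch, or [] if the three
    observed addresses do not share a constant nonzero stride.

    Assumption: the filter's 'predetermined relationship' is equal spacing.
    """
    stride = second - first
    if stride == 0 or third - second != stride:
        return []  # filter rejects the pattern: no stream is started
    # Addresses are the first address plus integer multiples of the stride,
    # continuing past the three references already seen.
    return [first + k * stride for k in range(3, 3 + count)]


print([hex(a) for a in predict_strided_prefetches(0x1000, 0x1040, 0x1080, count=3)])
# ['0x10c0', '0x1100', '0x1140']
print(predict_strided_prefetches(0, 8, 20))
# []
```

    Requiring three matching references before prefetching is what makes the buffer "filtered": isolated or irregular accesses do not trigger any memory traffic in this model.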

    Processor with efficient work queuing
    59.
    Granted invention patent
    Legal status: In force

    Publication number: US09465662B2

    Publication date: 2016-10-11

    Application number: US13274767

    Filing date: 2011-10-17

    IPC classification: G06F9/46 G06F9/50 G06F9/48

    Abstract: Work submitted to a co-processor enters through one of multiple input queues, used to provide various quality of service levels. In-memory linked-lists store work to be performed by a network services processor in response to lack of processing resources in the network services processor. The work is moved back from the in-memory linked-lists to the network services processor in response to availability of processing resources in the network services processor.
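
    The spill-and-refill behaviour in this abstract can be modelled with two queues. This is a minimal sketch, assuming a single input queue and a fixed number of on-chip slots; the class name `WorkQueue`, its methods, and the use of a deque in place of an in-memory linked list are hypothetical.

```python
from collections import deque


class WorkQueue:
    """Toy model: a fixed number of on-chip slots backed by an overflow list."""

    def __init__(self, on_chip_slots):
        self.on_chip_slots = on_chip_slots
        self.on_chip = deque()
        self.overflow = deque()  # stands in for the in-memory linked list

    def submit(self, work):
        # Spill to memory when on-chip slots are exhausted, or when earlier
        # work is already spilled (so that submission order is preserved).
        if len(self.on_chip) < self.on_chip_slots and not self.overflow:
            self.on_chip.append(work)
        else:
            self.overflow.append(work)

    def take(self):
        """Hand out the oldest work item, or None if there is none."""
        if not self.on_chip:
            return None
        work = self.on_chip.popleft()
        if self.overflow:
            # A slot just became free: move spilled work back on chip.
            self.on_chip.append(self.overflow.popleft())
        return work


q = WorkQueue(on_chip_slots=2)
for item in "abcd":
    q.submit(item)
print(list(q.on_chip), list(q.overflow))  # ['a', 'b'] ['c', 'd']
print([q.take() for _ in range(5)])       # ['a', 'b', 'c', 'd', None]
```

    The multiple input queues for quality-of-service levels mentioned in the abstract are left out of this model; one such structure per service level would be a natural extension.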

    Multi-core interconnect in a network processor
    60.
    Granted invention patent
    Legal status: In force

    Publication number: US09330002B2

    Publication date: 2016-05-03

    Application number: US13285629

    Filing date: 2011-10-31

    IPC classification: G06F12/00 G06F12/08

    CPC classification: G06F12/0813 G06F12/08

    Abstract: A network processor includes multiple processor cores for processing packet data. In order to provide the processor cores with access to a memory subsystem, an interconnect circuit directs communications between the processor cores and the L2 Cache and other memory devices. The processor cores are divided into several groups, each group sharing an individual bus, and the L2 Cache is divided into a number of banks, each bank having access to a separate bus. The interconnect circuit processes requests to store and retrieve data from the processor cores across multiple buses, and processes responses to return data from the cache banks. As a result, the network processor provides high-bandwidth memory access for multiple processor cores.
