Microprocessor having improved memory management unit and cache memory
    1.
    Granted invention
    Microprocessor having improved memory management unit and cache memory (in force)

    Publication No.: US06553460B1

    Publication Date: 2003-04-22

    Application No.: US09410505

    Filing Date: 1999-10-01

    IPC Class: G06F12/00

    Abstract: Methods of managing a cache memory system in a data processing system are disclosed. The data processing system executes instructions and stores data in, and receives data from, a memory having locations in a memory space. The entries of the cache memory are in locations in a register space separate from the memory space. A first instruction that operates only on locations in the register space, not on locations in the memory space, may be executed to obtain address information from at least one entry of the cache memory. The obtained address information may be compared with target address information. If the comparison between the obtained address information and the target address information results in a correspondence, then a first operation may be performed on the entry of the cache memory. If the comparison does not result in a correspondence, then the first operation is not performed on the entry of the cache memory. Management operations may thus be performed on the cache memory without using locations in memory space. The first operation may include an invalidate, flush, or purge operation. The cache memory may be a virtual cache memory that has a plurality of entries, each including physical address information and logical address information. The obtained address information may be logical address information or physical address information. The first instruction may be a GET instruction for reading information from entries of the translation lookaside buffer or the cache memory. A second instruction may be a PUT instruction for writing information to entries of the translation lookaside buffer or the cache memory.

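    For illustration, the GET-based management flow described in this abstract can be pictured with a short C sketch. The entry layout and the cache_get/cache_put helpers below are hypothetical stand-ins for the architecture's register-space GET and PUT instructions, not structures taken from the patent; the sketch only shows the read-compare-operate sequence, using an invalidate as the first operation.

        #include <stdbool.h>
        #include <stdint.h>

        /* Hypothetical model of one virtual-cache entry: logical and physical
         * address tags plus a valid bit.  The real entries live in a register
         * space addressed by dedicated instructions, not in memory space. */
        struct cache_entry {
            uint32_t logical_tag;
            uint32_t physical_tag;
            bool     valid;
        };

        /* Stand-ins for the GET and PUT instructions: read or write one cache
         * entry by index, without touching any location in memory space. */
        extern struct cache_entry cache_get(unsigned index);
        extern void cache_put(unsigned index, struct cache_entry entry);

        /* Invalidate every entry whose logical address matches the target:
         * GET the entry, compare its address information with the target
         * address, and perform the first operation (here, an invalidate)
         * only when the comparison results in a correspondence. */
        void invalidate_matching(uint32_t target_logical, unsigned num_entries)
        {
            for (unsigned i = 0; i < num_entries; i++) {
                struct cache_entry e = cache_get(i);
                if (e.valid && e.logical_tag == target_logical) {
                    e.valid = false;
                    cache_put(i, e);
                }
                /* No correspondence: the entry is left untouched. */
            }
        }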

    Microprocessor having improved memory management unit and cache memory

    Publication No.: US06591340B2

    Publication Date: 2003-07-08

    Application No.: US10166503

    Filing Date: 2002-06-10

    IPC Class: G06F12/00

    Abstract: Methods of widening the permission for a memory access in a data processing system having a virtual cache memory and a translation lookaside buffer are disclosed. A memory access operation is initiated on a predetermined memory location based on logical address information and permission information associated with the memory access operation. The virtual cache memory is accessed, and a determination may be made as to whether there is a match between the logical address information of the memory access operation and logical address information stored in the entries of the virtual cache memory. In the event of a match, a determination may be made, based on the permission information of the memory access operation and the permission information of the matching entry of the virtual cache memory, as to whether the memory access operation is permitted. If the memory access operation is not permitted by the permission information of that entry of the virtual cache memory, then the translation lookaside buffer may be accessed based on the logical address information of that entry. If there is a match between the logical address information of the virtual cache memory entry and the logical address information of an entry of the translation lookaside buffer, then a determination may be made, based on the permission information of the memory access operation and the permission information of the translation lookaside buffer entry, as to whether the memory access operation is permitted by the translation lookaside buffer entry. If it is, then the permission information of the virtual cache memory entry may be updated based on the permission information of the translation lookaside buffer entry, and the memory access operation may be completed.
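
    As a rough illustration of the permission-widening sequence above, the C sketch below walks the same decision chain. The entry formats, field names, and the vcache_lookup/tlb_lookup helpers are assumptions made for the sketch, not structures taken from the patent.

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        /* Illustrative entry formats; the field names are assumptions. */
        struct vcache_entry { uint32_t logical_tag; uint32_t perms; };
        struct tlb_entry    { uint32_t logical_tag; uint32_t perms; };

        /* Assumed lookup helpers; each returns NULL on a miss. */
        extern struct vcache_entry *vcache_lookup(uint32_t logical_addr);
        extern struct tlb_entry    *tlb_lookup(uint32_t logical_addr);

        /* Returns true if the memory access may complete.  When the cached
         * permission is too narrow but the TLB entry for the same logical
         * address permits the access, the cache entry's permission is
         * widened from the TLB copy instead of raising a fault. */
        bool access_with_widening(uint32_t logical_addr, uint32_t required)
        {
            struct vcache_entry *ce = vcache_lookup(logical_addr);
            if (ce == NULL)
                return false;                  /* cache miss: handled elsewhere */

            if ((ce->perms & required) == required)
                return true;                   /* cached permission suffices */

            struct tlb_entry *te = tlb_lookup(ce->logical_tag);
            if (te != NULL && (te->perms & required) == required) {
                ce->perms = te->perms;         /* widen the cached permission */
                return true;                   /* complete the memory access  */
            }
            return false;                      /* genuine protection violation */
        }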

    Microprocessor having improved memory management unit and cache memory

    Publication No.: US06412043B1

    Publication Date: 2002-06-25

    Application No.: US09410506

    Filing Date: 1999-10-01

    IPC Class: G06F12/00

    Abstract: Methods of widening the permission for a memory access in a data processing system having a virtual cache memory and a translation lookaside buffer are disclosed. A memory access operation is initiated on a predetermined memory location based on logical address information and permission information associated with the memory access operation. The virtual cache memory is accessed, and a determination may be made as to whether there is a match between the logical address information of the memory access operation and logical address information stored in the entries of the virtual cache memory. In the event of a match, a determination may be made, based on the permission information of the memory access operation and the permission information of the matching entry of the virtual cache memory, as to whether the memory access operation is permitted. If the memory access operation is not permitted by the permission information of that entry of the virtual cache memory, then the translation lookaside buffer may be accessed based on the logical address information of that entry. If there is a match between the logical address information of the virtual cache memory entry and the logical address information of an entry of the translation lookaside buffer, then a determination may be made, based on the permission information of the memory access operation and the permission information of the translation lookaside buffer entry, as to whether the memory access operation is permitted by the translation lookaside buffer entry. If it is, then the permission information of the virtual cache memory entry may be updated based on the permission information of the translation lookaside buffer entry, and the memory access operation may be completed.

    Optimizing hardware TLB reload performance in a highly-threaded processor with multiple page sizes
    4.
    Granted invention
    Optimizing hardware TLB reload performance in a highly-threaded processor with multiple page sizes (in force)

    Publication No.: US07543132B1

    Publication Date: 2009-06-02

    Application No.: US10880985

    Filing Date: 2004-06-30

    IPC Class: G06F12/10

    Abstract: A method and apparatus for improving the performance of translation look-aside buffer (TLB) reloads in multithreaded, multi-core processors. TSB prediction is accomplished by hashing a plurality of data parameters and generating an index that is provided as an input to a predictor array to predict the TSB page size. In one embodiment of the invention, the predictor array comprises two-bit saturating up-down counters that are used to enhance the accuracy of the TSB prediction. The saturating up-down counters are configured to avoid making rapid changes in the TSB prediction upon detection of an error: multiple misses must occur before the prediction output is changed. The TSB for the page size specified by the predictor entry is searched first. Using the technique described herein, errors are minimized because the counter leads to the correct result at least half of the time.

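    The two-bit saturating up-down counter scheme lends itself to a short C sketch. The table size, the hash over the virtual address and context ID, and the threshold between the two predicted page sizes are all assumptions made for the illustration; only the saturate-and-flip-slowly behaviour follows the abstract.

        #include <stdint.h>

        #define PREDICTOR_ENTRIES 1024            /* illustrative table size */

        /* Two-bit saturating up-down counters: values 0-1 predict the small
         * TSB page size, values 2-3 predict the large one.  Two consecutive
         * mispredictions are needed before the prediction flips, so a single
         * stray miss does not perturb the predictor. */
        static uint8_t predictor[PREDICTOR_ENTRIES];

        /* Assumed hash of the parameters used to index the predictor (virtual
         * address bits and context ID); the mixing constant is a placeholder. */
        static unsigned predictor_index(uint64_t vaddr, uint32_t context)
        {
            uint64_t h = (vaddr >> 13) ^ ((uint64_t)context * 0x9E3779B1u);
            return (unsigned)(h % PREDICTOR_ENTRIES);
        }

        /* Predict which TSB page size to search first. */
        int predict_large_page(uint64_t vaddr, uint32_t context)
        {
            return predictor[predictor_index(vaddr, context)] >= 2;
        }

        /* Train the counter once the TSB walk resolves the true page size. */
        void train_predictor(uint64_t vaddr, uint32_t context, int was_large)
        {
            unsigned idx = predictor_index(vaddr, context);
            if (was_large) {
                if (predictor[idx] < 3) predictor[idx]++;   /* saturate at 3 */
            } else {
                if (predictor[idx] > 0) predictor[idx]--;   /* saturate at 0 */
            }
        }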

    Processor architecture for executing two different fixed-length instruction sets
    5.
    Published invention application
    Processor architecture for executing two different fixed-length instruction sets (under examination, published)

    Publication No.: US20050262329A1

    Publication Date: 2005-11-24

    Application No.: US10644226

    Filing Date: 2003-08-19

    IPC Class: G06F9/30 G06F9/318 G06F9/32

    CPC Class: G06F9/30174

    Abstract: A processor element, structured to execute a 32-bit fixed-length instruction set architecture, is backward compatible with a 16-bit fixed-length instruction set architecture: each 16-bit instruction is translated into a sequence of one or more 32-bit instructions. Switching between 16-bit and 32-bit instruction execution is accomplished by branch instructions that use the least significant bit of the branch target address to identify whether the target instruction is a 16-bit instruction or a 32-bit instruction.

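    The mode-switching rule in the abstract amounts to encoding the instruction-set width in the low bit of the branch target. The C sketch below only illustrates that rule; the struct and the branch_to helper are invented for the example and do not reflect the patent's actual pipeline.

        #include <stdbool.h>
        #include <stdint.h>

        /* Illustrative decoder state: which fixed-length instruction set is
         * currently being fetched and decoded. */
        struct decode_state {
            uint32_t pc;
            bool     mode16;   /* true: 16-bit instructions, false: 32-bit */
        };

        /* On a taken branch the least significant bit of the target address
         * identifies the instruction set of the code being branched to; the
         * bit is stripped from the fetch address and used to switch modes. */
        void branch_to(struct decode_state *s, uint32_t target)
        {
            s->mode16 = (target & 1u) != 0;    /* LSB selects 16- vs 32-bit */
            s->pc     = target & ~1u;          /* aligned fetch address     */
        }

        /* In 16-bit mode, each fetched instruction would then be translated
         * into a sequence of one or more native 32-bit instructions. */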

    System, method and apparatus for improving the performance of collective operations in high performance computing
    6.
    Granted invention
    System, method and apparatus for improving the performance of collective operations in high performance computing (in force)

    Publication No.: US09391845B2

    Publication Date: 2016-07-12

    Application No.: US14495190

    Filing Date: 2014-09-24

    Abstract: System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed from the compute nodes and the switches and links used to interconnect them, configured so that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which include exchanging messages with processes executing on other compute nodes; the messages contain indicia identifying the collective operations they belong to. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each node in the spanning tree implements a ratcheted cyclical state machine, used together with status messages exchanged between nodes to synchronize collective operations. Transaction IDs are also used to detect out-of-order and lost messages.

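    One plausible reading of the ratcheted cyclical state machine and the transaction-ID check is sketched below in C. The phase names, the 16-bit transaction-ID width, and the accept-only-the-next-ID rule are assumptions made for the illustration; the patent's actual state encoding and recovery behaviour are not described here.

        #include <stdbool.h>
        #include <stdint.h>

        /* Illustrative phases of one collective operation; the names and the
         * cycle length are assumptions, not taken from the patent. */
        enum collective_phase { IDLE, GATHER, REDUCE, BROADCAST, PHASE_COUNT };

        struct node_state {
            enum collective_phase phase;
            uint16_t expected_txid;     /* compared modulo 2^16 */
        };

        /* Ratchet: the state only advances forward around the fixed cycle. */
        static void ratchet(struct node_state *n)
        {
            n->phase = (enum collective_phase)((n->phase + 1) % PHASE_COUNT);
        }

        /* Accept a status message only if its transaction ID is the next one
         * expected; anything else signals a lost or out-of-order message and
         * is left to a recovery path not shown here. */
        bool on_status_message(struct node_state *n, uint16_t msg_txid)
        {
            if (msg_txid != n->expected_txid)
                return false;           /* gap or reordering detected */
            n->expected_txid++;         /* 16-bit wraparound is intentional */
            ratchet(n);
            return true;
        }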

    SYSTEM, METHOD AND APPARATUS FOR IMPROVING THE PERFORMANCE OF COLLECTIVE OPERATIONS IN HIGH PERFORMANCE COMPUTING
    7.
    Published invention application
    SYSTEM, METHOD AND APPARATUS FOR IMPROVING THE PERFORMANCE OF COLLECTIVE OPERATIONS IN HIGH PERFORMANCE COMPUTING (in force)

    Publication No.: US20160087848A1

    Publication Date: 2016-03-24

    Application No.: US14495190

    Filing Date: 2014-09-24

    IPC Class: H04L12/24 H04L12/751

    Abstract: System, method, and apparatus for improving the performance of collective operations in High Performance Computing (HPC). Compute nodes in a networked HPC environment form collective groups to perform collective operations. A spanning tree is formed from the compute nodes and the switches and links used to interconnect them, configured so that there is only a single route between any pair of nodes in the tree. The compute nodes implement processes for performing the collective operations, which include exchanging messages with processes executing on other compute nodes; the messages contain indicia identifying the collective operations they belong to. Each switch is configured to implement message forwarding operations for its portion of the spanning tree. Each node in the spanning tree implements a ratcheted cyclical state machine, used together with status messages exchanged between nodes to synchronize collective operations. Transaction IDs are also used to detect out-of-order and lost messages.
