System and method to manage address translation requests
    1.
    Granted invention patent
    System and method to manage address translation requests (status: active)

    Publication number: US08301865B2

    Publication date: 2012-10-30

    Application number: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
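
    The abstract says only that selection depends on prior PRQ entry selections. As a rough illustration, the Python sketch below assumes one simple policy: favor the portion that was not served last time. The class name, method names, and the alternation policy are hypothetical and are not taken from the patent.

```python
from collections import deque


class PendingRequestQueue:
    """Toy PRQ with an instruction-related and a data-related portion."""

    def __init__(self):
        self.itlb = deque()    # entries for ITLB misses
        self.dtlb = deque()    # entries for potential or actual DTLB misses
        self.last_pick = None  # remembers the prior selection

    def allocate(self, kind, thread_id, vaddr):
        """Add an entry; kind is "itlb" or "dtlb"."""
        queue = self.itlb if kind == "itlb" else self.dtlb
        queue.append((kind, thread_id, vaddr))

    def select(self):
        """Pick the next entry, favoring the portion not served last time."""
        if self.last_pick == "itlb":
            order = [self.dtlb, self.itlb]
        else:
            order = [self.itlb, self.dtlb]
        for queue in order:
            if queue:
                entry = queue.popleft()
                self.last_pick = entry[0]
                return entry
        return None  # nothing pending

    def flush_thread(self, thread_id):
        """Deallocate every entry that belongs to a flushed thread."""
        self.itlb = deque(e for e in self.itlb if e[1] != thread_id)
        self.dtlb = deque(e for e in self.dtlb if e[1] != thread_id)
```

    Other history-dependent policies (for example, weighted or age-based arbitration) would fit the same interface; alternation is used here only because it is the simplest policy that depends on the previous pick.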

    System and Method to Manage Address Translation Requests
    2.
    Invention patent application
    System and Method to Manage Address Translation Requests (status: active)

    Publication number: US20100332787A1

    Publication date: 2010-12-30

    Application number: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/10 G06F12/00

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    System and method to invalidate obsolete address translations
    3.
    Granted invention patent
    System and method to invalidate obsolete address translations (status: active)

    Publication number: US08412911B2

    Publication date: 2013-04-02

    Application number: US12493923

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
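
    The counter-based completion tracking described in the abstract can be pictured with a small Python model. This is a sketch under stated assumptions: a single request in flight, one counter, and hypothetical names (`NonCacheableUnit`, `global_demap`, `core_done`) that do not come from the patent.

```python
class NonCacheableUnit:
    """Toy NCU that tracks one global demap request with a counter."""

    def __init__(self, core_ids):
        self.core_ids = set(core_ids)
        self.pending = 0            # outstanding completions still expected
        self.initiator = None       # who asked for the demap in flight
        self.acked_initiators = []  # acknowledgements sent back so far

    def global_demap(self, initiator, targets=None):
        """Broadcast when targets is None, otherwise multicast to targets."""
        if self.initiator is not None:
            raise RuntimeError("a global demap is already in flight")
        if targets is None:
            destinations = set(self.core_ids)
        else:
            destinations = self.core_ids & set(targets)
        self.initiator = initiator
        self.pending = len(destinations)
        if self.pending == 0:
            self._finish()
        return sorted(destinations)

    def core_done(self, core_id):
        """A core reports that it has invalidated its matching TLB entries."""
        if self.initiator is None:
            raise RuntimeError("no global demap is in flight")
        self.pending -= 1
        if self.pending == 0:
            self._finish()

    def _finish(self):
        # The request is satisfied: acknowledge the initiator.
        self.acked_initiators.append(self.initiator)
        self.initiator = None
```

    A fuller model would forward the request to the NCUs of external processors and count their replies as well; the sketch covers local cores only.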

    System and Method to Invalidate Obsolete Address Translations
    4.
    Invention patent application
    System and Method to Invalidate Obsolete Address Translations (status: active)

    Publication number: US20100332786A1

    Publication date: 2010-12-30

    Application number: US12493923

    Filing date: 2009-06-29

    IPC classification: G06F12/10 G06F12/00 G06F9/34

    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.

Register access protocol in a multithreaded multi-core processor
    5.
    Granted invention patent
    Register access protocol in a multithreaded multi-core processor (status: active)

    Publication number: US07747771B1

    Publication date: 2010-06-29

    Application number: US10881178

    Filing date: 2004-06-30

    IPC classification: G06F15/16 G06F15/76 G06F13/00

    CPC classification: G06F15/16

    Abstract: A method and mechanism for managing access to a plurality of registers in a processing device are contemplated. A processing device includes multiple nodes coupled to a ring bus, each of which includes one or more registers which may be accessed by processes executing within the device. Also coupled to the ring bus is a ring control unit which is configured to initiate transactions targeted to nodes on the ring bus. Each of the nodes is configured to receive and process a bus transaction with a fixed latency, whether or not the transaction is targeted to the receiving node. The ring control unit is configured to periodically convey idle transactions on the ring bus in order to allow nodes responding to indeterminate transactions to gain access to the bus.

    Branch misprediction recovery mechanism for microprocessors
    6.
    Granted invention patent
    Branch misprediction recovery mechanism for microprocessors (status: active)

    Publication number: US08099586B2

    Publication date: 2012-01-17

    Application number: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/00

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.
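
    The timer example at the end of the abstract can be modeled in a few lines of Python. This is only a cycle-level sketch: the class name, the per-cycle `tick` interface, and the choice to stop the timer once it fires are assumptions made for illustration, not details from the patent.

```python
class MispredictTimer:
    """Counts cycles between misprediction detection and branch retirement."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.running = False

    def on_mispredict_detected(self):
        # The timer begins incrementing when the misprediction is detected.
        self.running = True
        self.count = 0

    def on_branch_retired(self):
        # The timer resets when the mispredicted branch retires.
        self.running = False
        self.count = 0

    def tick(self):
        """Advance one cycle; return True when the early flush should fire."""
        if not self.running:
            return False
        self.count += 1
        if self.count >= self.threshold:
            self.running = False  # assumed: fire once, then stop counting
            return True
        return False
```

    If the branch retires before the threshold is reached, `tick` never returns True and recovery proceeds through the normal retirement path.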

    TLB tag parity checking without CAM read
    7.
    Granted invention patent
    TLB tag parity checking without CAM read (status: active)

    Publication number: US07366829B1

    Publication date: 2008-04-29

    Application number: US10882806

    Filing date: 2004-06-30

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    CPC classification: G06F12/1027 G06F11/1064

    Abstract: An apparatus and method for expediting parity checked TLB access operations is described in connection with a multithreaded multiprocessor chip. This parity checking mechanism eliminates the need to read a CAM entry from a TLB during a TLB access by storing the tag parity value in a RAM portion of a TLB, using the CAM key input to generate a tag parity check value for a matched entry, and comparing the generated tag parity check value to the stored tag parity value to determine if there is a parity match or error.
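
    The three steps in the abstract (store the tag parity in the RAM portion, generate a check value from the CAM key, compare the two) can be sketched in Python as follows. The `Tlb` class, its list-based CAM and RAM, and the single even-parity bit are simplifying assumptions for illustration.

```python
def parity(value):
    """Parity bit of a non-negative integer (XOR of all its bits)."""
    return bin(value).count("1") & 1


class Tlb:
    """Toy TLB: the CAM holds tags, the RAM holds data plus tag parity."""

    def __init__(self):
        self.cam = []  # tags, matched by content
        self.ram = []  # (physical page number, stored tag parity) per entry

    def insert(self, tag, ppn):
        self.cam.append(tag)
        self.ram.append((ppn, parity(tag)))  # parity is stored on the RAM side

    def lookup(self, key):
        """Return (ppn, parity_ok) on a match, or None on a miss."""
        for index, tag in enumerate(self.cam):
            if tag == key:  # CAM match; the stored tag is never read out
                ppn, stored = self.ram[index]
                # The check value is generated from the key, not from the tag.
                return ppn, parity(key) == stored
        return None
```

    The check works because a key that matches an entry must equal the tag currently held in the CAM. If a bit of that tag has flipped since it was written, the matching key carries the flipped bit too, so its parity disagrees with the value stored in the RAM.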

    LOAD-MONITOR MWAIT
    8.
    Invention patent application
    LOAD-MONITOR MWAIT (status: under examination, published)

    Publication number: US20140075163A1

    Publication date: 2014-03-13

    Application number: US13607175

    Filing date: 2012-09-07

    IPC classification: G06F9/312

    Abstract: Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction.

    BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS
    9.
    Invention patent application
    BRANCH MISPREDICTION RECOVERY MECHANISM FOR MICROPROCESSORS (status: active)

    Publication number: US20100169611A1

    Publication date: 2010-07-01

    Application number: US12346349

    Filing date: 2008-12-30

    IPC classification: G06F9/312

    CPC classification: G06F9/3844 G06F9/3863

    Abstract: A system and method for reducing branch misprediction penalty. In response to detecting a mispredicted branch instruction, circuitry within a microprocessor identifies a predetermined condition prior to retirement of the branch instruction. Upon identifying this condition, the entire corresponding pipeline is flushed prior to retirement of the branch instruction, and instruction fetch is started at a corresponding address of an oldest instruction in the pipeline immediately prior to the flushing of the pipeline. The correct outcome is stored prior to the pipeline flush. In order to distinguish the mispredicted branch from other instructions, identification information may be stored alongside the correct outcome. One example of the predetermined condition being satisfied is in response to a timer reaching a predetermined threshold value, wherein the timer begins incrementing in response to the mispredicted branch detection and resets at retirement of the mispredicted branch.

    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER
    10.
    Invention patent application
    MITIGATION OF THREAD HOGS ON A THREADED PROCESSOR USING A GENERAL LOAD/STORE TIMEOUT COUNTER (status: under examination, published)

    Publication number: US20130297910A1

    Publication date: 2013-11-07

    Application number: US13463319

    Filing date: 2012-05-03

    IPC classification: G06F9/30 G06F9/38

    Abstract: Systems and methods for efficient thread arbitration in a threaded processor with dynamic resource allocation. A processor includes a resource shared by multiple threads. The resource includes entries which may be allocated for use by any thread. Control logic detects long latency instructions. Long latency instructions have a latency greater than a given threshold. One example is a load instruction that has a read-after-write (RAW) data dependency on a store instruction that misses a last-level data cache. The long latency instruction or an immediately younger instruction is selected for replay for an associated thread. A pipeline flush and replay for the associated thread begins with the selected instruction. Instructions younger than the long latency instruction are held at a given pipeline stage until the long latency instruction completes. During replay, this hold prevents resources from being allocated to the associated thread while the long latency instruction is being serviced.
