Minimizing TLB Comparison Size
    1.
    Invention application
    Legal status: in force

    Publication number: US20090327646A1

    Publication date: 2009-12-31

    Application number: US12112150

    Filing date: 2008-04-30

    IPC classification: G06F12/10 G06F12/00

    Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and, in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
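    The mapping from a wide tag to a narrow identifier can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `TagMapper` and `Tlb`, the least-recently-used recycling of identifiers, and the flush of entries that carry a recycled identifier are all hypothetical choices that the abstract does not specify.

```python
from collections import OrderedDict


class TagMapper:
    """Maps a wide address-space tag to a narrow identifier (hypothetical model)."""

    def __init__(self, id_bits=3):
        self.capacity = 1 << id_bits            # number of short identifiers
        self.table = OrderedDict()              # wide tag -> identifier, LRU order
        self.free_ids = list(range(self.capacity))
        self.current_tag = None
        self.current_id = None

    def set_tag(self, tag):
        """Call when the tag registers are written.

        Returns (identifier, evicted_identifier_or_None).
        """
        if tag == self.current_tag:             # tag unchanged: nothing to remap
            return self.current_id, None
        evicted = None
        if tag in self.table:                   # tag seen before: reuse its identifier
            self.table.move_to_end(tag)
            ident = self.table[tag]
        else:
            if not self.free_ids:               # assumption: recycle the LRU identifier
                _, evicted = self.table.popitem(last=False)
                self.free_ids.append(evicted)
            ident = self.free_ids.pop(0)
            self.table[tag] = ident
        self.current_tag, self.current_id = tag, ident
        return ident, evicted


class Tlb:
    """TLB whose hit/miss check compares only the short identifier."""

    def __init__(self):
        self.entries = {}                       # (identifier, virtual page) -> physical page

    def insert(self, ident, vpn, ppn):
        self.entries[(ident, vpn)] = ppn

    def lookup(self, ident, vpn):
        return self.entries.get((ident, vpn))   # None means a miss

    def flush_id(self, ident):
        # Assumption: entries tagged with a recycled identifier must be dropped.
        self.entries = {k: v for k, v in self.entries.items() if k[0] != ident}
```

    In this model each TLB entry stores and compares `id_bits` bits instead of the full tag, which is the size reduction the abstract describes.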

    Minimizing TLB comparison size
    2.
    Granted invention patent
    Legal status: in force

    Publication number: US07937556B2

    Publication date: 2011-05-03

    Application number: US12112150

    Filing date: 2008-04-30

    IPC classification: G06F12/10

    Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and, in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.

    Multithreaded processor having a source processor core to subsequently delay continued processing of demap operation until responses are received from each of remaining processor cores
    3.
    Granted invention patent
    Legal status: in force

    Publication number: US07454590B2

    Publication date: 2008-11-18

    Application number: US11222614

    Filing date: 2005-09-09

    IPC classification: G06F12/08

    Abstract: In one embodiment, a processor comprises a plurality of processor cores and an interconnect to which the plurality of processor cores are coupled. Each of the plurality of processor cores comprises at least one translation lookaside buffer (TLB). A first processor core is configured to broadcast a demap command on the interconnect responsive to executing a demap operation. The demap command identifies one or more translations to be invalidated in the TLBs, and remaining processor cores are configured to invalidate the translations in the respective TLBs. The remaining processor cores transmit a response to the first processor core, and the first processor core is configured to delay continued processing subsequent to the demap operation until the responses are received from each of the remaining processor cores.
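    The broadcast-and-wait behaviour can be sketched as a small simulation. The `Core` and `Interconnect` classes, the queue of in-flight deliveries, and the decision to have the source core invalidate its own TLB immediately are illustrative assumptions, not details taken from the patent.

```python
class Core:
    """One processor core with a private TLB (hypothetical model)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}                 # virtual page -> physical page
        self.pending_acks = set()     # cores that have not yet responded
        self.stalled = False          # True while waiting for responses

    def receive_demap(self, vpns):
        for vpn in vpns:              # invalidate the named translations
            self.tlb.pop(vpn, None)
        return self.core_id           # the response sent back to the source


class Interconnect:
    """Delivers demap commands one at a time so the stall is observable."""

    def __init__(self, cores):
        self.cores = cores
        self.in_flight = []           # queued (source, target, vpns) deliveries

    def broadcast_demap(self, source, vpns):
        for vpn in vpns:              # assumption: the source also demaps locally
            source.tlb.pop(vpn, None)
        others = [c for c in self.cores if c is not source]
        source.pending_acks = {c.core_id for c in others}
        source.stalled = bool(source.pending_acks)
        for target in others:
            self.in_flight.append((source, target, tuple(vpns)))

    def deliver_one(self):
        source, target, vpns = self.in_flight.pop(0)
        ack = target.receive_demap(vpns)
        source.pending_acks.discard(ack)
        # The source resumes only after every remaining core has responded.
        source.stalled = bool(source.pending_acks)
```

    With three cores, the source stays stalled after the first `deliver_one()` call and resumes after the second, when both remaining cores have responded.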

    Hardware demapping of TLBs shared by multiple threads
    4.
    Granted invention patent
    Legal status: in force

    Publication number: US07383415B2

    Publication date: 2008-06-03

    Application number: US11222577

    Filing date: 2005-09-09

    IPC classification: G06F12/08

    Abstract: In one embodiment, a processor comprises at least one translation lookaside buffer (TLB) and a control unit coupled to the TLB. The control unit is configured to track whether or not at least one update to the TLB is pending for at least one of a plurality of strands. Each strand comprises hardware to support a different thread of a plurality of concurrently activatable threads in the processor. The strands share the TLB, and the control unit is configured to delay a demap operation issued from one of the strands responsive to the pending update, if any.
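    A minimal model shows why delaying the demap matters: if an update that is already in flight were applied after the demap, it could reinstall a translation the demap was meant to remove. The `TlbControl` class, its per-strand pending counters, and the deferred-demap list are assumptions made for illustration only.

```python
class TlbControl:
    """Control unit for a TLB shared by several strands (hypothetical model)."""

    def __init__(self, num_strands):
        self.pending_updates = [0] * num_strands  # in-flight TLB fills per strand
        self.tlb = {}                             # virtual page -> physical page
        self.deferred_demaps = []                 # demaps waiting for fills to land

    def begin_update(self, strand):
        self.pending_updates[strand] += 1

    def complete_update(self, strand, vpn, ppn):
        self.tlb[vpn] = ppn
        self.pending_updates[strand] -= 1
        self._drain()

    def demap(self, vpn):
        self.deferred_demaps.append(vpn)
        self._drain()

    def _drain(self):
        if any(self.pending_updates):             # delay while any update is pending
            return
        for vpn in self.deferred_demaps:
            self.tlb.pop(vpn, None)
        self.deferred_demaps.clear()
```

    In this sketch a demap issued while a fill is pending is held back and applied once the fill completes, so the stale translation does not survive.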

    Method and apparatus for precisely identifying effective addresses associated with hardware events
    5.
    Granted invention patent
    Legal status: in force

    Publication number: US07779238B2

    Publication date: 2010-08-17

    Application number: US11589492

    Filing date: 2006-10-30

    IPC classification: G06F11/30

    Abstract: A system and method for precisely identifying an instruction causing a performance-related event is disclosed. The instruction may be detected while in a pipeline stage of a microprocessor preceding a writeback stage and the microprocessor's architectural state may not be updated until after information identifying the instruction is captured. The instruction may be flushed from the pipeline, along with other instructions from the same thread. A hardware trap may be taken when the instruction is detected and/or when an event counter overflows or is within a given range of overflowing. A software trap handler may capture and/or log information identifying the instruction, such as one or more extended address elements, before returning control and initiating a retry of the instruction. The captured and/or logged information may be stored in an event space database usable by a data space profiler to identify performance bottlenecks in the application containing the instruction.

    Real-time address trace generation
    6.
    Granted invention patent
    Legal status: in force

    Publication number: US07454666B1

    Publication date: 2008-11-18

    Application number: US11102203

    Filing date: 2005-04-07

    IPC classification: G06F11/00

    CPC classification: G06F11/3636

    Abstract: A method for tracing instructions executed by a processor is provided, which includes providing a type of instruction to be traced and tracing at least one instruction corresponding to the type of instruction. The method further includes storing data from the tracing into a memory, without stopping, until the memory is full.

    System and method to manage address translation requests
    7.
    Granted invention patent
    Legal status: in force

    Publication number: US08301865B2

    Publication date: 2012-10-30

    Application number: US12493941

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

    Performance instrumentation in a fine grain multithreaded multicore processor
    8.
    Granted invention patent
    Legal status: in force

    Publication number: US07702887B1

    Publication date: 2010-04-20

    Application number: US10881032

    Filing date: 2004-06-30

    IPC classification: G06F7/38

    Abstract: A method and mechanism for monitoring events in a processing system. A performance monitoring mechanism is configured to store a count of events in an event counter. Periodically, the count stored in the event counter is updated to a new count. If the new count equals a predetermined value, an indication that the count equals the predetermined value is conveyed. If the new count does not equal the predetermined value, but is within a given epsilon of the predetermined value and the occurrence of a corresponding event is detected, an indication that the count equals the predetermined value is conveyed. The mechanism is further configured to suppress event counts which correspond to mis-speculations.
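    The epsilon rule can be sketched in a few lines. The `EventCounter` class below is a hypothetical model: it assumes events accumulate between periodic updates and that "within epsilon" means the stored count is below the predetermined value by at most `epsilon`. Suppression of mis-speculated events is not modelled.

```python
class EventCounter:
    """Periodically updated event counter with an early-signal window (hypothetical)."""

    def __init__(self, target, epsilon):
        self.target = target      # the predetermined value
        self.epsilon = epsilon    # how close counts as "about to reach the target"
        self.count = 0            # stored count, refreshed only periodically
        self.pending = 0          # events observed since the last update

    def record_event(self):
        """Note one event; return True if the indication should be conveyed now."""
        self.pending += 1
        distance = self.target - self.count
        return 0 < distance <= self.epsilon

    def periodic_update(self):
        """Fold pending events into the stored count; True on an exact match."""
        self.count += self.pending
        self.pending = 0
        return self.count == self.target
```

    The early signal in `record_event` covers the case where a lazily updated count would otherwise step past the predetermined value between two periodic updates.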

    System and method to invalidate obsolete address translations
    9.
    Granted invention patent
    Legal status: in force

    Publication number: US08412911B2

    Publication date: 2013-04-02

    Application number: US12493923

    Filing date: 2009-06-29

    IPC classification: G06F12/00 G06F13/00 G06F13/28

    Abstract: A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.

    APIC implementation for a highly-threaded x86 processor
    10.
    Granted invention patent
    Legal status: in force

    Publication number: US08190864B1

    Publication date: 2012-05-29

    Application number: US11924491

    Filing date: 2007-10-25

    IPC classification: G06F9/00

    CPC classification: G06F9/4818

    Abstract: Advanced programmable interrupt control for a multithreaded multicore processor that supports software compatible with x86 processors. Embodiments provide interrupt control for increased threads with minimal additional hardware by including, in each processor core, a core advanced programmable interrupt controller (core APIC) configured to determine a lowest priority thread of its corresponding processor core. Each core APIC reports its lowest priority thread level as a core priority to an input/output APIC. The I/O APIC routes interrupt requests to the core APIC with the lowest core priority. The selected core APIC then routes the interrupt request to the corresponding lowest priority thread. Each core APIC detects changes in priority levels of its corresponding processor core threads, and notifies the I/O APIC of any change to the corresponding core priority. Each core APIC may notify the I/O APIC as the core priority changes, or when the I/O APIC requests status from each core APIC.
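    The two-level routing can be modelled as follows. The class names `CoreApic` and `IoApic`, the numeric priority encoding (a smaller number means lower priority), and the first-match tie-breaking are assumptions for illustration; the abstract does not define them.

```python
class CoreApic:
    """Per-core interrupt controller tracking thread priorities (hypothetical model)."""

    def __init__(self, core_id, thread_priorities):
        self.core_id = core_id
        self.thread_priorities = list(thread_priorities)
        self.delivered = []                       # (thread index, vector) pairs

    def core_priority(self):
        # The core priority is the priority of its lowest-priority thread.
        return min(self.thread_priorities)

    def set_thread_priority(self, thread, priority, io_apic):
        before = self.core_priority()
        self.thread_priorities[thread] = priority
        after = self.core_priority()
        if after != before:                       # report only actual changes
            io_apic.notify(self.core_id, after)

    def deliver(self, vector):
        thread = self.thread_priorities.index(self.core_priority())
        self.delivered.append((thread, vector))
        return thread


class IoApic:
    """Routes each interrupt to the core reporting the lowest core priority."""

    def __init__(self, core_apics):
        self.core_apics = {a.core_id: a for a in core_apics}
        self.core_priorities = {a.core_id: a.core_priority() for a in core_apics}

    def notify(self, core_id, priority):
        self.core_priorities[core_id] = priority

    def route(self, vector):
        core_id = min(self.core_priorities, key=self.core_priorities.get)
        thread = self.core_apics[core_id].deliver(vector)
        return core_id, thread
```

    Because the I/O APIC in this sketch stores one priority per core rather than one per thread, its table grows with the number of cores, in keeping with the abstract's goal of minimal additional hardware.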
