Abstract:
In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
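The tag-to-identifier mapping described above can be sketched roughly as follows. This is a minimal illustration, not the patented design: the class names `TagMapper` and `TLB`, the two-field tag, and the oldest-first eviction policy are all assumptions introduced here, since the abstract does not say how identifiers are recycled.

```python
class TagMapper:
    """Maps a wide tag (a tuple of register values) to a short identifier."""

    def __init__(self, id_bits=2):
        self.capacity = 1 << id_bits   # number of distinct short identifiers
        self.tag_to_id = {}            # wide tag -> short identifier
        self.current_tag = None
        self.current_id = None
        self.flushed_ids = []          # identifiers whose TLB entries must be invalidated

    def write_registers(self, tag):
        """Called when the tag registers are written; returns the identifier for the TLB."""
        if tag == self.current_tag:
            return self.current_id     # tag unchanged, nothing to remap
        self.current_tag = tag
        if tag not in self.tag_to_id:
            if len(self.tag_to_id) == self.capacity:
                # Assumed policy: recycle the oldest mapping (dicts keep insertion order).
                victim_tag = next(iter(self.tag_to_id))
                new_id = self.tag_to_id.pop(victim_tag)
                self.flushed_ids.append(new_id)
            else:
                new_id = len(self.tag_to_id)
            self.tag_to_id[tag] = new_id
        self.current_id = self.tag_to_id[tag]
        return self.current_id


class TLB:
    """Toy TLB that matches on (identifier, virtual page number)."""

    def __init__(self):
        self.entries = {}

    def fill(self, ident, vpn, ppn):
        self.entries[(ident, vpn)] = ppn

    def lookup(self, ident, vpn):
        # Returns the physical page number on a hit, None on a miss.
        return self.entries.get((ident, vpn))
```

Comparing a few identifier bits per entry instead of the full tag is the point of the scheme; the price is that a recycled identifier forces the entries that used it to be invalidated.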
Abstract:
In one embodiment, a processor comprises a plurality of processor cores and an interconnect to which the plurality of processor cores are coupled. Each of the plurality of processor cores comprises at least one translation lookaside buffer (TLB). A first processor core is configured to broadcast a demap command on the interconnect responsive to executing a demap operation. The demap command identifies one or more translations to be invalidated in the TLBs, and remaining processor cores are configured to invalidate the translations in the respective TLBs. The remaining processor cores transmit a response to the first processor core, and the first processor core is configured to delay continued processing subsequent to the demap operation until the responses are received from each of the remaining processor cores.
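A minimal sketch of the broadcast-and-wait behaviour described above. The names `Core`, `broadcast_demap`, and `deliver_response` are hypothetical, the interconnect is reduced to direct function calls, and the sketch assumes the initiating core also invalidates its own TLB.

```python
class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}            # virtual page number -> physical page number
        self.awaiting = set()    # ids of cores whose responses are still outstanding

    def handle_demap(self, vpns):
        # Invalidate every translation named by the demap command.
        for vpn in vpns:
            self.tlb.pop(vpn, None)

    def stalled(self):
        # The initiator may not continue past the demap while responses are missing.
        return bool(self.awaiting)


def broadcast_demap(initiator, cores, vpns):
    """Send the demap command to every other core; return the ids that will respond."""
    initiator.handle_demap(vpns)          # assumption: local TLB is demapped too
    responders = []
    for core in cores:
        if core is initiator:
            continue
        initiator.awaiting.add(core.core_id)
        core.handle_demap(vpns)
        responders.append(core.core_id)
    return responders


def deliver_response(initiator, responder_id):
    """Model a response arriving back at the initiating core."""
    initiator.awaiting.discard(responder_id)
```

Stalling until every response arrives ensures that no core can still use a stale translation once the initiator resumes.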
Abstract:
In one embodiment, a processor comprises at least one translation lookaside buffer (TLB) and a control unit coupled to the TLB. The control unit is configured to track whether or not at least one update to the TLB is pending for at least one of a plurality of strands. Each strand comprises hardware to support a different thread of a plurality of concurrently activatable threads in the processor. The strands share the TLB, and the control unit is configured to delay a demap operation issued from one of the strands responsive to the pending update, if any.
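The ordering constraint can be illustrated with a small model. `TlbControl` and its method names are hypothetical. The sketch assumes a demap is held back while any strand has a TLB update in flight, so that a late fill cannot reinstall a translation the demap was meant to remove.

```python
class TlbControl:
    def __init__(self, num_strands):
        self.pending_update = [False] * num_strands   # one flag per strand
        self.tlb = {}                                 # shared by all strands
        self.delayed_demaps = []

    def start_update(self, strand):
        # For example, a TLB miss for this strand is being serviced.
        self.pending_update[strand] = True

    def finish_update(self, strand, vpn, ppn):
        self.tlb[vpn] = ppn
        self.pending_update[strand] = False
        self._drain()

    def demap(self, vpn):
        # The demap is queued and only applied once no update is pending.
        self.delayed_demaps.append(vpn)
        self._drain()

    def _drain(self):
        if any(self.pending_update):
            return
        for vpn in self.delayed_demaps:
            self.tlb.pop(vpn, None)
        self.delayed_demaps.clear()
```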
Abstract:
A system and method for precisely identifying an instruction causing a performance-related event is disclosed. The instruction may be detected while in a pipeline stage of a microprocessor preceding a writeback stage and the microprocessor's architectural state may not be updated until after information identifying the instruction is captured. The instruction may be flushed from the pipeline, along with other instructions from the same thread. A hardware trap may be taken when the instruction is detected and/or when an event counter overflows or is within a given range of overflowing. A software trap handler may capture and/or log information identifying the instruction, such as one or more extended address elements, before returning control and initiating a retry of the instruction. The captured and/or logged information may be stored in an event space database usable by a data space profiler to identify performance bottlenecks in the application containing the instruction.
Abstract:
A method for tracing instructions executed by a processor is provided. The method includes providing a type of instruction to be traced and tracing at least one instruction corresponding to that type. The method further includes storing data from the tracing into a memory, without stopping, until the memory is full.
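As a rough illustration, type-filtered tracing into a fixed-size memory might look like the following. The function name `trace`, the `(pc, kind)` record layout, and the capacity parameter are assumptions made for this sketch.

```python
def trace(instructions, traced_type, capacity):
    """Record the program counter of each instruction of the traced type.

    `instructions` is an iterable of (pc, kind) pairs. Recording continues
    without interruption until the trace memory holds `capacity` entries.
    """
    memory = []
    for pc, kind in instructions:
        if kind != traced_type:
            continue              # not the type being traced
        if len(memory) == capacity:
            break                 # trace memory is full
        memory.append(pc)
    return memory
```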
Abstract:
A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.
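One way to picture selection that depends on prior picks is simple alternation between the two PRQ portions. The abstract does not specify the policy, so the alternation below is an assumption, as are the `PendingRequestQueue` name and the `(thread, virtual_address)` entry format.

```python
from collections import deque


class PendingRequestQueue:
    def __init__(self):
        self.itlb = deque()     # instruction-related portion
        self.dtlb = deque()     # data-related portion
        self.last = "dtlb"      # so the first selection favours the ITLB portion

    def select(self):
        """Pick the next entry to service, favouring the portion not picked last."""
        order = ("itlb", "dtlb") if self.last == "dtlb" else ("dtlb", "itlb")
        for side in order:
            queue = getattr(self, side)
            if queue:
                self.last = side
                return side, queue.popleft()
        return None             # nothing pending

    def flush_thread(self, thread):
        """Deallocate every entry belonging to a flushed thread."""
        self.itlb = deque(e for e in self.itlb if e[0] != thread)
        self.dtlb = deque(e for e in self.dtlb if e[0] != thread)
```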
Abstract:
A method and mechanism for monitoring events in a processing system. A performance monitoring mechanism is configured to store a count of events in an event counter. Periodically, the count stored in the event counter is updated to a new count. If the new count equals a predetermined value, an indication that the count equals the predetermined value is conveyed. If the new count does not equal the predetermined value, but is within a given epsilon of the predetermined value and the occurrence of a corresponding event is detected, an indication that the count equals the predetermined value is conveyed. The mechanism is further configured to suppress event counts which correspond to mis-speculations.
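The update rule can be written out as a short function. This is a sketch under assumptions: the name `periodic_update` is hypothetical, each event is represented only by a flag saying whether it was mis-speculated, and "within epsilon" is read as an absolute distance from the predetermined value.

```python
def periodic_update(count, events, target, epsilon):
    """Return (new_count, signal) after one periodic counter update.

    `events` is a list of booleans, one per event observed since the last
    update; True marks an event that belongs to a mis-speculated path.
    """
    committed = [e for e in events if not e]   # mis-speculated events are suppressed
    new_count = count + len(committed)
    event_seen = bool(committed)

    if new_count == target:
        signal = True
    elif abs(target - new_count) <= epsilon and event_seen:
        # Close to the target and a corresponding event just occurred.
        signal = True
    else:
        signal = False
    return new_count, signal
```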
Abstract:
A system and method for invalidating obsolete virtual/real address to physical address translations may employ translation lookaside buffers to cache translations. TLB entries may be invalidated in response to changes in the virtual memory space, and thus may need to be demapped. A non-cacheable unit (NCU) residing on a processor may be configured to receive and manage a global TLB demap request from a thread executing on a core residing on the processor. The NCU may send the request to local cores and/or to NCUs of external processors in a multiprocessor system using a hardware instruction to broadcast to all cores and/or processors or to multicast to designated cores and/or processors. The NCU may track completion of the demap operation across the cores and/or processors using one or more counters, and may send an acknowledgement to the initiator of the demap request when the global demap request has been satisfied.
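The counter-based completion tracking might be modelled as below. The `Ncu` class, its method names, and the use of a single counter are illustrative assumptions; the abstract only says that one or more counters are used.

```python
class Ncu:
    """Toy non-cacheable unit tracking one global demap request at a time."""

    def __init__(self, core_ids):
        self.core_ids = list(core_ids)
        self.outstanding = 0            # demaps sent but not yet completed
        self.initiator_acked = False

    def global_demap(self, targets=None):
        """Broadcast when `targets` is None, otherwise multicast to the listed cores."""
        if targets is None:
            destinations = list(self.core_ids)
        else:
            destinations = [c for c in self.core_ids if c in targets]
        self.outstanding = len(destinations)
        self.initiator_acked = self.outstanding == 0
        return destinations

    def core_done(self):
        """A core reports that it has finished invalidating its TLB entries."""
        self.outstanding -= 1
        if self.outstanding == 0:
            # Every destination has completed; acknowledge the initiator.
            self.initiator_acked = True
```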
Abstract:
Advanced programmable interrupt control for a multithreaded multicore processor that supports software compatible with x86 processors. Embodiments provide interrupt control for increased threads with minimal additional hardware by including in each processor core, a core advanced interrupt controller (core APIC) configured to determine a lowest priority thread of its corresponding processor core. Each core APIC reports its lowest priority thread level as a core priority to an input/output APIC. The I/O APIC routes interrupt requests to the core APIC with the lowest core priority. The selected core APIC then routes the interrupt request to the corresponding lowest priority thread. Each core APIC detects changes in priority levels of its corresponding processor core threads, and notifies the I/O APIC of any change to the corresponding core priority. Each core APIC may notify the I/O APIC as the core priority changes, or when the I/O APIC requests status from each core APIC.
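The two-level routing can be sketched as follows. `CoreApic` and `IoApic` are hypothetical class names, priorities are plain integers where a smaller number means lower priority, and ties go to the first candidate; none of these details come from the abstract.

```python
class CoreApic:
    def __init__(self, thread_priorities):
        self.thread_priorities = list(thread_priorities)

    def core_priority(self):
        # The core's priority is that of its lowest-priority thread.
        return min(self.thread_priorities)

    def lowest_thread(self):
        return self.thread_priorities.index(self.core_priority())


class IoApic:
    def __init__(self, cores):
        self.cores = cores
        self.reported = [c.core_priority() for c in cores]

    def notify(self, core_index):
        """A core APIC reports that its core priority has changed."""
        self.reported[core_index] = self.cores[core_index].core_priority()

    def route(self):
        """Return (core index, thread index) that should receive the interrupt."""
        core_index = self.reported.index(min(self.reported))
        return core_index, self.cores[core_index].lowest_thread()
```

Because the I/O APIC only stores one priority per core, the state it keeps grows with the number of cores rather than the number of threads.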