摘要:
Method for producing a predetermined length page memory pointer record, according to a selected page size and a selected page address, the method including the procedures of: determining a dynamic location of a separator bit within the page memory pointer record, according to the selected page size and an initial page size, the initial page size being respective of the smallest page size in a given memory system, writing a predetermined value to the dynamic location, writing a sequence of values opposite to the predetermined value to selected page size bits of the page memory pointer record, when the selected page size is different than the initial page size, and writing the selected page address to selected page address bits of the page memory pointer record.
摘要:
An apparatus is described having multiple cores, each core having: a) a CPU; b) an accelerator; and, c) a controller and a plurality of order buffers coupled between the CPU and the accelerator. Each of the order buffers is dedicated to a different one of the CPU's threads. Each one of the order buffers is to hold one or more requests issued to the accelerator from its corresponding thread. The controller is to control issuance of the order buffers' respective requests to the accelerator.
摘要:
A computer system may support one or more techniques to allow dynamic pinning of the memory pages accessed by a non-CPU device (e.g., a graphics processing unit, GPU). The non-CPU may support virtual to physical address mapping and may thus be aware of the memory pages, which may not be pinned but may be accessed by the non-CPU. The non-CPU may notify or send such information to a run-time component such as a device driver associated with the CPU. In one embodiment, the device driver may, dynamically, perform pinning of such memory pages, which may be accessed by the non-CPU. The device driver may even unpin the memory pages, which may be no longer accessed by the non-CPU. Such an approach may allow the memory pages, which may be no longer accessed by the non-CPU to be available for allocation to the other CPUs and/or non-CPUs.
摘要:
Some implementations disclosed herein provide techniques and arrangements for a synchronous software interface for a specialized logic engine. The synchronous software interface may receive, from a first core of a plurality of cores, a control block including a transaction for execution by the specialized logic engine. The synchronous software interface may send the control block to the specialized logic engine and wait to receive a confirmation from the specialized logic engine that the transaction was successfully executed.
摘要:
An asymmetric multiprocessor system (ASMP) may comprise computational cores implementing different instruction set architectures and having different power requirements. Program code for execution on the ASMP is analyzed and a determination is made as to whether to allow the program code, or a code segment thereof to execute on a first core natively or to use binary translation on the code and execute the translated code on a second core which consumes less power than the first core during execution.
摘要:
Page faults arising in a graphics processing unit may be handled by an operating system running on the central processing unit. In some embodiments, this means that unpinned memory can be used for the graphics processing unit. Using unpinned memory in the graphics processing unit may expand the capabilities of the graphics processing unit in some cases.
摘要:
In a front-end system for a processor, a recording scheme for instruction segments stores the instructions in reverse program order. Instruction segments may be traces, extended blocks or basic blocks. By storing the instructions in reverse program order, the instruction segment is easily extended to include additional instructions. The instruction segments may be extended without having to re-index tag arrays, pointers that associate instruction segments with other instruction segments.
摘要:
A method and apparatus for changing the configuration of a multi-core processor is disclosed. In one embodiment, a throttle module (or throttle logic) may determine the amount of parallelism present in the currently-executing program, and change the execution of the threads of that program on the various cores. If the amount of parallelism is high, then the processor may be configured to run a larger amount of threads on cores configured to consume less power. If the amount of parallelism is low, then the processor may be configured to run a smaller amount of threads on cores configured for greater scalar performance.
摘要:
A system and corresponding method use a PAUSE instruction as a low power hint in a single threaded or multithreaded environment using “processor slow mode.” One embodiment actually lowers the frequency of the processor clock. Another embodiment virtually lowers the frequency of the processor clock by gating M clock cycles out of every N clock cycles. When all threads have issued a PAUSE instruction, the processor enters slow mode and remains there for a while. After this while, the processor returns to normal mode. Alternatively, an event, such as an interrupt or an exception, can cause the processor to return to normal mode from slow mode.
摘要:
A power aware front-end unit for a processor may include a UOP cache that disables other circuitry within the front-end unit. In an embodiment, a front-end unit may disable instruction synchronization circuitry, instruction decode circuitry and, optionally, instruction fetch circuitry while instruction look-ups are underway in both a block cache and an instruction cache. If the instruction look-up indicates a miss, the disabled circuitry thereafter may be enabled.