摘要:
A processor architecture containing multiple closely coupled processors in a form of symmetric multiprocessing system is provided. The special coupling mechanism allows it to speculatively execute multiple threads in parallel very efficiently. Generally, the operating system is responsible for scheduling various threads of execution among the available processors in a multiprocessor system. One problem with parallel multithreading is that the overhead involved in scheduling the threads for execution by the operating system is such that shorter segments of code cannot efficiently take advantage of parallel multithreading. Consequently, potential performance gains from parallel multithreading are not attainable. Additional circuitry is included in a form of symmetrical multiprocessing system which enables the scheduling and speculative execution of multiple threads on multiple processors without the involvement and inherent overhead of the operating system. Advantageously, parallel multithreaded execution is more efficient and performance may be improved.
摘要:
A system and method for transparent handling of extended register states. A set of additional registers, or an extended register file, is added to the base architecture of a microprocessor. The extended register file includes two dedicated registers and a plurality of general-use registers. The extended register file is mapped to a region in main memory. One dedicated register of the extended register file stores the physical base address of the memory region. Another dedicated register of the extended register file is used to store bits to indicate the status of the extended register file. A set of extended instructions is implemented for transferring data to and from the extended register file.
摘要:
A processor for executing computer instructions including, in one embodiment, a machine specific register (MSR) which includes a predicated execution field and an instruction decoder. The decoder is coupled to the MSR and configured to detect predicated execution information contained in the computer instruction and to include conditional execution information in the decoded instruction upon detecting an appropriate setting in the predicated execution field of the MSR. The processor further includes a first execution unit. The first execution unit is configured to detect and evaluate the conditional execution information in the decoded instruction and, if present, to execute the decoded instruction only if a condition represented by the conditional execution information is true. In another embodiment, the processor includes a standard register set and an extended register set, which includes the standard register set. The decoder is configured to search the computer instruction for an extended register indicator upon detecting an appropriate setting in the extended register field of the MSR. The decoder is further configured to fetch, upon detecting the extended register indicator, a value from a selected register within the extended register set. If the decoder detects the absence of extended register indicator, a value is fetched from a selected register where the selected register is within the standard register set. In another embodiment, the MSR includes a three register field and the decoder is configured to interpret the computer instruction as containing first and second source register operands and a destination operand if the instruction contains a three register indicator and the three register field is set appropriately.
摘要:
A system and method for generating images of three-dimensional objects. The system includes one or more tracing processors, and one or more shading processors. Each of the tracing processors may be configured to (a) perform a first set of computations on a corresponding group of primary rays emanating from a viewpoint resulting in a ray tree and a set of one or more light trees for each primary ray of the corresponding group, (b) transfer the ray trees and associated light trees to one of the shading processors, and (c) repeat (a) and (b). Each of the shading processors may be configured to (d) receive ray trees and associated light trees from one of the tracing processors, (e) perform a second set of computations on the received ray trees and associated light trees to determine pixel color values, and (f) repeat (d) and (e) a plurality of times.
摘要:
A mechanism for exception and interrupt handling in multithreaded multiprocessors is provided. The mechanism allows the handling of exceptions and interruptions in a multithreaded multiprocessor computer, while hiding the multiprocessor nature of the computer from the operating system. Generally, when an operating system is cognizant of the multiprocessor nature of a computer, additional overhead may be required when handling exceptions and interruptions. Due to the overhead involved in saving and restoring processing states, the performance of a processor may be significantly impacted. Additional circuitry is provided which allows the multiprocessor nature of the computer to be hidden from the operating system, while minimizing the overhead necessary for proper handling.
摘要:
A microprocessor having a standard register set and an extended register set, which is configured to save its state upon suspension of either an extended register process or a standard register processor. The microprocessor is configured to execute both standard register instruction sequences and extended register instruction sequences. A first memory is provided for storing a state of the microprocessor when a standard register instruction set sequence is suspended. The microprocessor further comprises a second memory for storing a microprocessor state upon suspension of the microprocessor executing an extended register instruction set sequence. An extended state save circuit coupled between a microprocessor core and the second memory allows the extended state of the microprocessor to be stored without modification of the operating system. As a result, the extended state of the microprocessor can be saved transparently to the operating system.
摘要:
A cache memory configured to access stored instructions according to basic blocks is disclosed. Basic blocks are natural divisions in instruction streams resulting from branch instructions. The start of a basic block is a target of a branch, and the end is another branch instruction. A microprocessor configured to use a basic block oriented cache may comprise a basic block cache and a basic block sequence buffer. The basic block cache may have a plurality of storage locations configured to store basic blocks. The basic block sequence buffer also has a plurality of storage locations, each configured to store a block sequence entry. The block sequence entry may comprise an address tag and one or more basic block pointers. The address tag corresponds to the fetch address of a particular basic block, and the pointers point to basic blocks that follow the particular basic block in a predicted order. A system using the microprocessor and a method for caching instructions in a block oriented manner rather than conventional power-of-two memory blocks are also disclosed.
摘要:
A method and apparatus are disclosed for implementing early release of speculatively read data in a hardware transactional memory system. A processing core comprises a hardware transactional memory system configured to receive an early release indication for a specified word of a group of words in a read set of an active transaction. The early release indication comprises a request to remove the specified word from the read set. In response to the early release request, the processing core removes the group of words from the read set only after determining that no word in the group other than the specified word has been speculatively read during the active transaction.
摘要:
A method and apparatus for compiling software written to be executed on a microprocessor that supports at least one hardware transactional memory function is provided. A compiler that supports at least one software transactional memory function is adapted to include a runtime system that maps between the at least one software transactional memory function and the at least one hardware transactional memory instruction.
摘要:
An apparatus, method, and system for implementing a hardware transactional memory (HTM) system with multiple levels of transactional buffers. The apparatus comprises a data cache configured to buffer data in a shared (by a plurality of processing cores) memory accessed by speculative memory access operations and to retain the data during at least a portion of an attempt to execute the atomic memory transaction. The apparatus also comprises an overflow detection circuit configured to detect an overflow condition upon determining that the data cache has insufficient capacity to buffer a portion of data accessed as part of the atomic memory transaction, as well as a buffering circuit configured to respond to the detection of the overflow condition by preventing the portion of data from being buffered in the data cache and buffering the portion of data in a secondary buffer separate from the data cache.