摘要:
A method and apparatus are disclosed for allocating functional units in a multithreaded very large instruction word (VLIW) processor. The present invention combines the techniques of conventional very long instruction word architectures and conventional multithreaded architectures to reduce execution time within an individual program, as well as across a workload. The present invention utilizes instruction packet splitting to recover some efficiency lost with conventional multithreaded architectures. Instruction packet splitting allows an instruction bundle to be partially issued in one cycle, with the remainder of the bundle issued during a subsequent cycle. The allocation hardware assigns as many instructions from each packet as will fit on the available functional units, rather than allocating all instructions in an instruction packet at one time. Those instructions that cannot be allocated to a functional unit are retained in a ready-to-run register. On subsequent cycles, instruction packets in which all instructions have been issued to functional units are updated from their thread's instruction stream, while instruction packets with instructions that have been held are retained. The functional unit allocation logic can then assign instructions from the newly-loaded instruction packets as well as instructions that were not issued from the retained instruction packets.
摘要:
To support IP routing table longest-prefix matching, a (ternary content-addressable) memory is managed by assigning its locations to interleaved pages and gaps. Each page has zero or more locations assigned to it, where all entries in a page have the same prefix length. Each gap has zero or more empty locations assigned to it. The pages are organized by descending prefix length. Associated with each page is a free-list identifying empty locations in that page. An “invariant” rule may dictate that first and last page locations cannot be empty. Whenever an entry is deleted from the first or last location in a page, that location is shifted (i.e., reassigned) to the adjacent gap. The direction chosen to search for an empty location for inserting a new entry is based on a global measure of the relative fullness of memory regions above and below the new entry's page.
摘要:
A method and apparatus are disclosed for allocating functional units in a multithreaded very large instruction word (VLIW) processor. The present invention combines the techniques of conventional very long instruction word (VLIW) architectures and conventional multithreaded architectures to reduce execution time within an individual program, as well as across a workload. The present invention utilizes instruction packet splitting to recover some efficiency lost with conventional multithreaded architectures. Instruction packet splitting allows an instruction bundle to be partially issued in one cycle, with the remainder of the bundle issued during a subsequent cycle. There are times, however, when instruction packets cannot be split without violating the semantics of the instruction packet assembled by the compiler. A packet split identification bit is disclosed that allows hardware to efficiently determine when it is permissible to split an instruction packet. The split bit informs the hardware when splitting is prohibited. The allocation hardware assigns as many instructions from each packet as will fit on the available functional units, rather than allocating all instructions in an instruction packet at one time, provided the split bit has not been set. Those instructions that cannot be allocated to a functional units are retained in a ready-to-run register. On subsequent cycles, instruction packets in which all instructions have been issued to functional units are updated from their thread's instruction stream, while instruction packets with instructions that have been held are retained. The functional unit allocation logic can then assign instructions from the newly-loaded instruction packets as well as instructions that were not issued from the retained instruction packets.