摘要:
One embodiment of the present invention sets forth a technique for placing texture barrier instructions within a thread program to advantageously enable efficient and correct operation of the thread program. A thread program compiler statically determines a pending request count needed to progress beyond a particular texture barrier instruction, which blocks execution of subsequent instructions that depend on previously requested data. Each instance of the thread program blocks execution at the barrier instruction until a pending request count condition is satisfied. This technique may advantageously reduce power consumption in a graphics processing unit by eliminating power consumption associated with conventional, generalized scoreboard resources.
摘要:
Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by second processor, which allows the first processor to search for potential matching strings. Because many strings do not match the one or more predetermined strings, the more detailed search preformed by the second processor is performed selectively, which increases network performance as compared to more detailed searches on all network data.
摘要:
A method and apparatus for optimizing register allocation during scheduling and execution of program code in a hardware environment. The program code can be compiled to optimize execution given predetermined hardware constraints. The hardware constraints can include the number of register read and write operations that can be performed in a given processor pass. The optimizer can initially schedule the program using virtual registers and a goal of minimizing the amount of active registers at any time. The optimizer reschedules the program to assign the virtual registers to actual physical registers in a manner that minimizes the number of processor passes used to execute the program.
摘要:
An embodiment of a system, method and apparatus is disclosed. The system comprises a first processor that is in communication with a root existence table. The first processor is to check the root existence table to determine if a character is a root character. A second processor is in communication with a root active list. The second processor is to retrieve an entry that corresponds to the character and to associate a pointer with the entry. A third processor is in communication with the second processor and a tree structure. The third processor is to receive the pointer from the second processor and to maintain the tree and table structure.
摘要:
Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by second processor, which allows the first processor to search for potential matching strings. Because many strings do not match the one or more predetermined strings, the more detailed search performed by the second processor is performed selectively, which increases network performance as compared to more detailed searches on all network data.
摘要:
Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by second processor, which allows the first processor to search for potential matching strings. Because many strings do not match the one or more predetermined strings, the more detailed search performed by the second processor is performed selectively, which increases network performance as compared to more detailed searches on all network data.
摘要:
An embodiment of a system, method and apparatus is disclosed. The system comprises a first processor that is in communication with a root existence table. The first processor is to check the root existence table to determine if a character is a root character. A second processor is in communication with a root active list. The second processor is to retrieve an entry that corresponds to the character and to associate a pointer with the entry. A third processor is in communication with the second processor and a tree structure. The third processor is to receive the pointer from the second processor and to maintain the tree and table structure.
摘要:
Apparatus, methods, systems and computer program products are disclosed to provide improved optimizations of single-basic-block-loops. These optimizations include improved scheduling of blocking instructions for pipelined computers and improved scheduling and allocation of resources (such as registers) that cannot be spilled to memory. Scheduling of blocking instructions is improved by pre-allocating space in the scheduling reservation table. Improved scheduling and allocation of non-spillable resources results from converting the resource constraint into a data dependency constraint.
摘要:
Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by second processor, which allows the first processor to search for potential matching strings. Because many strings do not match the one or more predetermined strings, the more detailed search performed by the second processor is performed selectively, which increases network performance as compared to more detailed searches on all network data.
摘要:
Methods and apparatuses for searching network data for one or more predetermined strings are disclosed. In one embodiment, the string search is a multi-stage search where the stages of the search are performed by different hardware components. In one embodiment in a first search stage, a first processor performs a comparison of blocks of incoming data to determine whether the blocks potentially represent the beginning of one of the predetermined strings. If a potential predetermined string is identified, a second processor performs a further search to determine whether the string matches one of the predetermined strings. Because the first processor searches only for the beginning of the predetermined strings, the first stage comparison can be performed quickly, which improves network performance as compared to more detailed searching. The second stage is performed by second processor, which allows the first processor to search for potential matching strings. Because many strings do not match the one or more predetermined strings, the more detailed search preformed by the second processor is performed selectively, which increases network performance as compared to more detailed searches on all network data.