摘要:
In some embodiments, a method and apparatus for automatically parallelizing a sequential network application through pipeline transformation are described. In one embodiment, the method includes the configuration of a network processor into a D-stage processor pipeline. Once configured, a sequential network application program is transformed into D-pipeline stages. Once transformed, the D-pipeline stages are executed in parallel within the D-stage processor pipeline. In one embodiment, transformation of a sequential application program is performed by modeling the sequential network program as a flow network model and selecting from the flow network model into a plurality of preliminary pipeline stages. Other embodiments are described and claimed.
摘要:
In some embodiments, a method and apparatus for automatically parallelizing a sequential network application through pipeline transformation are described. In one embodiment, the method includes the configuration of a network processor into a D-stage processor pipeline. Once configured, a sequential network application program is transformed into D-pipeline stages. Once transformed, the D-pipeline stages are executed in parallel within the D-stage processor pipeline. In one embodiment, transformation of a sequential application program is performed by modeling the sequential network program as a flow network model and selecting from the flow network model into a plurality of preliminary pipeline stages. Other embodiments are described and claimed.
摘要:
In some embodiments, a method and apparatus for automatically parallelizing a sequential network application through pipeline transformation are described. In one embodiment, the method includes the configuration of a network processor into a D-stage processor pipeline. Once configured, a sequential network application program is transformed into D-pipeline stages. Once transformed, the D-pipeline stages are executed in parallel within the D-stage processor pipeline. In one embodiment, transformation of a sequential application program is performed by modeling the sequential network program as a flow network model and selecting from the flow network model into a plurality of preliminary pipeline stages. Other embodiments are described and claimed.
摘要:
In some embodiments, a method and apparatus for automatically parallelizing a sequential network application through pipeline transformation are described. In one embodiment, the method includes the configuration of a network processor into a D-stage processor pipeline. Once configured, a sequential network application program is transformed into D-pipeline stages. Once transformed, the D-pipeline stages are executed in parallel within the D-stage processor pipeline. In one embodiment, transformation of a sequential application program is performed by modeling the sequential network program as a flow network model and selecting from the flow network model into a plurality of preliminary pipeline stages. Other embodiments are described and claimed.
摘要:
In some embodiments, a method and apparatus for an automatic thread-partition compiler are described. In one embodiment, the method includes the transformation of a sequential application program into a plurality of application program threads. Once partitioned, the plurality of application program threads are concurrently executed as respective threads of a multi-threaded architecture. Hence, a performance improvement of the parallel multi-threaded architecture is achieved by hiding memory access latency through or by overlapping memory access with computations or with other memory accesses. Other embodiments are described and claimed.
摘要:
Automatic software controlled caching generations in network applications are described herein. In one embodiment, a candidate representing a plurality of instructions of a plurality of threads that perform one or more external memory accesses is identified, where the external memory accesses have a substantially identical base address. One or more directives and/or instructions are inserted into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor, and to modify at least one of the external memory access to access at least one of the CAM and LM of the processor without having to perform the respective external memory access. Other methods and apparatuses are also described.
摘要:
A compilation method includes converting memory access instructions that read or write less than a minimum data access unit (MDAU) to memory access instructions that read or write a multiple of the minimum data access unit, converting the memory access instructions into a format including a base address plus an offset, grouping subsets of the converted memory access instructions into partitions, and vectorizing the converted memory access instructions in the subsets that match instruction patterns.
摘要:
Automatic software controlled caching generations in network applications are described herein. In one embodiment, a candidate representing a plurality of instructions of a plurality of threads that perform one or more external memory accesses is identified, where the external memory accesses have a substantially identical base address. One or more directives and/or instructions are inserted into an instruction stream corresponding to the identified candidate to maintain contents of at least one of a content addressable memory (CAM) and local memory (LM) of a processor, and to modify at least one of the external memory access to access at least one of the CAM and LM of the processor without having to perform the respective external memory access. Other methods and apparatuses are also described.
摘要:
A compiler includes a location-assigning module to optimally allocate register locations in various memory blocks of a memory during compilation of a program code in accordance with code proximity of the program code in accessing the register locations and size of each of the memory blocks.
摘要:
A compiler includes a location-assigning module to optimally allocate register locations in various memory blocks of a memory during compilation of a program code in accordance with code proximity of the program code in accessing the register locations and size of each of the memory blocks.