摘要:
A compiler technique for reducing the number of executed branches in a code sequence. Multiple condition branch instructions in a program sequence are replaced with a single combined conditional branch instruction thereby eliminating the time-consuming execution of multiple branch instructions by a target processor.
摘要:
The present disclosure includes a system and method for managing network and server power. In an example of managing network and server power according to the present disclosure, routing network traffic is routed onto a number of core networks based on core network statistics, capacity requirements are determined based on core network statistics for the number of core networks and for a number of servers operating a number of virtual machines on the number of core networks, wherein the number of core networks include a number of core switches and a number of edge switches, and the capacity is set for the number of core switches based on the capacity requirements for the number of core networks and for the number of servers based on the capacity requirements for the number of servers.
摘要:
A barrier for synchronizing program threads for a plurality of processors includes a filter configured to be coupled to a plurality of processors executing a plurality of threads to be synchronized. The filter is configured to monitor and selectively block fill requests for instruction cache lines. A method for synchronizing program threads for a plurality of processors includes configuring a filter to monitor and selectively block fill requests for instruction cache lines for a plurality of processors executing a plurality of threads to be synchronized.
摘要:
According to an exemplary embodiment of the present invention, a method for loading data from at least one memory device includes the steps of loading a first value from a first memory location of the at least one memory device, determining a second memory location based on the first value and loading a second value from the second memory location of the at least one memory device, wherein the step of loading a first value is performed by a first processing unit and wherein the steps of determining a second memory location and loading a second value are performed by at least one other processing unit.
摘要:
The invention is a system and method for executing programs. The invention involves a plurality of processing elements, wherein a processing element of the plurality of processing elements generates a branch command. The invention uses a programmable network that transports the branch command from the processing element to one of a first destination processing element by a first programmed transport route and a second destination processing element by a second programmed transport route. The branch command is received and processed by one of the first destination processing element and the second destination processing element, and is not processed by the other of the first processing element and the second processing element.
摘要:
The invention is a system and method for executing a program that comprises a plurality of basic blocks on a computer system that comprises a plurality of processing elements. The invention generates a branch instruction by one processing element of the plurality of processing elements, sends the branch instruction to the plurality of processing elements. The invention then independently branches to a target of the branch instruction by each of the processing elements of the plurality of processing elements when each processing element receives the sent branch instruction. At least one processing element of the plurality of processing elements receives the branch instruction at a time later than another processing element of the plurality of processing elements.