摘要:
Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
摘要:
Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
摘要:
A method of transforming low-level programming language code written for execution by a target processor includes receiving data comprising a plurality of low-level programming language instructions ordered for sequential execution by the target processor; detecting a pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween; and inserting one or more instructions between the detected pair of instructions in the plurality of low-level programming language instructions having a memory dependency therebetween. The one or more instructions inserted between the detected pair of instructions create a true data dependency on a value stored in an architectural register of the target processor between the detected pair of instructions.
摘要:
A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.
摘要:
A method and system for identifying calls in a Java package whose targets are guaranteed to belong to the package. According to the method an inheritance graph and access permissions of respective components in the package are determined, both of which are used in combination with the knowledge that the package is seared and signed to determine whether all the targets of a call are guaranteed to belong to the package. The identification of calls according to the invention can be performed at the time the package is sealed and signed or as a separate phase thereafter and allows for better compiler optimization.
摘要:
An apparatus and method for determining whether data needed for one or more operations is stored in a cache and scheduling the operations for execution based on the determination. For example, one embodiment of a processor comprises: a hierarchy of cache levels for caching data including at least a level 1 (L1) cache; cache occupancy determination logic to determine whether data associated with one or more subsequent operations is stored in one of the cache levels; and scheduling logic to schedule execution of the subsequent operations based on the determination of whether data associated with the subsequent operations is stored in the cache levels.
摘要:
A data layout optimization may utilize affinity estimation between pairs of fields of a record in a computer program. The affinity estimation may be determined based on a trace of an execution and in view of actual processing entities performing each access to the fields. The disclosed subject matter may be configured to be aware of a specific architecture of a target computer having a plurality of processing entities, executing the program so as to provide an improved affinity estimation which may take into account both false sharing issues, spatial locality improvement and the like.
摘要:
A data transmission optimization method and system. The method comprises analyzing program code to identify data access calls in the program code, using one or more processor; determining whether a first data access call is for retrieving target data stored in a data structure with a plurality of fields, wherein the target data is stored in one or more target fields of the data structure; determining whether servicing the first data access call will result in transfer of non-target data stored in one or more non-target fields in the data structure; and replacing the first data access call with a second data access call, wherein servicing the second data access call will result in transfer of the target data and minimizes the transfer of non-target data.
摘要:
A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
摘要:
A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.