Abstract:
Non-speculatively prefetched data is prevented from being discarded from a cache memory before being accessed. The cache memory includes a cache control unit that reads data from a main memory and registers it in the cache memory upon receiving a fill request from a processor, and that accesses the data in the cache memory upon receiving a memory instruction from the processor. Each cache line of the cache memory includes a registration information storage unit that stores information indicating whether the registered data was written into the cache line in response to a fill request and whether that data has been accessed by a memory instruction. The cache control unit sets the information in the registration information storage unit when it performs a prefetch based on a fill request, and resets the information when it accesses the cache line based on a memory instruction.
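A minimal C sketch of the mechanism this abstract describes: a per-line bit that is set when a prefetch fill writes the line and cleared when a memory instruction touches it, plus a victim-selection routine that skips still-protected lines. All type and function names are illustrative, not taken from the patent.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Cache line with the registration-information bit: set on a
 * non-speculative prefetch fill, cleared on a demand access. */
typedef struct {
    uint64_t tag;
    bool     valid;
    bool     prefetched_unused; /* registration information storage */
} cache_line_t;

/* Fill request (prefetch): mark the line as not-yet-accessed. */
void on_fill(cache_line_t *line, uint64_t tag) {
    line->tag = tag;
    line->valid = true;
    line->prefetched_unused = true;
}

/* Memory instruction that hits the line: reset the bit. */
void on_access(cache_line_t *line) {
    line->prefetched_unused = false;
}

/* Victim selection: prefer invalid ways, then ways whose prefetched
 * data has already been consumed, so unused prefetches survive. */
int pick_victim(cache_line_t set[], size_t ways) {
    for (size_t w = 0; w < ways; w++)
        if (!set[w].valid) return (int)w;
    for (size_t w = 0; w < ways; w++)
        if (!set[w].prefetched_unused) return (int)w;
    return 0; /* all ways protected: fall back to way 0 */
}
```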
Abstract:
A parallel or grid computing system that has a plurality of nodes and schedules jobs for the nodes with a view toward optimizing system efficiency. The system has a plurality of nodes for transmitting and receiving data and a communication path for exchanging data among the nodes; each node is either a transmitting node that transmits data or a receiving node that processes a job dependent on the transmitted data. The system further has a time measuring means for measuring the interval between the instant at which data is called for by a job and the instant at which the data arrives from a transmitting node at a receiving node, a time counting means for adding up the measured wait times for each job, and a job scheduling means for determining the priority of jobs in accordance with the counted wait time and scheduling the jobs accordingly.
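A hedged C sketch of the three means the abstract names: recording one wait interval, accumulating it per job, and ordering jobs by the accumulated wait. The rule "longer accumulated wait means higher priority" is an assumption; the patent only says priority follows the counted wait time.

```c
#include <stdlib.h>

/* Per-job record kept by the time counting means. */
typedef struct {
    int    id;
    double total_wait; /* summed data-wait time for this job */
} job_t;

/* Time measuring means: one interval between the instant the job
 * called for the data and the instant the data arrived. */
void record_wait(job_t *job, double t_requested, double t_received) {
    job->total_wait += t_received - t_requested;
}

/* Job scheduling means: sort descending by accumulated wait, so the
 * longest-waiting jobs run first.
 * Usage: qsort(jobs, n, sizeof(job_t), compare_priority); */
int compare_priority(const void *a, const void *b) {
    const job_t *ja = a, *jb = b;
    return (ja->total_wait < jb->total_wait)
         - (ja->total_wait > jb->total_wait);
}
```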
Abstract:
A splittable/connectible bus 140 and a network 1000 for transmitting coherence transactions are provided between the CPUs, and a directory 160 and a group setup register 170 for storing bus-splitting information are provided in a directory control circuit 150 that controls cache invalidation. The bus is dynamically set to a split or connected state to fit the particular execution form of a job. In response to this setting, the directory control circuit uses the directory to manage all inter-CPU coherence control sequences while, in accordance with the information in the group setup register, omitting cache coherence control between dynamically bus-connected CPUs and conducting cache coherence control between bus-split CPUs only through the network. This relieves the decrease in performance scalability caused by inter-CPU coherence-processing overhead in a system that has multiple CPUs and guarantees inter-CPU cache coherence by hardware.
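An illustrative C sketch of the directory control decision, assuming the group setup register holds one group id per CPU and the directory holds a sharer bitmap per line; `send_invalidate_over_network` is a stand-in stub, and all sizes are arbitrary.

```c
#include <stdint.h>

#define NUM_CPUS 8

uint8_t  group_setup[NUM_CPUS];   /* bus-split grouping per CPU       */
uint32_t directory_sharers[1024]; /* bitmap of caching CPUs per line  */

/* Stand-in for a coherence transaction sent over the network 1000. */
static void send_invalidate_over_network(int cpu, int line) {
    (void)cpu; (void)line;
}

/* Invalidate copies of `line` held by CPUs outside the requester's
 * bus-connected group. CPUs in the same group are skipped: the
 * connected bus keeps them coherent, so only bus-split CPUs need
 * coherence control through the network. */
void invalidate_remote(int requester, int line) {
    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        if (cpu == requester) continue;
        if (!(directory_sharers[line] & (1u << cpu))) continue;
        if (group_setup[cpu] == group_setup[requester])
            continue; /* dynamically bus-connected: omit control */
        send_invalidate_over_network(cpu, line);
        directory_sharers[line] &= ~(1u << cpu);
    }
}
```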
Abstract:
An information processing system includes a common storage and a plurality of processors connected to the common storage, each of which executes programs. Each processor executes instructions to store data into the common storage and to load data from the common storage into a cache storage. Each processor includes a communication controller that, to attain synchronization of instruction execution among the processors, sends synchronization completion information and receives synchronization information from other processors; an instruction executing section that, in response to synchronization information from the communication controller, detects a specified change of a flag at a specified location in the common storage by executing a Monitor instruction included in a program; and an execution controller that inhibits execution of instructions subsequent to the Monitor instruction, in particular Load instructions that load data into the cache storage, until the change of the flag is detected by the instruction executing section. The processor allows instructions that load data from the common storage into the cache storage to be executed after the flag change is detected. The execution controller may further include an inhibit resetting circuit that, according to input from a service processor, issues a control signal to terminate the instruction send-out inhibiting action of the instruction inhibit circuit.
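A rough C sketch of the Monitor-and-inhibit behavior using C11 atomics. In hardware the execution controller stalls subsequent Load instructions; the spin loop below merely approximates that by issuing no other loads while waiting. `sync_flag` and the zero/non-zero convention are assumptions.

```c
#include <stdatomic.h>

/* Flag at the specified location in the common storage. */
_Atomic int sync_flag = 0;

/* Approximation of the Monitor instruction: watch for the specified
 * change of the flag. While waiting, no Load that would fill the
 * cache storage is issued, mirroring the execution controller's
 * inhibiting action. */
void wait_for_synchronization(void) {
    while (atomic_load_explicit(&sync_flag, memory_order_acquire) == 0)
        ; /* busy-wait; no cache-filling loads are issued here */
    /* Flag change detected: loads from common storage into the
     * cache storage may now proceed. */
}

/* The synchronizing peer sets the flag once its work is complete. */
void signal_synchronization(void) {
    atomic_store_explicit(&sync_flag, 1, memory_order_release);
}
```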
Abstract:
A method of generating optimum parallel code from a source code for a computer system configured of plural processors that share a cache memory or a main memory is provided. A preset code is read, and its operation amounts and process contents are analyzed while distinguishing dependence and independence among the processes in the code. The amount of data reused among processes and the amount of data that must access the main memory are then analyzed. Upon receiving a parallel-code generation policy input by a user, the processes of the code are divided, and the parallelization method whose execution cycle becomes shortest is applied, the execution cycle being estimated from the operation amounts and process contents, the cache use of the reused data, and the main-memory access data amount.
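One possible shape of the cost estimate, in C, under an invented linear model: compute scales with the thread count, reused data is served from the cache, and the remainder pays main-memory cost. All constants and field names are assumptions, not the patent's model.

```c
/* One candidate division of the code's processes. */
typedef struct {
    double ops;         /* operation amount of the divided processes */
    double reuse_bytes; /* data reused among processes (cache hits)  */
    double mem_bytes;   /* data that must access the main memory     */
    int    threads;     /* degree of parallelism for this candidate  */
} candidate_t;

/* Assumed per-unit costs; real values would come from the target. */
double estimate_cycles(const candidate_t *c) {
    const double CYC_PER_OP    = 1.0; /* core throughput            */
    const double CYC_PER_L2_B  = 0.5; /* shared-cache cost per byte */
    const double CYC_PER_MEM_B = 4.0; /* main-memory cost per byte  */
    double compute = c->ops * CYC_PER_OP / c->threads;
    double memory  = c->reuse_bytes * CYC_PER_L2_B
                   + c->mem_bytes   * CYC_PER_MEM_B;
    return compute + memory;
}

/* Pick the parallelization whose estimated execution cycle is
 * shortest among the candidates. */
int pick_shortest(const candidate_t cands[], int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (estimate_cycles(&cands[i]) < estimate_cycles(&cands[best]))
            best = i;
    return best;
}
```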
Abstract:
Provided is a method of managing the memory in a computer that includes a processor and a memory storing information referred to by the processor. The memory includes a plurality of memory banks whose power supplies are independently controlled, and each memory bank includes a plurality of physical pages. The method includes collecting physical pages having the same degree of use frequency into the same memory bank, selecting, on the basis of the use frequency, a memory bank whose power supply is to be controlled, and controlling the power supply of the selected memory bank.
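A small C sketch of the two steps, assuming frequency classes map pages to banks and a bank is eligible for power-down when every page in it falls below a threshold. Class boundaries and sizes are arbitrary.

```c
#include <stdbool.h>

#define BANKS          4
#define PAGES_PER_BANK 256

unsigned use_count[BANKS][PAGES_PER_BANK]; /* per-page use frequency */

/* Collect pages of the same frequency class in the same bank, so
 * cold pages end up together and their bank can be powered down. */
int bank_for_frequency(unsigned freq) {
    if (freq == 0)   return 0; /* idle pages      */
    if (freq < 16)   return 1; /* rarely used     */
    if (freq < 1024) return 2; /* moderately used */
    return 3;                  /* hot pages       */
}

/* A bank may be powered off when every page in it is cold. */
bool bank_can_power_down(int bank, unsigned threshold) {
    for (int p = 0; p < PAGES_PER_BANK; p++)
        if (use_count[bank][p] >= threshold)
            return false;
    return true;
}
```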
Abstract:
Provided is a method used in a computer system that includes at least one host computer, for managing both the jobs to be executed by the host computer and the power supply of the host computer. The method includes the procedures of: receiving a job; storing the received job; scheduling an execution plan for the stored job; determining, based on the execution plan, a timing at which to execute power control of the host computer; determining, when that timing is reached, a host computer on which to execute the power control; controlling the power supply of the determined host computer; and executing the scheduled job.
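An illustrative C fragment for the timing decision: power a host on some lead time before its scheduled job starts and off once the plan shows no further work. The boot lead time and the single-job view of the plan are simplifying assumptions.

```c
/* One entry of the execution plan for a host. */
typedef struct {
    int  host_id;
    long start_time; /* scheduled start, seconds since epoch */
    long end_time;   /* scheduled completion                 */
} planned_job_t;

enum power_action { POWER_NONE, POWER_ON, POWER_OFF };

/* Decide, from the execution plan, whether the timing to execute
 * power control of this host has been reached. */
enum power_action power_decision(const planned_job_t *job, long now,
                                 long boot_lead /* e.g. 120 s */) {
    if (now >= job->start_time - boot_lead && now < job->start_time)
        return POWER_ON;  /* boot in time for the scheduled job   */
    if (now >= job->end_time)
        return POWER_OFF; /* no further work planned on this host */
    return POWER_NONE;
}
```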
Abstract:
A processor reads from a main memory a program, including prefetch and load instructions, together with data, and executes the program. The processor includes: a processor core that executes the program; an L2 cache that stores data from the main memory in predetermined units of data storage; and a prefetch unit that pre-reads data into the L2 cache from the main memory on the basis of prefetch requests from the processor core. The prefetch unit includes: an L2 cache management table with an area that holds the storage state of each data-storage unit of the L2 cache and an area in which a prefetch request can be reserved; and a prefetch control unit that instructs the L2 cache to perform either a reserved prefetch request or a prefetch request from the processor core.
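A hedged C sketch of the management table with its reservation area: a prefetch request that arrives while the line is still being filled is reserved in the table and replayed when the fill completes. `issue_fill` is a stub for the actual memory read; field names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2_LINES 1024

typedef enum { LINE_INVALID, LINE_FILLING, LINE_VALID } line_state_t;

/* One L2 cache management table entry per unit of data storage. */
typedef struct {
    line_state_t state;            /* storage state of this unit  */
    bool         prefetch_pending; /* reserved prefetch request   */
    uint64_t     pending_addr;
} l2_entry_t;

l2_entry_t l2_table[L2_LINES];

/* Stand-in for starting the pre-read from the main memory. */
static void issue_fill(int line, uint64_t addr) {
    (void)line; (void)addr;
}

/* Prefetch control: issue the core's request if the line is free,
 * otherwise reserve it in the management table for later. */
void request_prefetch(int line, uint64_t addr) {
    if (l2_table[line].state == LINE_FILLING) {
        l2_table[line].prefetch_pending = true; /* reserve */
        l2_table[line].pending_addr     = addr;
    } else {
        issue_fill(line, addr);
        l2_table[line].state = LINE_FILLING;
    }
}

/* When a fill completes, replay any reserved request for the line. */
void on_fill_done(int line) {
    l2_table[line].state = LINE_VALID;
    if (l2_table[line].prefetch_pending) {
        l2_table[line].prefetch_pending = false;
        request_prefetch(line, l2_table[line].pending_addr);
    }
}
```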
Abstract:
Provided is a method of reliably reducing the power consumption of a computer while promoting prompt compilation of a source code and execution of the output code. The method according to this invention includes the steps of: reading a preset code and analyzing, based on the code, the amount of operation of the CPU and the amount of access to the cache memory; obtaining the execution rate of the CPU and the access rate to the cache memory from the amount of operation and the amount of access; determining, based on the code, an area in which the access rate to the cache memory is higher than the execution rate of the CPU; adding to that area a code that enables the power-consumption reduction function; and generating, based on the code, an execution code executable on the computer.
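A schematic C version of the compiler's test, assuming per-area rates are already available from the analysis steps: an area counts as memory-bound when its cache access rate exceeds the CPU execution rate, and the emitted code brackets it with enable/disable hooks. The hook functions are placeholders, not a real API.

```c
/* Rates obtained from the operation amount and access amount. */
typedef struct {
    double cpu_ops_per_cycle;        /* execution rate of the CPU */
    double cache_accesses_per_cycle; /* access rate to the cache  */
} area_profile_t;

/* The determining step: is the cache access rate higher than the
 * CPU execution rate in this area? */
int area_is_memory_bound(const area_profile_t *a) {
    return a->cache_accesses_per_cycle > a->cpu_ops_per_cycle;
}

/* Placeholder hooks the compiler would add around such an area,
 * e.g. to lower and then restore the CPU frequency. */
void enable_power_reduction(void)  { /* enter low-power mode  */ }
void disable_power_reduction(void) { /* restore full speed    */ }
```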
Abstract:
To provide an easy way to constitute a processor from a plurality of LSIs, the processor includes: a first LSI containing a processor core; a plurality of second LSIs each having a cache memory; and information transmission paths connecting the first LSI to the plurality of second LSIs. The first LSI contains an address information issuing unit that broadcasts address information of data to the second LSIs via the information transmission paths. Each second LSI includes: a partial address information storing unit that stores a part of the address information; a partial data storing unit that stores the data associated with the address information; and a comparison unit that compares the broadcast address information with the address information stored in the partial address information storing unit to judge whether a cache hit occurs. The comparison units of the plurality of second LSIs are connected to the information transmission paths.
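A minimal C sketch of the broadcast-and-compare path, modeling each second LSI as holding one slice of the tag store that it checks independently; the tag/index bit split and the slice organization are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLICES     4   /* second LSIs */
#define TAGS_PER_SLICE 256

/* Partial address information storing unit of each second LSI. */
uint64_t tag_store[NUM_SLICES][TAGS_PER_SLICE];

/* Comparison unit of one second LSI: compare the broadcast address
 * against this LSI's part of the address information. Assumes 64-byte
 * lines (index bits 6..13, tag bits 14 and up). */
bool slice_hit(int slice, uint64_t addr) {
    uint64_t tag   = addr >> 14;
    unsigned index = (addr >> 6) & (TAGS_PER_SLICE - 1);
    return tag_store[slice][index] == tag;
}

/* Address information issuing unit in the first LSI: broadcast the
 * address over the information transmission paths to every slice. */
int broadcast_lookup(uint64_t addr) {
    for (int s = 0; s < NUM_SLICES; s++)
        if (slice_hit(s, addr))
            return s; /* cache hit in second LSI s */
    return -1;        /* miss in all second LSIs   */
}
```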