摘要:
Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
摘要:
Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
摘要:
Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
摘要:
A method of assigning virtual memory to physical memory in a data processing system allocates a set of contiguous physical memory pages for a new page mapping, instructs the memory controller to move the virtual memory pages according to the new page mapping, and then allows access to the virtual memory pages using the new page mapping while the memory controller is still copying the virtual memory pages to the set of physical memory pages. The memory controller can use a mapping table which temporarily stores entries of the old and new page addresses, and releases the entries as copying for each entry is completed. The translation lookaside buffer (TLB) entries in the processor cores are updated for the new page addresses prior to completion of copying of the memory pages by the memory controller. The invention can be extended to non-uniform memory array (NUMA) systems. For systems with cache memory, any cache entry which is affected by the page move can be updated by modifying its address tag according to the new page mapping. This tag modification may be limited to cache entries in a dirty coherency state. The cache can further relocate a cache entry based on a changed congruence class for any modified address tag.
摘要:
A method, system, and computer program product for on-chip control of thermal cycling in an integrated circuit (IC) are provided in the illustrative embodiments. A first circuit is configured on the IC for adjusting a first voltage being applied to a first part of the IC. A first temperature of the first part is measured at a first time. A determination is made that the first temperature is outside a temperature range defined by an upper temperature threshold and a lower temperature threshold. The first voltage is adjusted by reducing the first voltage when the first temperature exceeds the upper temperature threshold and by increasing the first voltage when the first temperature is below the lower temperature threshold, thereby causing the first temperature of the first part to attain a value within the temperature range.
摘要:
A method for fast remote communication and computation between processors is provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.
摘要:
A method for allocating memory in a data processing system in which a configuration table indicative of the system's physical memory is generated following a boot event. The configuration table is then modified to identify a portion of the system's physical memory thereby hiding the remaining portion from the operating system. Subsequently, a memory allocation request is initiated by an application program. A device driver invoked by the application program then maps physical memory from the hidden portion to the application's virtual address space to satisfy the application request. The application program may be executing on a first node of a multi-node system in which each node is associated with its own local memory, In this embodiment, the node on which the allocated physical memory is located may be derived from the allocation request thereby facilitating application level, allocation of specified portions of physical memory.
摘要:
A method and system for reliably and consistently delivering client requests in a computer network having at least one client connectable to one or more servers among a group of servers, wherein each server among the group of servers replicates a particular network service to ensure that the particular network service remains uninterrupted in the event of a server failure. A particular server is designated among the group of servers to manage client requests which seek to update a particular network service state, prior to any receipt of a client request which seeks to update the particular network service state by any remaining servers among the group of servers. Thereafter, an executable order is specified in which client requests which seek to update the particular network service state are processed among the remaining servers, such that the executable order, upon execution, sequences the client request which seeks to update the particular network service state with respect to all prior and subsequent client requests. The executable order and the client request which seeks to update the particular network service state are automatically transferred to the remaining servers from the particular server, in response to initiating the client request. Thereafter, the client request which seeks to update the particular network service state is processed in a tentative mode at the particular server without waiting for the executable order to be executed through to completion among the remaining servers.
摘要:
A method for a framework for scheduling tasks in a multi-core processor or multiprocessor system is provided in the illustrative embodiments. A thread is selected according to an order in a scheduling discipline, the thread being a thread of an application executing in the data processing system, the thread forming the leader thread in a bundle of threads. A value of a core attribute in a set of core attributes is determined according to a corresponding thread attribute in a set of thread attributes associated with the leader thread. A determination is made whether a second thread can be added to the bundle such that the bundle including the second thread will satisfy a policy. If the determining is affirmative, the second thread is added to the bundle. The bundle is scheduled for execution using a core of the multi-core processor.
摘要:
A method for fast remote communication and computation between processors is provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.