摘要:
A method comprising receiving control information at a first processing element from a second processing element, synchronizing objects within a shared global memory space of the first processing element with a shared global memory space of a second processing element in response to receiving the control information and generating a completion event indicating the first processing element has been synchronized with the second processing element.
摘要:
A method comprising receiving control information at a first processing element from a second processing element, synchronizing objects within a shared global memory space of the first processing element with a shared global memory space of a second processing element in response to receiving the control information and generating a completion event indicating the first processing element has been synchronized with the second processing element.
摘要:
A method of managing instant messaging communication over a computer network is provided. One or more instant messaging session windows are organized in an instant messaging session manager. At least one distinguishing session characteristic is attributed to each of the one or more instant messaging session windows. The at least one distinguishing session characteristic is at least one of a sound clip associated with a user of the session, an instant messaging session window background associated with a user of the session, and a change in at least one of a color and an intensity of the instant messaging session window. The at least one distinguishing session characteristic increases a likelihood of identification of each of the one or more instant messaging session windows.
摘要:
A method for checking program correctness may include executing a program on a main hardware thread in speculative execution mode on a hardware execution context on a chip having a plurality of hardware execution contexts. In this mode, the main hardware thread's state is not committed to main memory. Correctness checks by a plurality of helper threads are executed in parallel to the main hardware thread. Each helper thread runs on a separate hardware execution context on the chip in parallel with the main hardware thread. The correctness checks determine a safe point in the program up to which the operations executed by the main hardware thread are correct. Once the main hardware thread reaches the safe point, the mode of execution of the main hardware thread is switched to non-speculative. The runtime then causes the main thread to re-enter speculative mode of execution.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
A system call utility may be provided on a first operating system managing a first hardware computing entity. The system call utility may take as an argument a pointer to a computer code a second operating system established to run on the first hardware computing entity. The first operating system is enabled to execute the computer code natively on the first hardware computing entity, and return a result of the computer code executed on the first hardware computing entity to the second operating system.
摘要:
A device for copying performance counter data includes hardware path that connects a direct memory access (DMA) unit to a plurality of hardware performance counters and a memory device. Software prepares an injection packet for the DMA unit to perform copying, while the software can perform other tasks. In one aspect, the software that prepares the injection packet runs on a processing core other than the core that gathers the hardware performance counter data.
摘要:
Point-to-point intra-nodelet messaging support for nodelets on a single chip that obey MPI semantics may be provided. In one aspect, a local buffering mechanism is employed that obeys standard communication protocols for the network communications between the nodelets integrated in a single chip. Sending messages from one nodelet to another nodelet on the same chip may be performed not via the network, but by exchanging messages in the point-to-point messaging buckets between the nodelets. The messaging buckets need not be part of the memory system of the nodelets. Specialized hardware controllers may be used for moving data between the nodelets and each messaging bucket, and ensuring correct operation of the network protocol.
摘要:
A method for checking program correctness may include executing a program on a main hardware thread in speculative execution mode on a hardware execution context on a chip having a plurality of hardware execution contexts. In this mode, the main hardware thread's state is not committed to main memory. Correctness checks by a plurality of helper threads are executed in parallel to the main hardware thread. Each helper thread runs on a separate hardware execution context on the chip in parallel with the main hardware thread. The correctness checks determine a safe point in the program up to which the operations executed by said main hardware thread are correct. Once the main hardware thread reaches the safe point, the mode of execution of the main hardware thread is switched to non-speculative. The runtime then causes the main thread to re-enter speculative mode of execution.
摘要:
A hierarchical barrier synchronization of cores and nodes on a multiprocessor system, in one aspect, may include providing by each of a plurality of threads on a chip, input bit signal to a respective bit in a register, in response to reaching a barrier; determining whether all of the plurality of threads reached the barrier by electrically tying bits of the register together and “AND”ing the input bit signals; determining whether only on-chip synchronization is needed or whether inter-node synchronization is needed; in response to determining that all of the plurality of threads on the chip reached the barrier, notifying the plurality of threads on the chip, if it is determined that only on-chip synchronization is needed; and after all of the plurality of threads on the chip reached the barrier, communicating the synchronization signal to outside of the chip, if it is determined that inter-node synchronization is needed.