摘要:
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
摘要:
An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each the IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state. The apparatus and methodology enables construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.
摘要:
A plurality of target field programmable gate arrays are interconnected in accordance with a connection topology and map portions of a target system. A control module is coupled to the plurality of target field programmable gate arrays. A balanced clock distribution network is configured to distribute a reference clock signal, and a balanced reset distribution network is coupled to the control module and configured to distribute a reset signal to the plurality of target field programmable gate arrays. The control module and the balanced reset distribution network are cooperatively configured to initiate and control a simulation of the target system with the plurality of target field programmable gate arrays. A plurality of local clock control state machines reside in the target field programmable gate arrays. The local clock control state machines are coupled to the balanced clock distribution network and obtain the reference clock signal therefrom. The plurality of local clock control state machines are configured to generate a set of synchronized free-running and stoppable clocks to maintain cycle-accurate and cycle-reproducible execution of the simulation of the target system. A method is also provided.
摘要:
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
摘要翻译:具有100 petaOPS规模计算的多Petascale高效并行超级计算机,其成本,功耗和占地面积都在降低,并且允许从互连角度来看处理节点的最大封装密度。 超级计算机利用了VLSI的技术进步,实现了许多处理器可以集成到单个专用集成电路(ASIC)中的计算模型。 每个ASIC计算节点包括利用集成到一个管芯中的四个或更多个处理器的片上系统ASIC,每个处理器具有对所有系统资源的完全访问,并且使得处理器能够对诸如计算或消息传递I / O 并且优选地,根据应用内的各种算法阶段实现功能的自适应分割,或者如果I / O或其他处理器未被充分利用,则可以参与计算或通信节点通过五维环面网络互连 使用DMA来最大限度地最大化节点之间的分组通信的吞吐量并最小化等待时间。
摘要:
A memory system and method for providing atomic memory-based counter operations to operating systems and applications that make most efficient use of counter-backing memory and virtual and physical address space, while simplifying operating system memory management, and enabling the counter-backing memory to be used for purposes other than counter-backing storage when desired. The encoding and address decoding enabled by the invention provides all this functionality through a combination of software and hardware.
摘要:
An apparatus and method for evaluating a state of an electronic or integrated circuit (IC), each IC including one or more processor elements for controlling operations of IC sub-units, and each the IC supporting multiple frequency clock domains. The method comprises: generating a synchronized set of enable signals in correspondence with one or more IC sub-units for starting operation of one or more IC sub-units according to a determined timing configuration; counting, in response to one signal of the synchronized set of enable signals, a number of main processor IC clock cycles; and, upon attaining a desired clock cycle number, generating a stop signal for each unique frequency clock domain to synchronously stop a functional clock for each respective frequency clock domain; and, upon synchronously stopping all on-chip functional clocks on all frequency clock domains in a deterministic fashion, scanning out data values at a desired IC chip state. The apparatus and methodology enables construction of a cycle-by-cycle view of any part of the state of a running IC chip, using a combination of on-chip circuitry and software.
摘要:
An apparatus and method for controlling power usage in a computer includes a plurality of computers communicating with a local control device, and a power source supplying power to the local control device and the computer. A plurality of sensors communicate with the computer for ascertaining power usage of the computer, and a system control device communicates with the computer for controlling power usage of the computer.
摘要:
A memory system and method for providing atomic memory-based counter operations to operating systems and applications that make most efficient use of counter-backing memory and virtual and physical address space, while simplifying operating system memory management, and enabling the counter-backing memory to be used for purposes other than counter-backing storage when desired. The encoding and address decoding enabled by the invention provides all this functionality through a combination of software and hardware.
摘要:
A plurality of target field programmable gate arrays are interconnected in accordance with a connection topology and map portions of a target system. A control module is coupled to the plurality of target field programmable gate arrays. A balanced clock distribution network is configured to distribute a reference clock signal, and a balanced reset distribution network is coupled to the control module and configured to distribute a reset signal to the plurality of target field programmable gate arrays. The control module and the balanced reset distribution network are cooperatively configured to initiate and control a simulation of the target system with the plurality of target field programmable gate arrays. A plurality of local clock control state machines reside in the target field programmable gate arrays. The local clock control state machines are coupled to the balanced clock distribution network and obtain the reference clock signal therefrom. The plurality of local clock control state machines are configured to generate a set of synchronized free-running and stoppable clocks to maintain cycle-accurate and cycle-reproducible execution of the simulation of the target system. A method is also provided.
摘要:
A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.