Determining internodal processor interconnections in a data-parallel computing system

    Publication No.: US12056085B2

    Publication Date: 2024-08-06

    Application No.: US18096253

    Filing Date: 2023-01-12

    CPC Classification: G06F15/825 G06F13/4068

    Abstract: A computer-implemented method comprises a topological communications configurator (TCC) of a computing system determining a connections-optimized configuration of processors among compute nodes of the system. Processors included in the compute nodes can execute compute workers of an application of the system and can form intranodal segments of an internodal interconnection topology communicatively coupling the intranodal segments. The intranodal segments can be interconnected via an internodal interconnections fabric. The TCC can determine the connections-optimized configuration based on internodal communications costs corresponding to communications routes among the internodal segments via the internodal interconnection fabric. The computing system can comprise the TCC and can comprise a data-parallel computing system.
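    The abstract leaves the cost model open. As a minimal sketch, assume the intranodal segments are joined into a single ring and that a hypothetical matrix `cost[i][j]` gives the internodal communications cost of the route from segment i to segment j. A configurator could then pick the ring order with the lowest total cost. The exhaustive search below is only practical for a handful of segments and is not the patented method.

```python
from itertools import permutations


def ring_cost(order, cost):
    """Sum the internodal cost of every hop in a closed ring of segments."""
    n = len(order)
    return sum(cost[order[i]][order[(i + 1) % n]] for i in range(n))


def best_ring_order(cost):
    """Exhaustively search ring orders; segment 0 is fixed to skip pure rotations."""
    n = len(cost)
    if n == 0:
        raise ValueError("no segments to connect")
    best = None
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        c = ring_cost(order, cost)
        if best is None or c < best[1]:
            best = (order, c)
    return best  # (segment order, total internodal cost)
```

    For realistic segment counts, a heuristic or an exact solver would replace the brute-force loop; the cost function is the part a real configurator would have to model carefully.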

    Message synchronization system

    Publication No.: US12056084B2

    Publication Date: 2024-08-06

    Application No.: US17447732

    Filing Date: 2021-09-15

    Abstract: A method for synchronizing messages between processors is provided. The method comprising receiving, by a first external device, inbound messages for applications running redundantly in high integrity mode on two or more multi-core processors. The inbound messages are synchronously copied to the multi-core processors. The multi-core processors send outbound messages to respective alignment queues in the first external device or a second external device, wherein the outbound messages contain calculation results from the inbound messages. The first or second external device compares the alignment queues. Matched outbound messages in the alignment queues are sent to a network or data bus. Any unmatched outbound messages in the alignment queues are discarded.
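    The comparison step can be sketched as follows, assuming each redundant processor's alignment queue receives outbound messages in the same order, so that the heads of the queues correspond to the same inbound message. The function name `synchronize` and the (id, payload) message tuples are illustrative, not taken from the patent.

```python
from collections import deque


def synchronize(queues):
    """Compare alignment queues head-to-head; forward matches, discard mismatches."""
    sent, discarded = [], []
    if not queues:
        return sent, discarded
    # Only compare while every queue has a message waiting at its head.
    while all(queues):
        heads = [q.popleft() for q in queues]
        if all(h == heads[0] for h in heads):
            sent.append(heads[0])  # would go to the network or data bus
        else:
            discarded.extend(heads)  # unmatched outbound messages are dropped
    return sent, discarded
```

    A message whose counterpart has not yet arrived stays queued until the next comparison; how long to wait before treating it as unmatched is a design choice the abstract does not address.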

    Massively parallel hierarchical control system and method

    Publication No.: US11947470B2

    Publication Date: 2024-04-02

    Application No.: US17244332

    Filing Date: 2021-04-29

    Abstract: A system is disclosed for controlling controllable elements of an external component. The system uses a state translator subsystem (“STS”) which receives a state command from an external subsystem. The STS has at least one module for processing the state command and generating operational commands, in parallel, over a first plurality of channels, to control the elements of the external component. A programmable calibration command translation layer subsystem (“PCCTL”) uses the operational commands to generate granular level commands for controlling the elements, and to transmit the granular level commands over a second plurality of channels. A subsystem is coupled between the PCCTL and the elements, which receives the commands from the PCCTL and uses the commands to generate final output commands, which are applied in parallel, over a third plurality of channels, to the elements.
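    A rough sketch of the three-stage pipeline, assuming a scalar state command, a hypothetical per-element calibration table of (gain, offset) pairs, and clamping as the final output transformation. The stage functions `sts`, `pcctl` and `output_stage` are named after the subsystems in the abstract, but their behavior here is invented for illustration; a thread pool stands in for the parallel channels.

```python
from concurrent.futures import ThreadPoolExecutor


def sts(state_command, n_groups):
    """State translator (hypothetical): fan one state command out to per-group operational commands."""
    return [{"group": g, "target": state_command} for g in range(n_groups)]


def pcctl(op_cmd, calibration, elems_per_group):
    """Calibration layer (hypothetical): apply each element's (gain, offset) to the group target."""
    base = op_cmd["group"] * elems_per_group
    return [gain * op_cmd["target"] + offset
            for gain, offset in calibration[base:base + elems_per_group]]


def output_stage(granular, lo=0.0, hi=1.0):
    """Final stage (hypothetical): clamp granular commands to the actuator range."""
    return [min(max(x, lo), hi) for x in granular]


def run(state_command, calibration, n_groups, elems_per_group):
    ops = sts(state_command, n_groups)
    with ThreadPoolExecutor() as pool:
        # Each group acts as one "channel"; groups are processed concurrently.
        granular = list(pool.map(lambda op: pcctl(op, calibration, elems_per_group), ops))
        outputs = list(pool.map(output_stage, granular))
    # Flatten per-group results into one output per controllable element.
    return [y for group in outputs for y in group]
```

    Keeping calibration in its own layer means the element-specific correction can be reprogrammed without touching the state translator, which matches the "programmable" role the abstract gives the PCCTL.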

    INSTRUCTION FORMAT AND INSTRUCTION SET ARCHITECTURE FOR TENSOR STREAMING PROCESSOR

    Publication No.: US20240037064A1

    Publication Date: 2024-02-01

    Application No.: US18483026

    Filing Date: 2023-10-09

    Applicant: Groq, Inc.

    IPC Classification: G06F15/82 G06N20/00

    CPC Classification: G06F15/825 G06N20/00

    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
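    The per-slice instruction queues can be modeled as a small data structure. This sketch assumes two slice types (memory and arithmetic), hypothetical opcode names, and a two-valued EAST/WEST direction field for the data transport lanes; none of these names are taken from Groq's actual instruction set.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical opcode sets: each slice type accepts only its own opcodes.
SLICE_OPCODES = {
    "memory": {"READ", "WRITE"},
    "arithmetic": {"ADD", "MUL"},
}
DIRECTIONS = {"EAST", "WEST"}  # direction data travels on the transport lanes


@dataclass
class Instruction:
    opcode: str
    direction: str


class FunctionalSlice:
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.queue = deque()  # one instruction queue per functional slice

    def enqueue(self, instr):
        # Reject opcodes that belong to another slice type.
        if instr.opcode not in SLICE_OPCODES[self.kind]:
            raise ValueError(f"{instr.opcode} is not a {self.kind} opcode")
        if instr.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction {instr.direction}")
        self.queue.append(instr)


def issue_cycle(slices):
    """Pop at most one instruction from the head of every non-empty slice queue."""
    return [(s.name, s.queue.popleft()) for s in slices if s.queue]
```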

    A CONFIGURABLE PROCESSING ARCHITECTURE

    Publication No.: US20240028554A1

    Publication Date: 2024-01-25

    Application No.: US18027078

    Filing Date: 2020-09-18

    IPC Classification: G06F15/80 G06F15/82

    CPC Classification: G06F15/80 G06F15/825

    Abstract: A configurable processing unit including a core processing element and a plurality of assist processing elements can be coupled together by one or more networks. The core processing element can include a large processing logic, large non-volatile memory, input/output interfaces and multiple memory channels. The plurality of assist processing elements can each include smaller processing logic, smaller non-volatile memory and multiple memory channels. One or more bitstreams can be utilized to configure and reconfigure computation resources of the core processing element and memory management of the plurality of assist processing elements.

    Instruction format and instruction set architecture for tensor streaming processor

    Publication No.: US11822510B1

    Publication Date: 2023-11-21

    Application No.: US17684337

    Filing Date: 2022-03-01

    Applicant: Groq, Inc.

    IPC Classification: G06F15/82 G06N20/00

    CPC Classification: G06F15/825 G06N20/00

    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.

    Computer-readable recording medium storing program and management method

    Publication No.: US11822408B2

    Publication Date: 2023-11-21

    Application No.: US17717188

    Filing Date: 2022-04-11

    Applicant: FUJITSU LIMITED

    Inventor: Akira Hirai

    IPC Classification: G06F1/28 G06F15/82

    CPC Classification: G06F1/28 G06F15/82

    Abstract: A recording medium stores a program for causing a computer to execute processing including: acquiring a first process execution time and energy consumption of a first processor core in the execution time when a process executed by the first processor core is switched from a first process to a second process; specifying one or more processes of a first process group to which the first process belongs, from among process groups each of which is a group of processes and calculating an index that indicates the energy consumption per unit time involved in execution of the first process group based on the execution time and the energy consumption acquired for the specified one or more processes; and controlling an operation of a processor core to which the process is allocated according to comparison between the index calculated for the first process group with a threshold.
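    The index described in the abstract is total energy divided by total execution time over the sampled processes of a group, i.e. an average power. A minimal sketch follows; the class name, the `group_of` mapping, and the "throttle"/"normal" decision are hypothetical, since the abstract only says the core's operation is controlled according to the comparison with a threshold.

```python
from collections import defaultdict


class GroupPowerMonitor:
    def __init__(self, group_of, threshold_watts):
        self.group_of = group_of          # process -> process group (hypothetical mapping)
        self.threshold = threshold_watts
        self.samples = defaultdict(list)  # group -> [(exec_time_s, energy_j), ...]

    def on_context_switch(self, process, exec_time_s, energy_j):
        """Record the outgoing process's execution time and core energy at a switch."""
        self.samples[self.group_of[process]].append((exec_time_s, energy_j))

    def index(self, group):
        """Energy consumption per unit time for the group (joules / seconds = watts)."""
        samples = self.samples[group]
        total_time = sum(t for t, _ in samples)
        total_energy = sum(e for _, e in samples)
        return total_energy / total_time if total_time else 0.0

    def decide(self, group):
        # Hypothetical control action: throttle the core when the index exceeds the threshold.
        return "throttle" if self.index(group) > self.threshold else "normal"
```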

    Distribution of Over-Configured Logical Processors

    Publication No.: US20230281158A1

    Publication Date: 2023-09-07

    Application No.: US17653798

    Filing Date: 2022-03-07

    IPC Classification: G06F15/82 G06F13/36

    CPC Classification: G06F15/82 G06F13/36

    Abstract: Logical processor distribution across physical processors is provided. A set of logical processors of a number of logical processors defined for a particular logical partition of a plurality of active logical partitions is assigned to a physical processor chip having a greatest logical processor entitlement for the particular logical partition until no more logical processors can be assigned to that physical processor chip based on a logical processor entitlement of that physical processor chip being exhausted. Remaining logical processors of the number of logical processors defined for the particular logical partition are assigned to other physical processor chips of a plurality of physical processor chips assigned to the particular logical partition until all of the remaining logical processors have been assigned to a physical processor chip.
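    A greedy sketch of the assignment, assuming entitlements are given as whole numbers of logical processors per chip. Chips are filled in descending order of entitlement; the abstract does not say how an over-configured surplus (more logical processors than the total entitlement) is placed, so the round-robin spill at the end is an assumption.

```python
def distribute(n_logical, entitlement):
    """Assign n_logical logical processors to chips.

    entitlement maps chip -> logical processors the partition is entitled to on that chip.
    """
    if not entitlement:
        raise ValueError("no physical processor chips assigned to the partition")
    # Greatest entitlement first; sorting is stable, so ties keep input order.
    chips = sorted(entitlement, key=entitlement.get, reverse=True)
    assignment = {chip: 0 for chip in chips}
    remaining = n_logical
    for chip in chips:
        # Fill this chip until its entitlement is exhausted.
        take = min(remaining, entitlement[chip])
        assignment[chip] += take
        remaining -= take
    # Assumption: an over-configured surplus is spread round-robin in the same order.
    i = 0
    while remaining > 0:
        assignment[chips[i % len(chips)]] += 1
        remaining -= 1
        i += 1
    return assignment
```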