Abstract:
A multi-core processor includes: a plurality of cores, each core configured to output a scan output pattern in response to an input of a scan input pattern; a multiplexing circuit configured to be responsive to a selection signal to output one of the scan output patterns output by the plurality of cores; and a comparison circuit configured to compare the scan output patterns with one another in units of bits, and to generate a plurality of comparison signals corresponding to comparison results.
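The bitwise comparison and the selection described above can be modeled in software. The following is a minimal sketch, not the claimed circuit: it assumes each scan output pattern is a list of 0/1 bits of equal length, that "comparing with one another" means pairwise XOR between cores, and that a comparison bit of 1 marks a mismatch. The function names `compare_scan_outputs` and `select_output` are hypothetical.

```python
from itertools import combinations


def compare_scan_outputs(outputs):
    """Compare equal-length scan output patterns pairwise, bit by bit.

    Returns a dict mapping each core index pair (i, j) to a list of
    comparison bits, where 1 means the two cores disagree at that bit.
    Pairwise XOR is an assumption made for this sketch.
    """
    width = len(outputs[0])
    if any(len(pattern) != width for pattern in outputs):
        raise ValueError("all scan output patterns must have the same length")
    return {
        (i, j): [a ^ b for a, b in zip(outputs[i], outputs[j])]
        for i, j in combinations(range(len(outputs)), 2)
    }


def select_output(outputs, selection):
    """Model the multiplexer: forward the pattern chosen by the selection signal."""
    return outputs[selection]


# Three cores fed the same scan input; core 2 differs at bit position 1.
patterns = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 1, 1, 1]]
signals = compare_scan_outputs(patterns)
```

In this model, an all-zero comparison list for a pair indicates that the two cores produced identical scan outputs.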
Abstract:
A method of operating a network switch for collective communication includes: receiving, via a network from external electronic devices, a first matrix and a second matrix, each formatted according to a sparse matrix storage format; and generating a third matrix formatted according to the sparse matrix storage format, wherein the third matrix is generated by combining the first matrix and the second matrix according to the sparse matrix storage format, wherein, according to the sparse matrix storage format, the first matrix includes first matrix positions of respective first element values and the second matrix includes second matrix positions of respective second element values, and wherein the combining includes comparing the first matrix positions with the second matrix positions.
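The position-comparing combination described above can be sketched as a merge of two sorted entry lists. This is only an illustration under stated assumptions: the abstract does not name the storage format or the combining operation, so the sketch assumes a coordinate-style format (each entry is a `((row, col), value)` pair, sorted by position) and element-wise addition, as in a sum reduction. The function name `combine_sparse` is hypothetical.

```python
def combine_sparse(first, second):
    """Merge two sparse matrices given as position-sorted ((row, col), value) lists.

    Positions are compared entry by entry: matching positions have their
    values added (an assumed combining operation), and unmatched entries
    are copied through unchanged. The result stays in the same format.
    """
    result = []
    i = j = 0
    while i < len(first) and j < len(second):
        pos_a, val_a = first[i]
        pos_b, val_b = second[j]
        if pos_a == pos_b:
            result.append((pos_a, val_a + val_b))
            i += 1
            j += 1
        elif pos_a < pos_b:
            result.append((pos_a, val_a))
            i += 1
        else:
            result.append((pos_b, val_b))
            j += 1
    # At most one of the inputs still has entries left; copy them over.
    result.extend(first[i:])
    result.extend(second[j:])
    return result


first = [((0, 1), 2.0), ((1, 0), 3.0)]
second = [((0, 1), 5.0), ((2, 2), 1.0)]
third = combine_sparse(first, second)
```

Because only stored positions are visited, the work is proportional to the number of nonzero entries rather than to the full matrix size.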
Abstract:
A processing device includes: a first buffer storing calculation rules; a calculator including a plurality of multipliers and an adder, the multipliers configured to perform multiplication repeatedly; a second buffer storing operands, the second buffer being configured to enqueue the operands into a queue based on the calculation rules; and a counter indicating, for each of the plurality of multipliers, a respective number of times a multiplication is to be performed by that multiplier, each multiplier of the plurality of multipliers being configured to provide a non-final multiplication result to a first path to an input of the corresponding multiplier responsive to a corresponding number of multiplications performed by the multiplier being less than the respective number, and to provide a final multiplication result to a second path to the adder responsive to the corresponding number of multiplications performed by the multiplier being equal to the respective number.
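The feedback behavior of the multipliers described above can be illustrated with a small software model. This is a sketch under assumptions, not the claimed hardware: it assumes each multiplier has its own operand queue, that a multiplier needing `count` multiplications consumes `count + 1` operands, and that the adder sums the final results. The names `run_multiplier` and `run_calculator` are hypothetical.

```python
from collections import deque


def run_multiplier(operands, count):
    """Multiply repeatedly until `count` multiplications have been performed.

    While fewer than `count` multiplications are done, the non-final result
    is fed back as the multiplier's own input (the first path). Once the
    count is reached, the final result is returned for the adder (the
    second path).
    """
    if count < 1 or len(operands) != count + 1:
        raise ValueError("expected count >= 1 and count + 1 operands")
    value = operands.popleft()
    performed = 0
    while performed < count:
        value *= operands.popleft()  # one multiplication
        performed += 1
    return value


def run_calculator(queues, counts):
    """Run every multiplier with its own count, then add the final results."""
    return sum(run_multiplier(q, c) for q, c in zip(queues, counts))


# Multiplier 0 computes 2*3*4 (two multiplications); multiplier 1 computes 5*6 (one).
total = run_calculator([deque([2, 3, 4]), deque([5, 6])], counts=[2, 1])
```

Giving each multiplier its own count lets terms with different numbers of factors be evaluated by the same calculator before a single addition.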
Abstract:
A computing device includes: a processor; a memory stack in which memories connected to the processor are stacked; and a substrate disposed under the processor, wherein a network bandwidth between the processor and the substrate is no more than five times a memory bandwidth between the processor and the memory stack.
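The bandwidth relationship stated above reduces to a single inequality. The sketch below only checks that inequality; the function name `satisfies_bandwidth_ratio` and the sample values are hypothetical and do not come from the abstract. Both bandwidths are assumed to be expressed in the same unit.

```python
def satisfies_bandwidth_ratio(network_bw, memory_bw, max_ratio=5.0):
    """Return True if the network bandwidth is at most max_ratio times the memory bandwidth.

    Both arguments must use the same unit (for example, GB/s).
    """
    if memory_bw <= 0:
        raise ValueError("memory bandwidth must be positive")
    return network_bw <= max_ratio * memory_bw


# Hypothetical values in arbitrary but matching units.
within_limit = satisfies_bandwidth_ratio(network_bw=4.0, memory_bw=1.0)
over_limit = satisfies_bandwidth_ratio(network_bw=6.0, memory_bw=1.0)
```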
Abstract:
Disclosed are electronic devices with predetermined compression schemes for parallel computing and methods thereof. An example electronic device includes cores of one or more processors and one or more memories storing instructions configured to, when executed by the cores, configure the cores to: perform operations of an application executed on the electronic device, the operations including communication phases that communicate data between the cores, wherein the application includes, prior to execution of the application on the electronic device, predetermined information associating the communication phases with respective compression schemes; and apply the compression schemes corresponding to the communication phases according to the predetermined information to compress the data of the communication phases that is exchanged between the cores when executing the application.
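The idea of fixing a compression scheme per communication phase before the application runs can be sketched with a lookup table. This is an illustration only: the phase names, the table `PHASE_SCHEMES`, and the helpers `pack` and `unpack` are hypothetical, and the standard-library `zlib` and `lzma` codecs stand in for whatever schemes an actual device would use.

```python
import lzma
import zlib

# Predetermined before execution: which scheme each communication phase uses.
# Phase names and scheme choices here are hypothetical.
PHASE_SCHEMES = {
    "halo_exchange": "zlib",
    "reduction": "lzma",
    "control": "none",
}

# Each scheme maps to a (compress, decompress) pair; "none" passes data through.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "lzma": (lzma.compress, lzma.decompress),
    "none": (bytes, bytes),
}


def pack(phase: str, data: bytes) -> bytes:
    """Compress data with the scheme predetermined for this phase."""
    compress, _ = CODECS[PHASE_SCHEMES[phase]]
    return compress(data)


def unpack(phase: str, payload: bytes) -> bytes:
    """Decompress a payload on the receiving core, using the same table."""
    _, decompress = CODECS[PHASE_SCHEMES[phase]]
    return decompress(payload)


message = b"a" * 1000
payload = pack("halo_exchange", message)
restored = unpack("halo_exchange", payload)
```

Because sender and receiver consult the same predetermined table, no scheme identifier needs to travel with the payload in this sketch.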