Abstract:
A described embodiment of the present invention includes a network having a first, second and third plurality of routers connected to a plurality of endpoints. At least one of the first plurality of routers includes a plurality of interposers having a number of queues. The at least one of the first plurality of routers has a demultiplexer for each interposer configured to receive multiplexed data from the interposer and provide demultiplexed data onto a plurality of second queues corresponding to the first queues of the number of queues. The at least one of the first plurality of routers also includes a number of multiplexers, each of the number of multiplexers having inputs configured to receive data from the number of queues.
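The short Python sketch below gives one possible reading of the demultiplex/requeue/multiplex path described for a router in this abstract; the queue count, the tagging of data with its first-queue index, and the round-robin draining are assumptions made so the example runs, not details taken from the embodiment.

# A rough structural sketch of the demultiplex/requeue/multiplex path the
# abstract describes for one router. Queue counts and the tagging scheme
# are assumptions made for illustration.

from collections import deque

NUM_QUEUES = 4                       # first queues per interposer (assumed)

def demultiplex(multiplexed_stream):
    """Fan a multiplexed stream back out onto second queues that mirror
    the interposer's first queues (one second queue per first queue)."""
    second_queues = [deque() for _ in range(NUM_QUEUES)]
    for queue_index, payload in multiplexed_stream:
        second_queues[queue_index].append(payload)
    return second_queues

def multiplex(second_queues):
    """An output multiplexer with one input per second queue, draining
    them round-robin onto a single output link (draining order is assumed)."""
    output = []
    while any(second_queues):
        for q in second_queues:
            if q:
                output.append(q.popleft())
    return output

# Data arriving from one interposer, tagged with its first-queue index.
stream = [(0, "a0"), (2, "c0"), (0, "a1"), (1, "b0"), (3, "d0")]
second = demultiplex(stream)
print(multiplex(second))             # ['a0', 'b0', 'c0', 'd0', 'a1']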
Abstract:
Methods and apparatus for inter-process communication are provided. A circuit may have a plurality of clusters, and at least one cluster may have a computation element (CE), a memory operatively coupled with the CE, and an autonomic transport system (ATS) block operatively coupled with the CE and the memory. The ATS block may be configured to perform inter-process communication (IPC) for the at least one cluster. In one embodiment, the ATS block may transfer a message to a different cluster based on a request from the CE. In another embodiment, the ATS block may receive a message by allocating a buffer in the memory and writing the message into the buffer. The ATS block may also be configured to manage synchronization and schedule tasks for the CE.
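The sketch below is a minimal, purely illustrative Python model of the send and receive roles the abstract assigns to the ATS block; the class and method names (AtsBlock, send, deliver) and the list-based memory are assumptions made for the example, not details of the described circuit.

# A minimal, purely illustrative model of the send/receive roles the
# abstract assigns to an ATS block. Names and data structures are assumed.

class AtsBlock:
    def __init__(self, cluster_id, memory):
        self.cluster_id = cluster_id
        self.memory = memory          # per-cluster memory: list of buffers
        self.inbox = []               # buffers holding received messages

    def send(self, message, dest_ats):
        """Transfer a message to a different cluster on behalf of the CE."""
        dest_ats.deliver(message)

    def deliver(self, message):
        """Receive a message: allocate a buffer in local memory and
        write the message into it, without involving the local CE."""
        buffer = bytearray(message)   # "allocate" a buffer in memory
        self.memory.append(buffer)
        self.inbox.append(buffer)

# Two clusters exchanging a message through their ATS blocks.
ats0 = AtsBlock(0, memory=[])
ats1 = AtsBlock(1, memory=[])
ats0.send(b"task-result", ats1)       # CE of cluster 0 requests a transfer
print(ats1.inbox[0])                  # cluster 1's CE can read the buffer later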
Abstract:
Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
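The following toy Python sketch illustrates the division of work between a master processing unit and a consumer processing unit described above; the instruction format, the dictionary-based local read storage, and the function names are assumptions made for illustration, not the described processor design.

# A toy sketch of the master/consumer split described above. The
# instruction tuple, "local read storage" dict, and opcode are assumed.

memory = {0x100: 42, 0x104: 7}        # backing memory (address -> value)

def master_dispatch(instruction, consumer):
    """Partially decode, request operand data, then hand the instruction
    to the consumer processing unit."""
    opcode, src_addr, dst = instruction
    if src_addr is not None:                       # instruction needs data
        # Ask memory to send the operand to the consumer's local read
        # storage rather than back to the master.
        consumer["local_read_storage"][src_addr] = memory[src_addr]
    consumer["pending"].append(instruction)

def consumer_execute(consumer):
    """Execute pending instructions using operands staged nearby."""
    results = {}
    for opcode, src_addr, dst in consumer["pending"]:
        if opcode == "LOAD_ADD1":
            value = consumer["local_read_storage"][src_addr]  # fast local access
            results[dst] = value + 1
    return results

consumer = {"local_read_storage": {}, "pending": []}
master_dispatch(("LOAD_ADD1", 0x100, "r1"), consumer)
print(consumer_execute(consumer))      # {'r1': 43}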
Abstract:
Disclosed is a method for operating an interposer that includes assigning a binary port weight to a plurality of input ports of the interposer. The sum of all of the port weights is less than or equal to the number of traversals available to the interposer in a cycle. A traversal counter is set to zero at the beginning of each cycle. The output of the traversal counter is a binary number of m bits. A mask is generated when bit k of the traversal counter transitions from a zero to a one. The mask is generated having the m−k+1 bit of the mask equal to one and all other bits of the mask equal to zero. Data is transmitted from each port when both the binary port weight and the mask have a one in the same bit position.
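As a concrete illustration of the scheduling described in this abstract, the following Python sketch simulates one cycle of the scheme with a 4-bit traversal counter. The counter width, port names, and weights are assumptions chosen for the example; the sketch only shows how the mask generation distributes each port's weighted share of traversals across the cycle.

# Minimal simulation of the binary-weighted interposer scheduling scheme
# described above. Port names, bit widths, and weights are illustrative.

M = 4                      # width of the traversal counter, in bits
CYCLE = 1 << M             # counter increments per cycle (assumed equal to traversals)

# Binary port weights; their sum must not exceed the traversals per cycle.
port_weights = {"port_a": 0b1000, "port_b": 0b0100, "port_c": 0b0011}
assert sum(port_weights.values()) <= CYCLE

def mask_for(prev, curr, m=M):
    """Build the grant mask: if counter bit k (LSB = bit 1) transitions
    from 0 to 1, set mask bit m - k + 1 and clear all other bits."""
    mask = 0
    for k in range(1, m + 1):
        prev_bit = (prev >> (k - 1)) & 1
        curr_bit = (curr >> (k - 1)) & 1
        if prev_bit == 0 and curr_bit == 1:
            mask |= 1 << (m - k)      # bit position m - k + 1, 1-indexed
    return mask

grants = {p: 0 for p in port_weights}
counter = 0                           # reset to zero at the start of the cycle
for _ in range(CYCLE):
    nxt = (counter + 1) % CYCLE
    mask = mask_for(counter, nxt)
    for port, weight in port_weights.items():
        if weight & mask:             # weight and mask share a one bit
            grants[port] += 1         # this port transmits on this traversal
    counter = nxt

print(grants)   # each port is granted its weight's worth of traversals per cycle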
Abstract:
System and method embodiments are provided for messaging-based System-on-a-chip (SoC) power gating. The embodiments enable fine-granularity SoC power gating without introducing significant latency and substantially maximize SoC power reduction. In an embodiment, a method in a first SoC resource for messaging-based power gating includes receiving at the first SoC resource a wakeup notification message (WNM) from a second SoC resource, wherein the WNM comprises a time at which a result message from the second SoC resource is expected to arrive at the first SoC resource; determining with the first SoC resource a wake-up time according to the time at which the result message from the second SoC resource is expected to arrive at the first SoC resource; setting a wake-up timer to expire at the wake-up time; and waking up the first SoC resource when the wake-up timer expires, if the first SoC resource is asleep.
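A simplified Python sketch of the wakeup-notification flow described above follows; the wake-up lead time, the tick-based timer, and the class and method names are assumptions made only so the example is runnable, not details of the described embodiments.

# A simplified sketch of the WNM-driven wake-up flow described above.
# The lead time and tick-based timer are assumptions for illustration.

WAKEUP_LEAD_TIME = 5          # assumed time needed to power the resource back up

class SocResource:
    def __init__(self, name):
        self.name = name
        self.asleep = True
        self.wake_timer = None    # absolute time at which to wake up

    def receive_wnm(self, expected_arrival_time):
        """Handle a wakeup notification message (WNM): derive a wake-up
        time from the expected arrival of the result message and arm a timer."""
        self.wake_timer = expected_arrival_time - WAKEUP_LEAD_TIME

    def tick(self, now):
        """Wake the resource when its timer expires while it is asleep."""
        if self.asleep and self.wake_timer is not None and now >= self.wake_timer:
            self.asleep = False
            print(f"{self.name} woke at t={now}, ready for the result message")

# The second resource tells the first that a result will arrive at t=100.
resource_a = SocResource("resource_a")
resource_a.receive_wnm(expected_arrival_time=100)
for t in range(90, 101):
    resource_a.tick(t)        # wakes at t=95, before the result arrives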