摘要:
A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
摘要:
A method for communicating data between peripheral devices and an embedded processor that includes receiving, at a data buffer unit of the embedded processor, the data from a peripheral device. The method also includes copying data from the data buffer unit into the bridge buffer of the embedded processor as a bridge buffer message. Additionally, the method includes creating, after storing the data as a bridge buffer message, a peripheral device message comprising the bridge buffer message, and sending the peripheral device message to a thread message queue of a subscriber.
摘要:
A memory processing approach involves implementation of memory status-driven access. According to an example embodiment, addresses received at an address buffer are processed for access to a memory relative to an active location in the memory. Addresses corresponding to an active location in the memory array are processed prior to addresses that do not correspond to an active location. Data is read from the memory to a read buffer and ordered in a manner commensurate with the order of received addresses at the address buffer (e.g., thus facilitating access to the memory in an order different from that received at the address buffer while maintaining the order from the read buffer).
摘要:
A memory processing approach involves implementation of memory status-driven access. According to an example embodiment, addresses received at an address buffer are processed for access to a memory relative to an active location in the memory. Addresses corresponding to an active location in the memory array are processed prior to addresses that do not correspond to an active location. Data is read from the memory to a read buffer and ordered in a manner commensurate with the order of received addresses at the address buffer (e.g., thus facilitating access to the memory in an order different from that received at the address buffer while maintaining the order from the read buffer).
摘要:
A method for communicating data between peripheral devices and an embedded processor that includes receiving, at a data buffer unit of the embedded processor, the data from a peripheral device. The method also includes copying data from the data buffer unit into the bridge buffer of the embedded processor as a bridge buffer message. Additionally, the method includes creating, after storing the data as a bridge buffer message, a peripheral device message comprising the bridge buffer message, and sending the peripheral device message to a thread message queue of a subscriber.
摘要:
A system and method for context-independent coding using frequency-based mapping schemes, sequence-based mapping schemes, memory trace-based mapping schemes, and/or transition statistics-based mapping schemes in order to reduce off-chip interconnect power consumption. State-of-the-art context-dependent, double-ended codes for processor-SDRAM off-chip interfaces require the transmitter and receiver (memory controller and SDRAM) to collaborate using the current and previously transmitted values to encode and decode data. In contrast, the memory controller can use a context-independent code to encode data stored in SDRAM and subsequently decode that data when it is retrieved, allowing the use of commodity memories. A single-ended, context-independent code is realized by assigning limited-weight codes using a frequency-based mapping technique. Experimental results show that such a code can reduce the power consumption of an uncoded off-chip interconnect by an average of 30% with less than a 0.1% degradation in performance
摘要:
A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
摘要:
A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by, e.g., steering each to one of two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
摘要:
A computer readable medium comprising computer readable code for data transfer. The computer readable code, when executed, performs a method. The method includes receiving, at a first Axon, an ARP request from a source host directed to a target host. The method also includes obtaining a first route from the first Axon to the second Axon, and generating a target identification corresponding to the target host. The method further includes sending an Axon-ARP request to the second Axon using the first route, and receiving an Axon-ARP reply from the second Axon, where the Axon-ARP reply includes a second route. The method further includes storing the first route in storage space on the first Axon, where the storage space is indexed by the target identification, and sending an ARP reply to the first host where the source host is configured to send a packet to the target host.
摘要:
A system and method for context-independent coding using frequency-based mapping schemes, sequence-based mapping schemes, memory trace-based mapping schemes, and/or transition statistics-based mapping schemes in order to reduce off-chip interconnect power consumption. State-of-the-art context-dependent, double-ended codes for processor-SDRAM off-chip interfaces require the transmitter and receiver (memory controller and SDRAM) to collaborate using the current and previously transmitted values to encode and decode data. In contrast, the memory controller can use a context-independent code to encode data stored in SDRAM and subsequently decode that data when it is retrieved, allowing the use of commodity memories. A single-ended, context-independent code is realized by assigning limited-weight codes using a frequency-based mapping technique. Experimental results show that such a code can reduce the power consumption of an uncoded off-chip interconnect by an average of 30% with less than a 0.1% degradation in performance.