Abstract:
Techniques including receiving configuration information for a trigger control channel of one or more trigger control channels, the configuration information defining a first one or more triggering events, receiving a first memory management command, storing the first memory management command, detecting the first one or more triggering events, and triggering the stored first memory management command based on the detected first one or more triggering events.
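The store-then-trigger flow described above can be illustrated with a minimal sketch. The class name `TriggerChannel`, the event names, and the command string are all hypothetical; the abstract does not specify an interface, so this only models the ordering of configure, store, detect, and trigger.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class TriggerChannel:
    """Hypothetical model of one trigger control channel."""
    trigger_events: Set[str] = field(default_factory=set)
    stored_command: Optional[Callable[[], str]] = None
    log: List[str] = field(default_factory=list)

    def configure(self, events):
        # The configuration information defines the triggering events.
        self.trigger_events = set(events)

    def store(self, command):
        # The command is held, not executed, until a trigger fires.
        self.stored_command = command

    def notify(self, event):
        # Run the stored command only when a configured event is detected.
        if event in self.trigger_events and self.stored_command is not None:
            self.log.append(self.stored_command())
            return True
        return False


channel = TriggerChannel()
channel.configure({"dma_done"})           # assumed event name
channel.store(lambda: "prefetch 0x1000")  # assumed memory management command
ignored = channel.notify("timer_tick")    # not a configured event
fired = channel.notify("dma_done")        # configured event, command runs
```

In this sketch the command stays stored after it fires; whether a real channel clears it is not stated in the abstract.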
Abstract:
Techniques for maintaining cache coherency comprising storing a set of data blocks associated with a main process in a cache line of a main cache memory, storing a first local copy of the set of data blocks in a first local cache memory of a first processor, storing a second local copy of the set of data blocks in a second local cache memory of a second processor executing a first child process of the main process to generate first output data, writing the first output data to a first data block of the first local copy as a write through, writing the first output data to the first data block of the main cache memory as a part of the write through, transmitting an invalidate request to the second local cache memory, marking the second local copy of the set of data blocks as delayed, and transmitting an acknowledgment to the invalidate request.
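The sequence of write through, invalidate request, delayed marking, and acknowledgment can be traced in a small sketch. The names `LocalCache` and `write_through` are hypothetical, caches are modeled as plain dictionaries, and the sketch only records the delayed flag, since the abstract does not say what later happens to a delayed copy.

```python
class LocalCache:
    """Hypothetical local cache holding a copy of the data blocks."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # local copy of the set of data blocks
        self.delayed = False

    def handle_invalidate(self):
        # Mark the local copy as delayed instead of dropping it,
        # then acknowledge the invalidate request.
        self.delayed = True
        return "ack"


def write_through(main_cache, writer, others, block_id, value):
    writer.blocks[block_id] = value  # write to the writer's local copy
    main_cache[block_id] = value     # same write reaches the main cache
    # Send an invalidate request to every other local cache.
    return [cache.handle_invalidate() for cache in others]


main_cache = {0: "a", 1: "b"}
first = LocalCache(main_cache)
second = LocalCache(main_cache)
acks = write_through(main_cache, first, [second], 0, "out1")
```

After the call, the main cache and the first local copy hold the new value, while the second local copy still holds its old data and is flagged as delayed.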
Abstract:
Systems and methods provide for efficiently and accurately determining a simplified path that conforms to the geometry of an original path by simultaneously minimizing the deviation from the original path and reducing the number of anchor points in the simplified path. A simplified path may be iteratively generated by updating parametric values and anchor points for candidate simplified paths at epochs. A deviation in distance between points on the original path and corresponding points on candidate paths may be iteratively decreased to ensure that the resulting simplified path follows the geometry of the original path to a predetermined threshold. Continuity constraints can also be applied to ensure smoothness of the simplified path.
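The trade-off between fewer points and bounded deviation can be illustrated with a different, much simpler technique: the Ramer-Douglas-Peucker algorithm for polylines. This is not the iterative method of the abstract (it has no parametric values, epochs, or continuity constraints); it only shows how a distance threshold decides which points survive. The function names and sample path are assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, threshold):
    """Drop points whose deviation from the chord is within threshold."""
    if len(points) < 3:
        return list(points)
    distances = [point_segment_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= threshold:
        return [points[0], points[-1]]
    split = worst + 1  # index of the farthest point in `points`
    left = simplify(points[:split + 1], threshold)
    right = simplify(points[split:], threshold)
    return left[:-1] + right  # avoid repeating the split point


path = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simple = simplify(path, 0.1)
```

With a threshold of 0.1, the point at (1, 0.05) is dropped because it deviates by only 0.05, while the peak at (3, 2) is kept.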
Abstract:
A configurable processing circuit capable of handling multiple threads simultaneously, the circuit comprising a thread data store, a plurality of configurable execution units, a configurable routing network for connecting locations in the thread data store to the execution units, a configuration data store for storing configuration instances that each define a configuration of the routing network and a configuration of one or more of the plurality of execution units, and a pipeline formed from the execution units, the routing network and the thread data store that comprises a plurality of pipeline sections configured such that each thread propagates from one pipeline section to the next at each clock cycle, the circuit being configured to: (i) associate each thread with a configuration instance; and (ii) configure each of the plurality of pipeline sections for each clock cycle to be in accordance with the configuration instance associated with the respective thread that will propagate through that pipeline section during the clock cycle.
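A software model may help picture how each pipeline section is reconfigured every clock cycle for the thread passing through it. The sketch below is an assumption-laden illustration: `CONFIGS`, `run_pipeline`, and the two-section layout are hypothetical, and a configuration instance is reduced to one function per pipeline section.

```python
from collections import deque

# Hypothetical configuration instances: one operation per pipeline section.
CONFIGS = {
    "add_then_double": [lambda x: x + 1, lambda x: x * 2],
    "double_then_add": [lambda x: x * 2, lambda x: x + 1],
}


def run_pipeline(threads, num_sections=2):
    """threads: list of (thread_id, config_name, value) tuples."""
    sections = [None] * num_sections  # thread occupying each section
    pending = deque(threads)
    results = {}
    while pending or any(slot is not None for slot in sections):
        # Clock edge: every thread moves to the next section and a new
        # thread, if any, enters the first section.
        entering = pending.popleft() if pending else None
        sections = [entering] + sections[:-1]
        # Configure each section for the thread now inside it, then execute.
        for i, slot in enumerate(sections):
            if slot is None:
                continue
            thread_id, config_name, value = slot
            value = CONFIGS[config_name][i](value)
            sections[i] = (thread_id, config_name, value)
            if i == num_sections - 1:
                results[thread_id] = value
    return results


results = run_pipeline([("t0", "add_then_double", 3),
                        ("t1", "double_then_add", 3)])
```

In the second cycle both sections are busy with different threads, so section 0 runs the first stage of `double_then_add` while section 1 runs the second stage of `add_then_double`.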
Abstract:
A method, a computing system, and a non-transitory machine readable storage medium containing instructions for managing a stream processing topology are provided. In an example, the method includes receiving a first topology that communicatively couples a plurality of processing elements via a first arrangement of interconnections to perform an operation on a stream of data. A second topology is defined that communicatively couples the plurality of processing elements via a second arrangement of interconnections that is different from the first arrangement. The second topology assigns the plurality of processing elements a first set of operations. The second topology is provided to a stream processing manager and is modified during processing of the stream of data by assigning a second set of operations to the plurality of processing elements that is different from the first set of operations.
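The two topologies and the mid-stream reassignment of operations can be sketched as follows. The `Topology` class, the element names `a` and `b`, and the arithmetic operations are all hypothetical, the interconnections are reduced to a simple chain, and the stream processing manager is left out entirely.

```python
class Topology:
    """Hypothetical topology: a chain of processing elements."""

    def __init__(self, entry, links, operations):
        self.entry = entry                  # first element to receive data
        self.links = dict(links)            # element -> next element
        self.operations = dict(operations)  # element -> assigned operation

    def process(self, item):
        node = self.entry
        while node is not None:
            item = self.operations[node](item)
            node = self.links.get(node)
        return item


first_ops = {"a": lambda x: x + 1, "b": lambda x: x * 2}

# First topology: a feeds b.
first = Topology("a", {"a": "b"}, first_ops)
# Second topology: same elements, different interconnections (b feeds a).
second = Topology("b", {"b": "a"}, first_ops)

stream = iter([3, 3])
outputs = [second.process(next(stream))]

# Modify the second topology while the stream is being processed by
# assigning a second, different set of operations to the same elements.
second.operations = {"a": lambda x: x - 1, "b": lambda x: x * 10}
outputs.append(second.process(next(stream)))
```

The same input value produces different outputs before and after the reassignment, while the interconnections of the second topology stay unchanged.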
Abstract:
A method and apparatus for providing a scalable compute fabric are provided herein. The method includes determining a workflow for processing by the scalable compute fabric, wherein the workflow is based on an instruction set. A pipeline is configured dynamically for processing the workflow, and the workflow is executed using the pipeline.
Abstract:
A mechanism is described for facilitating dynamic and efficient management of instruction atomicity violations in software programs according to one embodiment. A method of embodiments, as described herein, includes receiving, at a replay logic from a recording system, a recording of a first software thread running a first macro instruction, and a second software thread running a second macro instruction. The first software thread and the second software thread are executed by a first core and a second core, respectively, of a processor at a computing device. The recording system may record interleavings between the first and second macro instructions. The method includes correctly replaying the recording of the interleavings of the first and second macro instructions precisely as they occurred. Correctly replaying the recording may include replaying a local memory state of the first and second macro instructions and a global memory state of the first and second software threads.
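A toy replay loop can show why the exact interleaving matters. In this sketch each macro instruction is assumed to be an increment that breaks down into load, inc, and store steps; the `replay` function, the tuple format of the recording, and the thread names are hypothetical and not taken from the described mechanism.

```python
def replay(recording, initial):
    """Re-apply recorded steps in their recorded order."""
    shared = dict(initial)  # global memory state
    local = {}              # per-thread local state
    for thread, op, var in recording:
        regs = local.setdefault(thread, {})
        if op == "load":
            regs[var] = shared[var]
        elif op == "inc":
            regs[var] += 1
        elif op == "store":
            shared[var] = regs[var]
        else:
            raise ValueError(f"unknown op: {op}")
    return shared, local


# Recorded interleaving: T2 loads x before T1 has stored its result,
# so the two increments are not atomic and one update is lost.
recording = [
    ("T1", "load", "x"), ("T2", "load", "x"),
    ("T1", "inc", "x"), ("T1", "store", "x"),
    ("T2", "inc", "x"), ("T2", "store", "x"),
]
shared, local = replay(recording, {"x": 0})
```

Replaying the steps in recorded order reproduces the lost update: `x` ends at 1 rather than 2, and both threads hold 1 in their local state.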
Abstract:
A method of processing data in an integrated circuit is described. The method comprises establishing a pipeline of processing blocks, wherein each processing block has a different function; coupling a data packet having data and meta-data to an input of the pipeline of processing blocks; and processing the data of the data packet using predetermined processing blocks based upon the meta-data. A device for processing data in an integrated circuit is also described.
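The idea of meta-data selecting which processing blocks act on a packet can be sketched in a few lines. The block names, the packet layout, and the `run` function are hypothetical; the abstract does not define a packet format.

```python
# Hypothetical pipeline: each processing block has a different function.
PIPELINE = [
    ("strip", str.strip),
    ("upper", str.upper),
    ("reverse", lambda data: data[::-1]),
]


def run(packet):
    """Pass the packet through every block; only selected blocks act."""
    data = packet["data"]
    for name, block in PIPELINE:
        # The meta-data names the blocks that should process this packet.
        if name in packet["meta"]["blocks"]:
            data = block(data)
    return data


packet = {"data": " abc ", "meta": {"blocks": {"strip", "reverse"}}}
result = run(packet)
```

The packet still travels through the whole pipeline in order, but the `upper` block leaves it untouched because the meta-data does not select it.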