Abstract:
A method of loading data into register files that correspond to respective execution units within a data-parallel processor. After a first set of parameters that specify a subset of data within a first memory is received, the first set of parameters is compared to a plurality of sets of conditions that correspond to respective patterns of data. The first set of parameters is then converted to a second set of parameters in accordance with one of the sets of conditions satisfied by the first set of parameters. A sequence of memory addresses is generated based on the second set of parameters. Data is retrieved from locations within the first memory specified by the sequence of memory addresses and loaded into register files that correspond to respective execution units within a processor.
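The flow described in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the source: the `StridedParams` fields, the two patterns in `PATTERNS`, and the round-robin distribution of words to lanes are assumptions chosen only to make the steps concrete.

```python
from dataclasses import dataclass


@dataclass
class StridedParams:
    """Hypothetical parameter set describing a subset of memory."""
    base: int        # address of the first word
    stride: int      # distance between the starts of consecutive records
    record_len: int  # number of words in each record
    count: int       # number of records


# Each entry pairs a set of conditions (a predicate) with the conversion that
# applies when those conditions are satisfied. Both patterns are assumptions.
PATTERNS = [
    # Records that touch each other collapse into one long record.
    ("contiguous",
     lambda p: p.stride == p.record_len,
     lambda p: StridedParams(p.base, p.record_len * p.count,
                             p.record_len * p.count, 1)),
    # Fallback: keep the parameters unchanged.
    ("strided", lambda p: True, lambda p: p),
]


def convert(first):
    """Convert the first parameter set using the first pattern it satisfies."""
    for _name, matches, to_second in PATTERNS:
        if matches(first):
            return to_second(first)
    raise ValueError("no pattern matched")


def generate_addresses(second):
    """Generate the sequence of memory addresses for a parameter set."""
    return [second.base + r * second.stride + w
            for r in range(second.count)
            for w in range(second.record_len)]


def load_register_files(memory, first, num_lanes):
    """Fetch the addressed words and deal them round-robin to lane register files."""
    lanes = [[] for _ in range(num_lanes)]
    for i, addr in enumerate(generate_addresses(convert(first))):
        lanes[i % num_lanes].append(memory[addr])
    return lanes
```

With `memory = list(range(100))` and `StridedParams(base=4, stride=8, record_len=2, count=3)`, the generated addresses are `[4, 5, 12, 13, 20, 21]`, and two lanes receive `[4, 12, 20]` and `[5, 13, 21]` respectively.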
Abstract:
Disclosed herein are techniques to execute tasks with a computing device. A first task is initiated to perform an operation of the first task. A buffer construct that represents a region of memory accessible to the operation of the first task is created. A second task is initiated to perform an operation of the second task, which is timed to begin in response to the buffer construct being communicated to the second task from the first task.
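As a rough illustration of the hand-off described in this abstract, the sketch below uses Python threads and a queue. The `RegionBuffer` class, the `run_pipeline` function, and the use of `queue.Queue` as the communication channel are all assumptions made for the example; the source does not specify any of them.

```python
import queue
import threading


class RegionBuffer:
    """Hypothetical buffer construct standing in for a region of memory."""

    def __init__(self, size):
        self.data = bytearray(size)


def run_pipeline():
    handoff = queue.Queue()  # channel used to communicate the buffer construct
    log = []

    def first_task():
        buf = RegionBuffer(4)
        buf.data[:] = b"\x01\x02\x03\x04"   # operation of the first task
        log.append("first: filled")
        handoff.put(buf)                     # communicate the buffer construct

    def second_task():
        buf = handoff.get()                  # blocks until the buffer arrives
        log.append("second: started")        # operation begins only now
        log.append(sum(buf.data))

    # The second task is started first on purpose: its operation still waits
    # for the buffer construct, so the ordering in `log` is deterministic.
    t2 = threading.Thread(target=second_task)
    t1 = threading.Thread(target=first_task)
    t2.start()
    t1.start()
    t1.join()
    t2.join()
    return log
```

Calling `run_pipeline()` returns `["first: filled", "second: started", 10]`: the second task's operation cannot begin until the first task has passed the buffer along.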
Abstract:
In a single-instruction-multiple-data (SIMD) processor having multiple lanes, and local memory dedicated to each lane, a method of processing an image is disclosed. The method comprises mapping consecutive rasters of the image to consecutive lanes such that groups of consecutive rasters form image strips, and vertical stacks of strips comprise strip columns. Local memory is allocated to the image strips. A sequence of functions is processed for execution on the SIMD processor in a pipeline implementation, such that the pipeline loops over portions of the image in multiple iterations, and intermediate data processed during the functions is stored in the local memory. Data associated with the image is traversed by first processing image strips from top to bottom in a left-most strip column, then progressing to each adjacent unprocessed strip column.
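The traversal order in this abstract can be sketched as follows. The function names and the assumption that a strip is exactly `num_lanes` rasters tall and `strip_width` pixels wide are illustrative choices, not details given in the source.

```python
def lane_for_raster(y, num_lanes):
    """Map consecutive rasters (rows) to consecutive lanes, wrapping around."""
    return y % num_lanes


def strip_traversal(height, width, num_lanes, strip_width):
    """Return the (top_row, left_col) origin of each strip in processing order.

    Strips are visited top to bottom within the left-most strip column, then
    the traversal moves to the next strip column to the right.
    """
    order = []
    for left in range(0, width, strip_width):      # strip columns, left to right
        for top in range(0, height, num_lanes):    # strips, top to bottom
            order.append((top, left))
    return order
```

For a 4-row by 6-column image with 2 lanes and a strip width of 3, `strip_traversal(4, 6, 2, 3)` yields `[(0, 0), (2, 0), (0, 3), (2, 3)]`.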
Abstract:
Disclosed are methods and systems for dynamically determining data-transfer paths. The data-transfer paths are dynamically determined in response to an instruction that facilitates data transfer among execution lanes in an integrated-circuit processing device operable to execute operations in parallel. In addition, embodiments include an integrated-circuit processing device operable to execute operations in parallel, including the capability of providing confirmation information to potential source lanes, the confirmation information indicating whether the potential source lanes may send data to requested destination lanes during a data-transfer interval.
Abstract:
Disclosed herein are techniques to manage access to a memory using a buffer construct that includes state information associated with a region of the memory. The disclosed techniques facilitate access to the region of memory through a direct memory access operation while the state information of the buffer construct is in a first state. The state information can be transitioned to a second state in response to a first instruction. The disclosed techniques also facilitate access to the region of memory through a cache operation while the state information of the buffer construct is in the second state. The state information can be transitioned to the first state in response to a second instruction.
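A minimal sketch of the two-state behavior described in this abstract is shown below. The class name `StatefulBuffer`, the state names, and the method names standing in for the "first instruction" and "second instruction" are hypothetical; the DMA and cache operations are modeled as plain Python reads.

```python
from enum import Enum


class AccessState(Enum):
    DMA = "dma"      # first state: region reachable by direct memory access
    CACHE = "cache"  # second state: region reachable through cached accesses


class StatefulBuffer:
    """Hypothetical buffer construct carrying state for a region of memory."""

    def __init__(self, memory, start, length):
        self.memory = memory
        self.start = start
        self.length = length
        self.state = AccessState.DMA

    def to_cache(self):
        """Stand-in for the first instruction: first state to second state."""
        self.state = AccessState.CACHE

    def to_dma(self):
        """Stand-in for the second instruction: second state to first state."""
        self.state = AccessState.DMA

    def dma_read(self):
        """Bulk copy of the whole region, allowed only in the DMA state."""
        if self.state is not AccessState.DMA:
            raise PermissionError("DMA access requires the DMA state")
        return bytes(self.memory[self.start:self.start + self.length])

    def cached_read(self, offset):
        """Single-element read, allowed only in the cache state."""
        if self.state is not AccessState.CACHE:
            raise PermissionError("cached access requires the cache state")
        if not 0 <= offset < self.length:
            raise IndexError("offset outside the buffer region")
        return self.memory[self.start + offset]
```

An access attempted in the wrong state raises `PermissionError`, which makes the mutual exclusion between the two access paths explicit in the sketch.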
Abstract:
A method of estimating motion is disclosed. A first plurality of candidates is identified in a reference frame, wherein the total area occupied by the first plurality of candidates is substantially smaller than that of the reference frame. A first refinement search is then performed based, at least in part, on the first plurality of candidates. One or more best candidates are then identified based, at least in part, on the first refinement search. Finally, motion data is encoded based, at least in part, on the one or more best candidates.
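The candidate-then-refine structure described in this abstract can be sketched as below. The sum-of-absolute-differences cost, the square search window of `radius` pixels around each candidate, and all function names are assumptions for illustration; the source does not state which cost metric or search pattern is used, and the final encoding step is omitted.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def in_bounds(ref, bx, by, dx, dy, size):
    """True when the displaced block lies entirely inside the reference frame."""
    return (0 <= by + dy and by + dy + size <= len(ref)
            and 0 <= bx + dx and bx + dx + size <= len(ref[0]))


def estimate_motion(cur, ref, bx, by, size, candidates, radius=1):
    """Refine around a few candidate displacements; return (cost, dx, dy)."""
    best = None
    for cdx, cdy in candidates:                        # small candidate set
        for dy in range(cdy - radius, cdy + radius + 1):
            for dx in range(cdx - radius, cdx + radius + 1):
                if not in_bounds(ref, bx, by, dx, dy, size):
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best
```

Because only the neighborhoods of the candidates are searched, the area examined stays much smaller than the full reference frame, at the risk of missing a better match elsewhere.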
Abstract:
A computer system is provided that utilizes a buffer construct to manage memory access operations to a region of memory. The buffer construct may correspond to a data item or structure that represents a region of memory. Each task may control the buffer construct exclusively of other tasks, so that the region of memory that is represented by the buffer construct is only available to the controlling task. Another task that requires access to the region of memory must wait until the controlling task makes the buffer construct available. The controlling task makes the buffer construct available only when DMA or other memory access operations that are in progress become complete. In this way, the buffer construct acts as a token that synchronizes the execution of the concurrent tasks and ensures mutually exclusive access to the common region of memory.
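The token behavior described in this abstract can be modeled with a small single-threaded sketch. The `TokenBuffer` class, its method names, and the counter of in-flight DMA operations are hypothetical; a real system would block or schedule waiting tasks rather than return `False`.

```python
class TokenBuffer:
    """Hypothetical buffer construct that acts as an ownership token."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.owner = None      # task currently controlling the buffer
        self.pending_dma = 0   # DMA operations still in progress

    def _check_owner(self, task):
        if self.owner != task:
            raise PermissionError(f"{task!r} does not control the buffer")

    def acquire(self, task):
        """Take control if no other task holds the buffer; otherwise report failure."""
        if self.owner is not None:
            return False
        self.owner = task
        return True

    def start_dma(self, task):
        """Record a DMA operation issued by the controlling task."""
        self._check_owner(task)
        self.pending_dma += 1

    def complete_dma(self):
        """Record that one in-flight DMA operation has finished."""
        if self.pending_dma > 0:
            self.pending_dma -= 1

    def release(self, task):
        """Make the buffer available, but only once all DMA has completed."""
        self._check_owner(task)
        if self.pending_dma > 0:
            return False
        self.owner = None
        return True
```

In this model a second task's `acquire` keeps failing until the controlling task's `release` succeeds, which in turn requires every outstanding DMA operation to have completed.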
Abstract:
A method of operation within an integrated-circuit processing device having a plurality of execution lanes. Upon receiving an instruction to exchange data between the execution lanes, respective requests from the execution lanes are examined to determine a set of the execution lanes that may send data to one or more others of the execution lanes during a first interval. Each execution lane within the set of the execution lanes is signaled to indicate that the execution lane may send data to the one or more others of the execution lanes.
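One way to picture the request examination and signaling described in this abstract is the sketch below. The fixed-priority rule (the lowest-numbered requesting lane wins a contested destination), the limit of one sender per destination per interval, and the function names are assumptions; the source does not describe the arbitration policy.

```python
def arbitrate(requests):
    """Decide which source lanes may send during one interval.

    requests[src] is the destination lane requested by lane `src`, or None.
    Returns one grant flag per lane; at most one sender per destination.
    """
    claimed = set()
    grants = []
    for dst in requests:                 # lower lane index has priority
        if dst is not None and dst not in claimed:
            claimed.add(dst)
            grants.append(True)          # signal: this lane may send
        else:
            grants.append(False)         # no request, or destination is busy
    return grants


def exchange(values, requests):
    """Run intervals until every request is served.

    Returns (received, intervals), where received[lane] lists the values
    delivered to that lane in arrival order.
    """
    pending = list(requests)
    received = [[] for _ in values]
    intervals = 0
    while any(dst is not None for dst in pending):
        grants = arbitrate(pending)
        for src, granted in enumerate(grants):
            if granted:
                received[pending[src]].append(values[src])
                pending[src] = None      # request satisfied
        intervals += 1
    return received, intervals
```

With `values = [10, 20, 30, 40]` and `requests = [1, 1, 0, None]`, lanes 0 and 2 are granted in the first interval and lane 1 in the second, so `exchange` returns `([[30], [10, 20], [], []], 2)`.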