摘要:
Finite impulse response filtering is achieved by broadcasting to at least one compute unit an instruction having a plurality of data samples, a conditional field associated with each compute unit, and a set of operator values for operating on each data sample; providing a function of each the data sample in accordance with an associated set of operator values identified by the conditional field; and combining the functions to obtain an intermediate finite impulse response of the data samples.
摘要:
Finite impulse response filtering is achieved by broadcasting to at least one compute unit an instruction having a plurality of data samples, a conditional field associated with each compute unit, and a set of operator values for operating on each data sample; providing a function of each the data sample in accordance with an associated set of operator values identified by the conditional field; and combining the functions to obtain an intermediate finite impulse response of the data samples.
摘要:
Reducing pipeline stall between a compute unit and address unit in a processor can be accomplished by computing results in a compute unit in response to instructions of an algorithm; storing in a local random access memory array in a compute unit predetermined sets of functions, related to the computed results for predetermined sets of instructions of the algorithm; and providing within the compute unit direct mapping of computed results to related function.
摘要:
Reducing pipeline stall between a compute unit and address unit in a processor can be accomplished by computing results in a compute unit in response to instructions of an algorithm; storing in a local random access memory array in a compute unit predetermined sets of functions, related to the computed results for predetermined sets of instructions of the algorithm; and providing within the compute unit direct mapping of computed results to related function.
摘要:
Reducing pipeline stall between a compute unit and address unit in a processor can be accomplished by computing results in a compute unit in response to instructions of an algorithm; storing in a local random access memory array in a compute unit predetermined sets of functions, related to the computed results for predetermined sets of instructions of the algorithm; and providing within the compute unit direct mapping of computed results to related function.
摘要:
Lookup table addressing of a set of lookup tables in an external memory is achieved by: transferring a data word from a compute unit to an input register in a data address generator; providing in at least one deposit-increment index register in the data address generator including a table base field for identifying the location of the set of tables in memory, and a displacement field; and depositing a section of the data word into a displacement field in the index register for identifying the location of a specific entry in the tables.
摘要:
A compute system for executing an h.264 binary decode symbol instruction including a first compute unit having a range normalization circuit and an rLPS update circuit, and operating in a first mode responsive to current rLPS, range, value and current context to generate the next normalized range and next rLPS for the current context; a second compute unit including a value update circuit, a context update circuit, and value normalization circuit responsive to current rLPS, range value and current context to obtain the output bit, normalized value and the updated current context; and a third compute unit or said first compute unit operating in a second mode including a range circuit and a next context rLPS circuit responsive to rLPS range, value and next context to obtain a next context rLPS value.
摘要:
A compute system for executing an h.264 binary decode symbol instruction including a first compute unit having a range normalization circuit and an rLPS update circuit, and operating in a first mode responsive to current rLPS, range, value and current context to generate the next normalized range and next rLPS for the current context; a second compute unit including a value update circuit, a context update circuit, and value normalization circuit responsive to current rLPS, range value and current context to obtain the output bit, normalized value and the updated current context; and a third compute unit or said first compute unit operating in a second mode including a range circuit and a next context rLPS circuit responsive to rLPS range, value and next context to obtain a next context rLPS value.
摘要:
In a pipeline machine where, in an iterative process, one or more subsequent functions employ one or more parameters determined by one or more antecedent functions and the one or more subsequent functions generate one or more parameters for the one or more antecedent functions, pipeline dependency is reduced by advancing or rotating the iterative process by preliminarily providing to the subsequent function the next one or more parameters on which it is dependent and thereafter: generating by the subsequent function, in response to the one or more parameters on which is it dependent, the next one or more parameters required by the one or more antecedent functions and then, generating by the one or more antecedent functions, in response to the one or more parameters required by the one or more antecedent functions, the next one or more parameters for input to the subsequent function for the next iteration.
摘要:
Methods and apparatus are provided for performing a jump operation in a pipelined digital processor. The method includes writing target addresses of jump instructions to be executed to a memory table, detecting a first jump instruction being executed by the processor, the first jump instruction referencing a pointer to a first target address in the memory table, the processor executing the first jump instruction by jumping to the first target address and modifying the pointer to point to a second target address in the memory table, the second target address corresponding to a second jump instruction. The execution of the first jump instruction may include prefetching at least one future target address from the memory table and writing the future target address in a local memory. The second target address may be accessed in the local memory in response to detection of the second jump instruction.