Abstract:
A method includes receiving image data and performing a non-maximum suppression (NMS) operation on the image data. The method also includes initiating an edge tracking by hysteresis (ETH) operation on a portion of the image data prior to completion of the NMS operation.
Abstract:
An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
Abstract:
A method includes reading, by a processor, one or more configuration values from a storage device or a memory management unit. The method also includes loading the one or more configuration values into one or more registers of the processor. The one or more registers are useable by the processor to perform address translation.
Abstract:
An apparatus includes a processor and a guest operating system. In response to receiving a request to create a task, the guest operating system requests a hypervisor to create a virtual processor to execute the requested task. The virtual processor is schedulable on the processor.
Abstract:
Systems and methods for implementing certain load instructions, such as vector load instructions by cooperation of a main processor and a coprocessor. The load instructions which are identified by the main processor for offloading to the coprocessor are committed in the main processor without receiving corresponding load data. Post-commit, the load instructions are processed in the coprocessor, such that latencies incurred in fetching the load data are hidden from the main processor. By implementing an out-of-order load data buffer associated with an in-order instruction buffer, the coprocessor is also configured to avoid stalls due to long latencies which may be involved in fetching the load data from levels of memory hierarchy, such as L2, L3, L4 caches, main memory, etc.
Abstract:
An apparatus includes a processor and a guest operating system. In response to receiving a request to create a task, the guest operating system requests a hypervisor to create a virtual processor to execute the requested task. The virtual processor is schedulable on the processor.
Abstract:
A method includes executing, at a processor, a dedicated arithmetic encoding instruction. The dedicated arithmetic encoding instruction accepts a plurality of inputs including a first range, a first offset, and a first state and produces one or more outputs based on the plurality of inputs. The method also includes storing a second state, realigning the first range to produce a second range, and realigning the first offset to produce a second offset based on the one or more outputs of the dedicated arithmetic encoding instruction.
Abstract:
In a particular embodiment, a method includes identifying one or more way prediction characteristics of an instruction. The method also includes selectively reading, based on identification of the one or more way prediction characteristics, a table to identify an entry of the table associated with the instruction that identifies a way of a data cache. The method further includes making a prediction whether a next access of the data cache based on the instruction will access the way.
Abstract:
Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the register index, and a scalar operation is performed based on the register value and the accumulator index.
Abstract:
An apparatus includes a translation lookaside buffer (TLB). The TLB includes at least one entry that includes an entry virtual address and an entry page size indication corresponding to an entry page. The apparatus also includes input logic configured to receive an input page size indication and an input virtual address corresponding to an input page. The apparatus further includes overlap checking logic configured to determine, based at least in part on the entry page size indication and the input page size indication, whether the input page overlaps the entry page.