Abstract:
Techniques are provided which may be implemented in various methods and/or apparatuses to provide a tasking system buffer interface capability for interfacing with a plurality of shared processes/engines.
Abstract:
In a described implementation of early energy measurement, a wireless device adjusts a receiver gain during each current symbol time responsive to a signal energy level measured in a previous symbol time.
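The one-symbol delay between measurement and gain adjustment can be sketched as follows. This is only an illustration under stated assumptions: the function name `run_agc`, the target energy, the decibel gain formula, and the choice to measure energy before the gain stage are all hypothetical and are not taken from the abstract.

```python
import math


def run_agc(symbol_energies, target_energy=1.0, initial_gain_db=0.0):
    """Return the receiver gain (in dB) applied during each symbol time.

    symbol_energies: measured input energy per symbol time (linear units,
    assumed to be measured before the gain stage).
    The gain used in symbol n is derived from the energy measured in
    symbol n-1; the first symbol uses initial_gain_db.
    """
    gains_db = []
    gain_db = initial_gain_db
    for energy in symbol_energies:
        # Gain in effect during the current symbol time.
        gains_db.append(gain_db)
        # The energy measured now sets the gain for the next symbol time.
        if energy > 0:
            gain_db = 10.0 * math.log10(target_energy / energy)
    return gains_db


if __name__ == "__main__":
    # A drop in energy at symbol 1 is compensated starting at symbol 2.
    print(run_agc([1.0, 0.1, 0.1, 10.0]))
```

Note that the gain always lags the measurement by one symbol, so a sudden change in signal energy is only corrected in the following symbol time.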
Abstract:
An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
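The sequence in this abstract (fill a buffer, evict to a shared cache, set an indication, serve the read) can be modelled in software as a rough sketch. The class and function names (`SharedCache`, `CoreBuffer`, `gpu_read`) and the use of a boolean flag as the indication are assumptions made for illustration; the abstract describes a hardware mechanism and does not specify any of them.

```python
class SharedCache:
    """Toy stand-in for a cache visible to both the CPU core and the GPU."""

    def __init__(self):
        self.chunks = []          # chunks evicted from the core's buffer
        self.data_ready = False   # the "indication" checked by the GPU


class CoreBuffer:
    """Toy stand-in for the buffer inside the CPU core."""

    def __init__(self, cache, chunk_size):
        self.cache = cache
        self.chunk_size = chunk_size  # the "designated amount of data"
        self.pending = []

    def write(self, item):
        self.pending.append(item)
        # Once the designated amount has been written, evict the chunk.
        if len(self.pending) == self.chunk_size:
            self._evict()

    def _evict(self):
        # Move the chunk to the shared cache and set the indication.
        self.cache.chunks.append(list(self.pending))
        self.pending.clear()
        self.cache.data_ready = True


def gpu_read(cache):
    """Model a GPU read: return a chunk only if the indication is set."""
    if not cache.data_ready:
        return None
    chunk = cache.chunks.pop(0)
    cache.data_ready = bool(cache.chunks)
    return chunk


if __name__ == "__main__":
    cache = SharedCache()
    buf = CoreBuffer(cache, chunk_size=4)
    for value in range(4):
        buf.write(value)
    print(gpu_read(cache))
```

In this model a partially filled buffer is never visible to the reader, which mirrors the abstract's condition that eviction happens only after the designated amount has been written.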
Abstract:
In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for cutting senior store latency using store prefetching. For example, in one embodiment, such means may include an integrated circuit or an out of order processor means that processes out of order instructions and enforces in-order requirements for a cache. Such an integrated circuit or out of order processor means further includes means for receiving a store instruction; means for performing address generation and translation for the store instruction to calculate a physical address of the memory to be accessed by the store instruction; and means for executing a pre-fetch for a cache line based on the store instruction and the calculated physical address before the store instruction retires.
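The benefit of issuing the prefetch at address generation rather than waiting for retirement can be illustrated with a toy timing model. Everything here is an assumption made for illustration: the latency constants, the 64-byte line size, the `(agen_cycle, retire_cycle, physical_address)` tuple layout, and the function name `senior_store_latency` do not come from the abstract, and the model ignores cache capacity and coherence entirely.

```python
MISS_LATENCY = 100  # cycles to bring a line into the cache (hypothetical)
HIT_LATENCY = 1     # cycles to write a line already in the cache (hypothetical)
LINE_SIZE = 64      # bytes per cache line (hypothetical)


def senior_store_latency(stores, prefetch):
    """Total post-retirement latency of a sequence of stores.

    stores: list of (agen_cycle, retire_cycle, physical_address) in program
    order. With prefetch=True, a line fetch starts at address generation;
    otherwise it starts only when the retired store reaches the cache.
    """
    line_ready_at = {}  # cache line index -> cycle at which it is present

    if prefetch:
        # Prefetches are issued as soon as the physical address is known.
        for agen_cycle, _retire_cycle, addr in stores:
            line = addr // LINE_SIZE
            ready = agen_cycle + MISS_LATENCY
            line_ready_at[line] = min(line_ready_at.get(line, ready), ready)

    total = 0
    for _agen_cycle, retire_cycle, addr in stores:
        line = addr // LINE_SIZE
        if line in line_ready_at:
            # Line is present or in flight: wait only for the remainder.
            wait = max(0, line_ready_at[line] - retire_cycle)
            total += wait + HIT_LATENCY
        else:
            # Demand miss: the fetch starts only after retirement.
            line_ready_at[line] = retire_cycle + MISS_LATENCY
            total += MISS_LATENCY
    return total


if __name__ == "__main__":
    stores = [(0, 150, 0x1000), (5, 160, 0x2000)]
    print(senior_store_latency(stores, prefetch=False))
    print(senior_store_latency(stores, prefetch=True))
```

When retirement trails address generation by more than the miss latency, the fetch is fully hidden in this model; when it trails by less, only the remaining part of the miss is exposed.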
Abstract:
In one embodiment, a heterogeneous multicore processor is described that is optimized to execute multi-stage computer vision algorithms such as cascade classifier workloads. In such an embodiment, the heterogeneous processor includes at least one SIMD core, such as a vector processor core, coupled with one or more scalar cores. In one embodiment, the heterogeneous multiprocessor executes multi-stage compute operations, where the SIMD core computes a first set of stages and the one or more scalar cores compute a second set of stages. In one embodiment, a process for designing a heterogeneous multicore processor is disclosed which optimizes the ratio of scalar to SIMD cores based on execution time of the multi-stage compute operation in relation to processor die area consumed by a processor configuration having the ratio.
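One way to read the design process is as a small design-space search over the number of scalar cores paired with a single SIMD core. The sketch below is hypothetical throughout: the cost model (SIMD and scalar stages running as a pipeline), the time-times-area figure of merit, the function names, and every numeric value are made up for illustration and are not taken from the abstract.

```python
def execution_time(n_scalar, windows, simd_cost, scalar_cost, pass_rate):
    """Pipelined time estimate for one SIMD core feeding n_scalar scalar cores.

    The SIMD core runs the first set of stages on every window; only the
    fraction pass_rate of windows survives to the second set of stages,
    which is shared evenly among the scalar cores.
    """
    simd_time = windows * simd_cost
    scalar_time = windows * pass_rate * scalar_cost / n_scalar
    return max(simd_time, scalar_time)


def best_scalar_count(max_scalar, windows, simd_cost, scalar_cost,
                      pass_rate, simd_area, scalar_area):
    """Return the scalar-core count minimizing execution time times die area."""
    best_n, best_score = None, None
    for n in range(1, max_scalar + 1):
        time = execution_time(n, windows, simd_cost, scalar_cost, pass_rate)
        area = simd_area + n * scalar_area
        score = time * area  # assumed figure of merit; lower is better
        if best_score is None or score < best_score:
            best_n, best_score = n, score
    return best_n


if __name__ == "__main__":
    # All values below are illustrative placeholders, not measured data.
    print(best_scalar_count(max_scalar=8, windows=1000, simd_cost=1.0,
                            scalar_cost=20.0, pass_rate=0.2,
                            simd_area=4.0, scalar_area=1.0))
```

With these placeholder values, adding scalar cores helps until the scalar stages stop being the bottleneck; beyond that point each extra core adds area without reducing execution time.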
Abstract:
An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method for transferring a chunk of data from a producer core of a CPU to a consumer core of the CPU comprises: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the fill buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core.