Abstract:
In some embodiments, a memory initialization detection process includes detecting a read instruction of a program, where the read instruction addresses a particular memory location, and where data corresponding to the particular memory location is cached in a particular cache line of a memory cache. The process further includes determining, based on metadata stored in the memory cache, that a section of the particular cache line does not store valid data of the program; obtaining validity data from that section of the particular cache line; and determining, based on the validity data, whether the read instruction is authorized to proceed.
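The check described above can be sketched in Python as a toy model. Everything concrete here is an assumption for illustration, not the disclosed design: a 64-byte line split into eight 8-byte sections, a per-line bitmask as the metadata, and hypothetical one-byte validity codes (`UNINITIALIZED`, `ZERO_FILLED`) stored at the start of any section that does not hold program data. The helper `read_authorized` is likewise hypothetical.

```python
from dataclasses import dataclass, field

# All sizes and encodings below are illustrative assumptions.
LINE_SIZE = 64      # bytes per cache line
SECTION_SIZE = 8    # bytes per section (8 sections per line)

# Hypothetical validity codes held in a section that lacks program data.
UNINITIALIZED = 0x00  # never written by the program: reads are refused
ZERO_FILLED = 0x01    # logically zero-initialized: reads may proceed


@dataclass
class CacheLine:
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))
    # Per-line metadata: bit i set means section i holds valid program data.
    valid_sections: int = 0


def read_authorized(line: CacheLine, offset: int, size: int) -> bool:
    """Decide whether a read of `size` bytes at `offset` within `line` may proceed."""
    if size <= 0 or offset < 0 or offset + size > LINE_SIZE:
        raise ValueError("read must fall inside one cache line")
    first = offset // SECTION_SIZE
    last = (offset + size - 1) // SECTION_SIZE
    for sec in range(first, last + 1):
        if line.valid_sections & (1 << sec):
            continue  # metadata says this section holds program data
        # The section holds validity data instead; this sketch reads a
        # one-byte code from the start of the section.
        code = line.data[sec * SECTION_SIZE]
        if code != ZERO_FILLED:
            return False  # uninitialized (or unknown code): refuse the read
    return True


# Section 0 holds program data, section 1 is zero-filled, section 2 is uninitialized.
line = CacheLine()
line.valid_sections = 0b001
line.data[1 * SECTION_SIZE] = ZERO_FILLED
line.data[2 * SECTION_SIZE] = UNINITIALIZED
print(read_authorized(line, 4, 8))   # spans sections 0-1 -> True
print(read_authorized(line, 12, 8))  # spans sections 1-2 -> False
```

A read that touches any non-program section whose code is not `ZERO_FILLED` is refused; an actual implementation could encode validity data quite differently.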
Abstract:
A processor having a streaming unit is disclosed. In one embodiment, a processor includes one or more execution units configured to execute instructions of a processor instruction set. The processor further includes a streaming unit configured to execute a first instruction of the processor instruction set, wherein executing the first instruction comprises the streaming unit loading a first data stream from a memory of a computer system. The first data stream comprises a plurality of data elements. The first instruction includes a first argument indicating a starting address of the first data stream, a second argument indicating a stride between the data elements, and a third argument indicative of an ending address of the first data stream. The streaming unit is configured to output a second data stream corresponding to the first data stream.
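As a rough illustration of the three arguments, the Python sketch below walks a byte-addressed memory from a starting address by a fixed stride and stops at an ending address. It assumes the stride is in bytes, the ending address is exclusive, and elements are unsigned little-endian integers; `stream_load` and its `elem_size` parameter are hypothetical names, not part of any disclosed instruction set.

```python
import struct
from typing import Iterator

# Hypothetical element formats keyed by element size in bytes (little-endian, unsigned).
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def stream_load(memory: bytes, start: int, stride: int, end: int,
                elem_size: int = 4) -> Iterator[int]:
    """Yield the elements at start, start + stride, ... that lie wholly below `end`.

    `end` is treated as an exclusive ending address (an assumption).
    """
    if stride <= 0:
        raise ValueError("this sketch supports positive strides only")
    fmt = _FORMATS[elem_size]
    addr = start
    while addr + elem_size <= end:
        (value,) = struct.unpack_from(fmt, memory, addr)
        yield value
        addr += stride


# Memory holding eight 32-bit words 10..17; load every second word.
memory = struct.pack("<8I", *range(10, 18))
second_stream = list(stream_load(memory, start=0, stride=8, end=len(memory)))
print(second_stream)  # [10, 12, 14, 16]
```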
Abstract:
The disclosed embodiments relate to a computing system that facilitates performing prefetching for scatter/gather operations. During operation, the system receives a scatter/gather prefetch instruction at a processor core, wherein the scatter/gather prefetch instruction specifies a virtual base address and a plurality of offsets. Next, the system performs a lookup in a translation-lookaside buffer (TLB) using the virtual base address to obtain a physical base address that identifies a physical page for the base address. The system then sends the physical base address and the plurality of offsets to a cache. This enables the cache to perform prefetching operations for the scatter/gather prefetch instruction by adding the physical base address to each of the plurality of offsets to produce a plurality of physical addresses, and then prefetching cache lines for the plurality of physical addresses into the cache.
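The address arithmetic split between core and cache can be sketched in Python. The page and line sizes, the dictionary standing in for the TLB, and the helpers `tlb_lookup` and `gather_prefetch` are all hypothetical. Because only the base address is translated, this sketch assumes every offset stays inside the translated physical page and simply skips targets that fall outside it.

```python
PAGE_SIZE = 4096  # assumed page size
LINE_SIZE = 64    # assumed cache-line size


def tlb_lookup(tlb: dict[int, int], vaddr: int) -> int:
    """Translate a virtual address using a toy TLB that maps VPN -> PFN."""
    vpn, page_offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:
        raise LookupError("TLB miss")  # real hardware would walk the page table
    return tlb[vpn] * PAGE_SIZE + page_offset


def gather_prefetch(tlb: dict[int, int], vbase: int, offsets: list[int]) -> list[int]:
    """Return the distinct physical cache-line addresses a cache would prefetch."""
    pbase = tlb_lookup(tlb, vbase)        # one TLB lookup, for the base only
    page_start = pbase - pbase % PAGE_SIZE
    lines: list[int] = []
    for off in offsets:
        paddr = pbase + off               # cache-side add: physical base + offset
        if not page_start <= paddr < page_start + PAGE_SIZE:
            continue  # sketch: skip targets outside the translated page
        line = paddr - paddr % LINE_SIZE  # align down to a cache-line boundary
        if line not in lines:
            lines.append(line)
    return lines


tlb = {0x10: 0x80}  # VPN 0x10 maps to PFN 0x80
vbase = 0x10 * PAGE_SIZE + 0x100
print([hex(a) for a in gather_prefetch(tlb, vbase, [0, 8, 64, 200, 5000])])
# ['0x80100', '0x80140', '0x801c0']
```

In this sketch only one translation is performed regardless of how many offsets the instruction carries; offsets 0 and 8 collapse onto the same line, and 5000 leaves the page and is skipped.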
Abstract:
A processor having a streaming unit is disclosed. In one embodiment, a processor includes a streaming unit configured to load one or more input data streams from a memory coupled to the processor. The streaming unit includes an internal network having a plurality of queues configured to store streams of data. The streaming unit further includes a plurality of operations circuits configured to perform operations on the streams of data. The streaming unit is software programmable to operatively couple two or more of the plurality of operations circuits together via one or more of the plurality of queues. The operations circuits may perform operations on multiple streams of data, resulting in corresponding output streams of data.
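The idea of software-programmable coupling can be modeled in Python with named queues and small operation objects. `OpCircuit` and `run` are hypothetical names; the model assumes each circuit consumes one element from every input queue per step and ignores queue capacity, timing, and backpressure. The example wires a multiply circuit to an add circuit through an intermediate queue `t`.

```python
from collections import deque
from typing import Callable


class OpCircuit:
    """A toy operations circuit: pops one element from each input queue,
    applies `fn`, and pushes the result onto its output queue."""

    def __init__(self, fn: Callable[..., int], inputs: list[str], output: str):
        self.fn = fn
        self.inputs = inputs
        self.output = output

    def step(self, queues: dict[str, deque]) -> bool:
        if not all(queues[name] for name in self.inputs):
            return False  # stall until every input queue has data
        args = [queues[name].popleft() for name in self.inputs]
        queues[self.output].append(self.fn(*args))
        return True


def run(circuits: list[OpCircuit], queues: dict[str, deque]) -> None:
    """Keep stepping circuits until none can make progress."""
    progress = True
    while progress:
        progress = False
        for circuit in circuits:
            while circuit.step(queues):
                progress = True


# "Program" the network: multiply feeds add through intermediate queue "t".
queues = {name: deque() for name in ("a", "b", "c", "t", "out")}
queues["a"].extend([1, 2, 3])
queues["b"].extend([4, 5, 6])
queues["c"].extend([10, 20, 30])
circuits = [
    OpCircuit(lambda x, y: x + y, ["t", "c"], "out"),  # out = t + c
    OpCircuit(lambda x, y: x * y, ["a", "b"], "t"),    # t = a * b
]
run(circuits, queues)
print(list(queues["out"]))  # [14, 30, 48]
```

Rewiring queue names changes the dataflow without changing the circuits themselves, which is the sense in which the coupling is programmable in this model.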
Abstract:
Techniques are disclosed for handling version information using a copy engine. In one embodiment, an apparatus comprises a copy engine configured to perform one or more operations associated with a block memory operation in response to a command. Examples of block memory operations may include copy, clear, move, and/or compress operations. In one embodiment, the copy engine is configured to handle version information associated with the block memory operation based on the command. The one or more operations may include operating on data in a cache and/or modifying entries in a memory. In one embodiment, the copy engine is configured to compare version information in the command with stored version information. The copy engine may overwrite or preserve version information based on the command. The copy engine may be a coprocessing element. The copy engine may be configured to maintain coherency with other copy engines and/or processing elements.
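One way a copy command might carry and act on version information is sketched below in Python. The 64-byte versioning granularity, the `Memory`, `CopyCommand`, and `VersionMismatch` types, and the `execute_copy` helper are assumptions for illustration. The sketch compares the command's version against the stored source versions, then either overwrites or preserves the destination versions as the command directs; it does not model caches or coherency.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK = 64  # assumed number of bytes covered by one version tag


class VersionMismatch(Exception):
    """Raised when a command's version does not match a stored version."""


@dataclass
class Memory:
    data: bytearray
    versions: list  # one version tag per BLOCK-byte block


@dataclass
class CopyCommand:
    src: int
    dst: int
    length: int
    expected_version: Optional[int]  # None: skip the comparison
    copy_versions: bool              # True: overwrite destination tags; False: preserve them


def execute_copy(mem: Memory, cmd: CopyCommand) -> None:
    """Perform a block copy while handling version tags as the command directs."""
    if cmd.src % BLOCK or cmd.dst % BLOCK or cmd.length % BLOCK:
        raise ValueError("sketch assumes block-aligned copies")
    if max(cmd.src, cmd.dst) + cmd.length > len(mem.data):
        raise ValueError("copy exceeds memory bounds")
    n = cmd.length // BLOCK
    s, d = cmd.src // BLOCK, cmd.dst // BLOCK
    # Compare the command's version with each stored source version first.
    if cmd.expected_version is not None:
        for i in range(s, s + n):
            if mem.versions[i] != cmd.expected_version:
                raise VersionMismatch(f"block {i}: stored {mem.versions[i]}, "
                                      f"expected {cmd.expected_version}")
    # Right-hand slices are copies, so overlapping ranges are handled safely.
    mem.data[cmd.dst:cmd.dst + cmd.length] = mem.data[cmd.src:cmd.src + cmd.length]
    if cmd.copy_versions:
        mem.versions[d:d + n] = mem.versions[s:s + n]


mem = Memory(bytearray(range(256)), [3, 3, 5, 5])
execute_copy(mem, CopyCommand(src=0, dst=128, length=64,
                              expected_version=3, copy_versions=True))
print(mem.versions)  # [3, 3, 3, 5]
```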