摘要:
Instruction branching circuitry including a plurality of logical stacks each having a plurality of entries for storing an address for accessing a corresponding instruction in a memory device. A counter generates a pointer to an entry in an active one of the logical stacks, the counter including incrementation logic incrementing a stored pointer value following a Push operation and decrementation logic decrementing the stored pointer value following a Pop operation to the active one of the logical stacks. Selector circuitry selects the active one of the logical stacks in accordance with the performance of the Push and Pop operations.
摘要:
A technique for performing cache injection includes monitoring addresses on a bus. Ownership of input/output data on the bus is acquired by a cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache.
摘要:
A data processing system includes a microprocessor having access to multiple levels of cache memories. The microprocessor executes a main thread compiled from a source code object. The system includes a processor for executing an assist thread also derived from the source code object. The assist thread includes memory reference instructions of the main thread and only those arithmetic instructions required to resolve the memory reference instructions. A scheduler configured to schedule the assist thread in conjunction with the corresponding execution thread is configured to execute the assist thread ahead of the execution thread by a determinable threshold such as the number of main processor cycles or the number of code instructions. The assist thread may execute in the main processor or in a dedicated assist processor that makes direct memory accesses to one of the lower level cache memory elements.
摘要:
A processing system and method includes a predecoder configured to identify instructions that are combinable. Instruction storage is configured to merge instructions that are combinable by replacing the combinable instructions with a wide data internal instruction for execution. An instruction execution unit is configured to execute the internal instruction on a wide datapath.
摘要:
A method of data processing in a processor includes maintaining a usage history indicating demand usage of prefetched data retrieved into cache memory. An amount of data to prefetch by a data prefetch request is selected based upon the usage history. The data prefetch request is transmitted to a memory hierarchy to prefetch the selected amount of data into cache memory.
摘要:
A technique for performing cache injection includes monitoring, at a cache, addresses on a bus. Ownership of input/output data on the bus is then acquired by the cache when an address on the bus (that is associated with the input/output data) corresponds to an address of a data block stored in the cache. A replacement policy position of the data block is then modified (to increase a probability that the data block is consumed prior to ejection from the cache).
摘要:
In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and a code block that is a candidate for acceleration. The code block specifies an iterated operation having a first operand and a second operand, where each of multiple first operands and each of multiple second operands for the iterated operation has a defined addressing relationship. In response to the identifying, the compiler generates post-processed code containing lower level instruction(s) corresponding to the identified code section and creates and outputs an operand data structure separate from the post-processed code. The operand data structure specifies the defined addressing relationship for the multiple first operands and for the multiple second operands. The compiler places a block computation command in the post-processed code that invokes processing of the operand data structure to compute operand addresses.
摘要:
In response to receiving pre-processed code, a compiler identifies a code section that is not candidate for acceleration and identifying a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code section, the compiler generates post-processed code containing one or more lower level instructions corresponding to the identified code section, and in response to identifying the code block, the compiler creates and outputs an operation data structure separate from the post-processed code that identifies the iterated operation. The compiler places a block computation command in the post-processed code that invokes processing of the operation data structure to perform the iterated operation and outputs the post-processed code.
摘要:
A distributed data processing system includes: (1) a first node with a processor, a first memory, and asynchronous memory mover logic; and connection mechanism that connects (2) a second node having a second memory. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes independent of the processor.
摘要:
A system and method for specifying an access hint for prefetching limited use data. A processing unit receives a data cache block touch (DCBT) instruction having an access hint indicating to the processing unit that a program executing on the data processing system may soon access a cache block addressed within the DCBT instruction. The access hint is contained in a code point stored in a subfield of the DCBT instruction. In response to detecting that the code point is set to a specific value, the data addressed in the DCBT instruction is prefetched into an entry in the lower level cache. The entry may then be updated as a least recently used entry of a plurality of entries in the lower level cache. In response to a new cache block being fetched to the cache, the prefetched cache block is cast out of the cache.