摘要:
Address translation performance within a processor is improved by identifying an address that causes a boundary crossing between different pages in memory and linking address translation information associated with both memory pages. According to one embodiment of a processor, the processor comprises circuitry configured to recognize an access to a memory region crossing a page boundary between first and second memory pages. The circuitry is also configured to link address translation information associated with the first and second memory pages. Thus, responsive to a subsequent access the same memory region, the address translation information associated with the first and second memory pages is retrievable based on a single address translation.
摘要:
A processor having a multistage pipeline includes a TLB and a TLB controller. In response to a TLB miss signal, the TLB controller initiates a TLB reload, requesting address translation information from either a memory or a higher-level TLB, and placing that information into the TLB. The processor flushes the instruction having the missing virtual address, and refetches the instruction, resulting in re-insertion of the instruction at an initial stage of the pipeline above the TLB access point. The initiation of the TLB reload, and the flush/refetch of the instruction, are performed substantially in parallel, and without immediately stalling the pipeline. The refetched instruction is held at a point in the pipeline above the TLB access point until the TLB reload is complete, so that the refetched instruction generates a “hit” in the TLB upon its next access.
摘要:
Address translation performance within a processor is improved by identifying an address that causes a boundary crossing between different pages in memory and linking address translation information associated with both memory pages. According to one embodiment of a processor, the processor comprises circuitry configured to recognize an access to a memory region crossing a page boundary between first and second memory pages. The circuitry is also configured to link address translation information associated with the first and second memory pages. Thus, responsive to a subsequent access the same memory region, the address translation information associated with the first and second memory pages is retrievable based on a single address translation.
摘要:
A method of designing a system on a chip (SoC) to operate with varying latencies and frequencies. A layout of the chip is designed with specific placement of devices, including a bus controller, initiator, and target devices. The time for a signal to propagate from a source device to a destination device is determined relative to a default propagation time. A pipeline stage is then inserted into a bus path between said source device and destination device for each additional time the signal takes to propagate. Each device (i.e., initiators, targets, and bus controller) is designed with logic to control a protocol that functions with a variety of response latencies. With the additional logic, the devices do not need to be changed when pipeline stages are inserted in the various paths. Registers are utilized as the pipeline stages that are inserted within the paths.
摘要:
A method and system for reducing power in a snooping cache based environment. A memory may be coupled to a plurality of processing units via a bus. Each processing unit may comprise a cache controller coupled to a cache associated with the processing unit. The cache controller may comprise a segment register comprising N bits where each bit in the segment register may be associated with a segment of memory divided into N segments. The cache controller may be configured to snoop a requested address on the bus. Upon determining which bit in the segment register is associated with the snooped requested address, the segment register may determine if the bit associated with the snooped requested address is set. If the bit is not set, then a cache search may not be performed thereby mitigating the power consumption associated with a snooped request cache search.
摘要:
A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss.
摘要:
A method of managing cache partitions provides a first pointer for higher priority writes and a second pointer for lower priority writes, and uses the first pointer to delimit the lower priority writes. For example, locked writes have greater priority than unlocked writes, and a first pointer may be used for locked writes, and a second pointer may be used for unlocked writes. The first pointer is advanced responsive to making locked writes, and its advancement thus defines a locked region and an unlocked region. The second pointer is advanced responsive to making unlocked writes. The second pointer also is advanced (or retreated) as needed to prevent it from pointing to locations already traversed by the first pointer. Thus, the pointer delimits the unlocked region and allows the locked region to grow at the expense of the unlocked region.
摘要:
A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss.
摘要:
Data from a source domain operating at a first data rate is transferred to a FIFO in another domain operating at a different data rate. The FIFO buffers data before transfer to a sink for further processing or storage. A source side counter tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter decrements in response to a data ready signal from the source domain, without delay. The counter increments in response to signaling from the sink domain of a read of data off the FIFO. Hence, incrementing is subject to the signaling latency between domains. The source may send one more beat of data when the counter indicates the FIFO is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available; effectively providing one more FIFO position.
摘要:
Data from a source domain operating at a first data rate is transferred to a FIFO in another domain operating at a different data rate. The FIFO buffers data before transfer to a sink for further processing or storage. A source side counter tracks space available in the FIFO. In disclosed examples, the initial counter value corresponds to FIFO depth. The counter decrements in response to a data ready signal from the source domain, without delay. The counter increments in response to signaling from the sink domain of a read of data off the FIFO. Hence, incrementing is subject to the signaling latency between domains. The source may send one more beat of data when the counter indicates the FIFO is full. The last beat of data is continuously sent from the source until it is indicated that a FIFO position became available; effectively providing one more FIFO position.