Techniques for multi-level indirect data prefetching
    61.
    Granted patent (status: in force)

    Publication No.: US08161265B2

    Publication date: 2012-04-17

    Application No.: US12024260

    Filing date: 2008-02-01

    IPC classification: G06F13/00

    Abstract: A technique for performing data prefetching using multi-level indirect data prefetching includes determining a first memory address of a pointer associated with a data prefetch instruction. Content that is included in a first data block (e.g., a first cache line of a memory) at the first memory address is then fetched. A second memory address is then determined based on the content at the first memory address. Content that is included in a second data block (e.g., a second cache line) at the second memory address is then fetched (e.g., from the memory or another memory). A third memory address is then determined based on the content at the second memory address. Finally, a third data block (e.g., a third cache line) that includes another pointer or data at the third memory address is fetched (e.g., from the memory or the other memory).
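
    The three-step pointer chase in this abstract can be illustrated with a minimal Python sketch. It is a software model only, not the patented hardware: the 64-byte line size, the dictionary standing in for memory, the function names, and the choice to use the fetched content directly as the next address are all assumptions made for illustration.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def line_base(addr):
    """Return the base address of the cache line that contains addr."""
    return addr - (addr % LINE_SIZE)


def multi_level_prefetch(memory, cache, pointer_addr, levels=3):
    """Chase a pointer chain, fetching the line touched at each level.

    memory maps an address to the value stored there (a pointer or data).
    cache is a set of line base addresses that have been fetched.
    Returns the addresses visited, one per level.
    """
    visited = []
    addr = pointer_addr
    for level in range(levels):
        cache.add(line_base(addr))  # fetch the data block at this address
        visited.append(addr)
        if level < levels - 1:
            addr = memory[addr]  # next address comes from the fetched content
    return visited


# Hypothetical memory image: 0x100 -> 0x240 -> 0x3C8 -> data
memory = {0x100: 0x240, 0x240: 0x3C8, 0x3C8: 42}
cache = set()
visited = multi_level_prefetch(memory, cache, 0x100)
```

    After the call, the cache model holds the three lines that a later demand access to the final data would otherwise have had to fetch one after another.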

    Techniques for indirect data prefetching
    62.
    Granted patent (status: in force)

    Publication No.: US08161263B2

    Publication date: 2012-04-17

    Application No.: US12024239

    Filing date: 2008-02-01

    IPC classification: G06F13/00

    CPC classification: G06F12/0862 G06F2212/6028

    Abstract: A processor includes a first address translation engine, a second address translation engine, and a prefetch engine. The first address translation engine is configured to determine a first memory address of a pointer associated with a data prefetch instruction. The prefetch engine is coupled to the first address translation engine and is configured to fetch content, included in a first data block (e.g., a first cache line) of a memory, at the first memory address. The second address translation engine is coupled to the prefetch engine and is configured to determine a second memory address based on the content of the memory at the first memory address. The prefetch engine is also configured to fetch (e.g., from the memory or another memory) a second data block (e.g., a second cache line) that includes data at the second memory address.
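
    The division of labor between the two translation engines and the prefetch engine can be modeled roughly as follows. This is a hedged sketch: the abstract does not specify the translation scheme, so the page-table dictionary, the 4096-byte pages, the 128-byte lines, and the names translate and indirect_prefetch are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 128   # assumed cache-line size in bytes


def translate(page_table, effective_addr):
    """Map an effective address to a real address through a page table."""
    page, offset = divmod(effective_addr, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def indirect_prefetch(page_table, real_memory, pointer_ea):
    """Return the base addresses of the two lines an indirect prefetch touches."""
    first_ra = translate(page_table, pointer_ea)   # first translation engine
    target_ea = real_memory[first_ra]              # content fetched at the first address
    second_ra = translate(page_table, target_ea)   # second translation engine
    first_line = first_ra // LINE_SIZE * LINE_SIZE
    second_line = second_ra // LINE_SIZE * LINE_SIZE
    return first_line, second_line


# Hypothetical mapping: effective page 0 -> real page 5, effective page 1 -> real page 9
page_table = {0: 5, 1: 9}
real_memory = {5 * PAGE_SIZE + 0x10: 0x1088}  # the pointer holds effective address 0x1088
lines = indirect_prefetch(page_table, real_memory, 0x10)
```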

    Fully asynchronous memory mover
    63.
    Granted patent (status: lapsed)

    Publication No.: US08095758B2

    Publication date: 2012-01-10

    Application No.: US12024613

    Filing date: 2008-02-01

    IPC classification: G06F12/02 G06F12/04

    Abstract: A data processing system has a processor, a memory coupled to the processor, and an asynchronous memory mover coupled to the processor. The asynchronous memory mover has registers for receiving a set of parameters from the processor. These parameters are associated with an asynchronous memory move (AMM) operation that the processor initiates in virtual address space using a source effective address and a destination effective address. The asynchronous memory mover performs the AMM operation to move the data from a first physical memory location having a source real address corresponding to the source effective address to a second physical memory location having a destination real address corresponding to the destination effective address. The asynchronous memory mover has an associated off-chip translation mechanism. The AMM operation thus occurs independently of the processor, and the processor continues processing other operations independently of the AMM operation.
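
    As a loose software analogy for the behavior described here, the sketch below hands a move to a background thread while the caller keeps working. The class name AsyncMemoryMover, the bytearray standing in for physical memory, and the translate callback standing in for the translation mechanism are all assumptions; real hardware would not use threads.

```python
import threading


class AsyncMemoryMover:
    """Toy model: receives move parameters, then moves data in the background."""

    def __init__(self, memory, translate):
        self.memory = memory        # bytearray standing in for physical memory
        self.translate = translate  # effective address -> real address
        self._worker = None

    def start(self, src_ea, dst_ea, length):
        """Accept the parameters and begin the move without blocking the caller."""
        src_ra = self.translate(src_ea)
        dst_ra = self.translate(dst_ea)

        def move():
            self.memory[dst_ra:dst_ra + length] = self.memory[src_ra:src_ra + length]

        self._worker = threading.Thread(target=move)
        self._worker.start()

    def wait(self):
        """Block until the in-flight move has completed."""
        self._worker.join()


memory = bytearray(64)
memory[0:4] = b"data"
# Hypothetical translation: effective addresses start at 0x1000
mover = AsyncMemoryMover(memory, lambda ea: ea - 0x1000)
mover.start(0x1000, 0x1020, 4)
other_work = sum(range(10))  # the caller continues with unrelated work
mover.wait()
```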

    Hardware Assist Thread for Increasing Code Parallelism
    64.
    Patent application (status: in force)

    Publication No.: US20110283095A1

    Publication date: 2011-11-17

    Application No.: US12778192

    Filing date: 2010-05-12

    IPC classification: G06F9/30 G06F9/38

    Abstract: Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of a data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware-implemented pervasive thread control logic determines whether one or more already spawned idle threads are available for use as an assist thread. If so, the pervasive thread control logic selects an idle thread from among them, thereby providing the assist thread. In addition, the pervasive thread control logic offloads a portion of the workload of the main thread to the assist thread.
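
    The selection step can be sketched in a few lines of Python. This is an illustration only: the thread-state dictionary, the lowest-id selection rule, the even split of the workload, and the fallback of leaving all work with the main thread when no idle thread exists are assumptions that the abstract does not state.

```python
def branch_to_assist_thread(threads, work):
    """Pick an already spawned idle thread and offload part of the work to it.

    threads maps a thread id to "idle" or "busy".
    Returns (main_work, assist_id, assist_work); assist_id is None when no
    idle thread is available, in which case the main thread keeps all work.
    """
    idle = sorted(tid for tid, state in threads.items() if state == "idle")
    if not idle:
        return work, None, []
    assist_id = idle[0]            # assumed policy: lowest-numbered idle thread
    threads[assist_id] = "busy"
    split = len(work) // 2         # assumed policy: offload the second half
    return work[:split], assist_id, work[split:]


threads = {0: "busy", 1: "busy", 2: "idle", 3: "idle"}
main_work, assist_id, assist_work = branch_to_assist_thread(threads, list(range(8)))
```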

    CACHE RECONFIGURATION BASED ON RUN-TIME PERFORMANCE DATA OR SOFTWARE HINT
    65.
    Patent application (status: in force)

    Publication No.: US20110107032A1

    Publication date: 2011-05-05

    Application No.: US12985726

    Filing date: 2011-01-06

    IPC classification: G06F12/08

    Abstract: A method for reconfiguring a cache memory is provided. The method in one aspect may include analyzing one or more characteristics of an execution entity accessing a cache memory and reconfiguring the cache based on the one or more characteristics analyzed. Examples of analyzed characteristics may include but are not limited to the data structure used by the execution entity, the expected reference pattern of the execution entity, the type of the execution entity, the heat and power consumption of the execution entity, etc. Examples of cache attributes that may be reconfigured may include but are not limited to associativity of the cache memory, amount of the cache memory available to store data, coherence granularity of the cache memory, line size of the cache memory, etc.

    Validity of address ranges used in semi-synchronous memory copy operations
    67.
    Granted patent (status: in force)

    Publication No.: US07882321B2

    Publication date: 2011-02-01

    Application No.: US12402904

    Filing date: 2009-03-12

    IPC classification: G06F12/02

    Abstract: A system, method, and computer-readable medium for protecting content of a memory page are disclosed. The method includes determining a start of a semi-synchronous memory copy operation. A range of addresses is determined where the semi-synchronous memory copy operation is being performed. An issued instruction that removes a page table entry is detected. The method further includes determining whether the issued instruction is destined to remove a page table entry associated with at least one address in the range of addresses. In response to the issued instruction being destined to remove the page table entry, the execution of the issued instruction is stalled until the semi-synchronous memory copy operation is completed.
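
    The stall decision described here reduces to an overlap test between the page being unmapped and the address ranges of copies in flight. The sketch below models only that decision, not the stall itself; the 4096-byte page size, the tuple layout for a copy, and the function names are assumptions.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(start, length):
    """Return the set of page numbers covered by [start, start + length)."""
    first = start // PAGE_SIZE
    last = (start + length - 1) // PAGE_SIZE
    return set(range(first, last + 1))


def must_stall(active_copies, page_to_remove):
    """True if removing this page's table entry would hit an in-flight copy.

    active_copies is a list of (source, destination, length) tuples.
    """
    for src, dst, length in active_copies:
        in_flight = pages_touched(src, length) | pages_touched(dst, length)
        if page_to_remove in in_flight:
            return True
    return False


# Hypothetical copy of 0x300 bytes from 0x1F00 (pages 1 and 2) to 0x8000 (page 8)
active = [(0x1F00, 0x8000, 0x300)]
```

    In this model, an instruction that removes the entry for page 2 or page 8 would be held until the copy completes, while one that targets page 3 would proceed at once.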

    Thread Partitioning in a Multi-Core Environment
    69.
    Patent application (status: in force)

    Publication No.: US20100299496A1

    Publication date: 2010-11-25

    Application No.: US12024211

    Filing date: 2008-02-01

    IPC classification: G06F9/30 G06F15/76

    CPC classification: G06F9/4843 G06F9/3851

    Abstract: A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main thread binaries completes execution, a second partition of the main thread binaries moves to the second core and executes using the warmed-up cache in the second core.
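
    The interleaving of main and helper partitions across two cores can be laid out as a simple schedule. The abstract covers only the first two partitions; continuing the same alternation for later partitions, along with the function name and the dictionary layout, is an assumption made for this sketch.

```python
def warmup_schedule(num_partitions, cores=(0, 1)):
    """List, step by step, which partition runs where.

    At step k the main partition k runs on one core while helper partition
    k + 1 runs on the other core to warm its cache for the next step.
    """
    schedule = []
    for k in range(num_partitions):
        main_core = cores[k % 2]
        helper_core = cores[(k + 1) % 2]
        helper_partition = k + 1 if k + 1 < num_partitions else None
        schedule.append({
            "main": (k, main_core),
            "helper": (helper_partition, helper_core),
        })
    return schedule


plan = warmup_schedule(3)
```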

    Completion Arbitration for More than Two Threads Based on Resource Limitations
    70.
    Patent application (status: in force)

    Publication No.: US20100262967A1

    Publication date: 2010-10-14

    Application No.: US12423561

    Filing date: 2009-04-14

    IPC classification: G06F9/46

    CPC classification: G06F9/485

    Abstract: A mechanism is provided for thread completion arbitration. The mechanism comprises executing more than two threads of instructions simultaneously in a processor, selecting a first thread from a first subset of threads, in the more than two threads, for completion of execution within the processor, and selecting a second thread from a second subset of threads, in the more than two threads, for completion of execution within the processor. The mechanism further comprises completing execution of the first and second threads by committing results of the execution of the first and second threads to a storage device associated with the processor. At least one of the first subset of threads or the second subset of threads comprises two or more threads from the more than two threads. The first subset of threads and second subset of threads have different threads from one another.
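
    One way to picture the arbitration is to pick at most one ready thread from each disjoint subset per cycle. The abstract does not name a selection policy, so the round-robin rule, the ready set, and the function name below are assumptions for illustration.

```python
def select_for_completion(subset, ready, last_pick=None):
    """Round-robin pick of one ready thread from a subset, or None.

    subset is an ordered list of thread ids, ready is the set of threads
    with results waiting to commit, and last_pick is the thread chosen from
    this subset in the previous cycle.
    """
    start = subset.index(last_pick) + 1 if last_pick in subset else 0
    for i in range(len(subset)):
        tid = subset[(start + i) % len(subset)]
        if tid in ready:
            return tid
    return None


# Four threads split into two disjoint subsets (hypothetical grouping)
subset_a, subset_b = [0, 1], [2, 3]
ready = {1, 2, 3}
first = select_for_completion(subset_a, ready)
second = select_for_completion(subset_b, ready, last_pick=2)
```

    Both selected threads could then commit in the same cycle, since they come from different subsets.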
