Apparatus and method for accessing a memory device during speculative instruction branching
    81.
    Granted invention patent (Expired)

    Publication number: US06526503B1

    Publication date: 2003-02-25

    Application number: US09434763

    Filing date: 1999-11-04

    Applicant: Balaram Sinharoy

    Inventor: Balaram Sinharoy

    IPC class: G06F9/42

    Abstract: Instruction branching circuitry including a plurality of logical stacks each having a plurality of entries for storing an address for accessing a corresponding instruction in a memory device. A counter generates a pointer to an entry in an active one of the logical stacks, the counter including incrementation logic incrementing a stored pointer value following a Push operation and decrementation logic decrementing the stored pointer value following a Pop operation to the active one of the logical stacks. Selector circuitry selects the active one of the logical stacks in accordance with the performance of the Push and Pop operations.

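    The stack mechanism in this abstract can be pictured in software. Below is a minimal C sketch, assuming a small fixed number of stacks and entries: a pointer counter is incremented after each Push and decremented after each Pop, and a selector switches which logical stack is active. All names, sizes, and the selection policy are hypothetical; the sketch only illustrates the idea, not the patented hardware circuitry.

        #include <stdint.h>

        /* Illustrative software model only; the patent describes hardware
         * circuitry. Names and sizes are hypothetical. */
        #define NUM_STACKS   4      /* plurality of logical stacks */
        #define STACK_DEPTH  8      /* entries per logical stack   */

        typedef struct {
            uint64_t entry[NUM_STACKS][STACK_DEPTH]; /* stored instruction addresses       */
            unsigned pointer;                        /* counter: pointer into active stack */
            unsigned active;                         /* selector: which stack is active    */
        } link_stacks_t;

        /* Push: store an address and increment the stored pointer value. */
        static void push(link_stacks_t *s, uint64_t addr)
        {
            s->entry[s->active][s->pointer] = addr;
            s->pointer = (s->pointer + 1) % STACK_DEPTH;               /* incrementation logic */
        }

        /* Pop: decrement the stored pointer value and return the address found there. */
        static uint64_t pop(link_stacks_t *s)
        {
            s->pointer = (s->pointer + STACK_DEPTH - 1) % STACK_DEPTH; /* decrementation logic */
            return s->entry[s->active][s->pointer];
        }

        /* Selector: switch the active logical stack, e.g. when a speculative path
         * begins, so a mispredicted path cannot corrupt the other stack
         * (a hypothetical policy, for illustration only). */
        static void select_stack(link_stacks_t *s, unsigned which)
        {
            s->active = which % NUM_STACKS;
        }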

    Assist thread for injecting cache memory in a microprocessor
    83.
    Granted invention patent (In force)

    Publication number: US08949837B2

    Publication date: 2015-02-03

    Application number: US13434423

    Filing date: 2012-03-29

    Abstract: A data processing system includes a microprocessor having access to multiple levels of cache memories. The microprocessor executes a main thread compiled from a source code object. The system includes a processor for executing an assist thread also derived from the source code object. The assist thread includes memory reference instructions of the main thread and only those arithmetic instructions required to resolve the memory reference instructions. A scheduler, configured to schedule the assist thread in conjunction with the corresponding execution thread, executes the assist thread ahead of the execution thread by a determinable threshold, such as a number of main processor cycles or code instructions. The assist thread may execute in the main processor or in a dedicated assist processor that makes direct memory accesses to one of the lower level cache memory elements.

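    As an illustration of the assist-thread idea, here is a short C sketch, assuming POSIX threads and the GCC/Clang __builtin_prefetch builtin: the helper thread executes only the memory references of the main loop and stays ahead of it by a fixed iteration threshold. The thread layout, the AHEAD threshold, and all identifiers are hypothetical simplifications, not the mechanism claimed in the patent.

        #include <pthread.h>
        #include <stdio.h>

        #define N     (1 << 20)
        #define AHEAD 256                /* determinable threshold, in loop iterations */

        static double a[N];
        static volatile long main_i;     /* progress of the main thread (simplified sync) */

        /* Assist thread: only the memory references, here a prefetch of a[i]. */
        static void *assist_thread(void *arg)
        {
            (void)arg;
            for (long i = 0; i < N; i++) {
                while (i - main_i > AHEAD)          /* stay at most AHEAD iterations in front */
                    ;                               /* busy-wait: a simplification            */
                __builtin_prefetch(&a[i], 0, 1);    /* warm the cache for the main thread     */
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t t;
            double sum = 0.0;

            pthread_create(&t, NULL, assist_thread, NULL);
            for (long i = 0; i < N; i++) {          /* main thread: the full computation */
                sum += a[i] * 2.0;
                main_i = i;
            }
            pthread_join(t, NULL);
            printf("sum = %f\n", sum);
            return 0;
        }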

    Operand data structure for block computation
    87.
    Granted invention patent (In force)

    Publication number: US08407680B2

    Publication date: 2013-03-26

    Application number: US12336301

    Filing date: 2008-12-16

    IPC class: G06F9/45

    CPC class: G06F8/4441 G06F8/447

    Abstract: In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and a code block that is a candidate for acceleration. The code block specifies an iterated operation having a first operand and a second operand, where each of multiple first operands and each of multiple second operands for the iterated operation has a defined addressing relationship. In response to the identifying, the compiler generates post-processed code containing lower level instruction(s) corresponding to the identified code section and creates and outputs an operand data structure separate from the post-processed code. The operand data structure specifies the defined addressing relationship for the multiple first operands and for the multiple second operands. The compiler places a block computation command in the post-processed code that invokes processing of the operand data structure to compute operand addresses.

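    A rough C sketch of what an operand data structure kept separate from the generated code could look like: each operand is described by a base address and a stride (its defined addressing relationship), and a block-computation routine derives every operand address from those descriptors. The field names, the example add operation, and block_compute() are assumptions made for illustration, not the patented format.

        #include <stddef.h>
        #include <stdio.h>

        typedef struct {
            void     *base;     /* address of the first element      */
            ptrdiff_t stride;   /* bytes between successive elements */
        } operand_desc;

        typedef struct {
            operand_desc dst, src1, src2;  /* destination plus first and second operands */
            size_t       count;            /* number of iterations of the operation      */
        } operand_data_structure;

        /* The "block computation command": walk the descriptors, compute each
         * operand address, and apply the iterated operation (here, an add). */
        static void block_compute(const operand_data_structure *ods)
        {
            for (size_t i = 0; i < ods->count; i++) {
                double *d  = (double *)((char *)ods->dst.base  + i * ods->dst.stride);
                double *s1 = (double *)((char *)ods->src1.base + i * ods->src1.stride);
                double *s2 = (double *)((char *)ods->src2.base + i * ods->src2.stride);
                *d = *s1 + *s2;
            }
        }

        int main(void)
        {
            double x[4] = {1, 2, 3, 4}, y[4] = {10, 20, 30, 40}, z[4];
            operand_data_structure ods = {
                { z, sizeof(double) }, { x, sizeof(double) }, { y, sizeof(double) }, 4
            };
            block_compute(&ods);
            printf("%g %g %g %g\n", z[0], z[1], z[2], z[3]);   /* 11 22 33 44 */
            return 0;
        }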

    Computation table for block computation
    88.
    Granted invention patent (In force)

    Publication number: US08327345B2

    Publication date: 2012-12-04

    Application number: US12336332

    Filing date: 2008-12-16

    IPC class: G06F9/45

    CPC class: G06F8/4441

    Abstract: In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and identifies a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code section, the compiler generates post-processed code containing one or more lower level instructions corresponding to the identified code section, and in response to identifying the code block, the compiler creates and outputs an operation data structure, separate from the post-processed code, that identifies the iterated operation. The compiler places a block computation command in the post-processed code that invokes processing of the operation data structure to perform the iterated operation and outputs the post-processed code.

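    For comparison with the operand data structure of the previous entry, this C sketch models a computation table whose entries identify the iterated operation itself; a block command selects an entry and performs that operation over its operands. The opcodes, the entry layout, and block_execute() are hypothetical stand-ins, not the patented table format.

        #include <stddef.h>
        #include <stdio.h>

        typedef enum { OP_ADD, OP_MUL } opcode;     /* which iterated operation to perform */

        typedef struct {
            opcode  op;
            double *dst, *src1, *src2;
            size_t  count;                          /* number of iterations */
        } computation_table_entry;

        /* The block computation command: process the table entry identified by
         * index and perform its iterated operation over all elements. */
        static void block_execute(const computation_table_entry *table, size_t index)
        {
            const computation_table_entry *e = &table[index];
            for (size_t i = 0; i < e->count; i++) {
                switch (e->op) {
                case OP_ADD: e->dst[i] = e->src1[i] + e->src2[i]; break;
                case OP_MUL: e->dst[i] = e->src1[i] * e->src2[i]; break;
                }
            }
        }

        int main(void)
        {
            double x[3] = {1, 2, 3}, y[3] = {4, 5, 6}, z[3];
            computation_table_entry table[] = { { OP_MUL, z, x, y, 3 } };
            block_execute(table, 0);
            printf("%g %g %g\n", z[0], z[1], z[2]);   /* 4 10 18 */
            return 0;
        }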

    Asynchronous memory move across physical nodes with dual-sided communication
    89.
    Granted invention patent (In force)

    Publication number: US08275963B2

    Publication date: 2012-09-25

    Application number: US12024486

    Filing date: 2008-02-01

    IPC class: G06F12/08

    Abstract: A distributed data processing system includes (1) a first node with a processor, a first memory, and asynchronous memory mover logic, and (2) a second node having a second memory, joined by a connection mechanism. The processor includes processing logic for completing a cross-node asynchronous memory move (AMM) operation, wherein the processor performs a move of data in virtual address space from a first effective address to a second effective address, and the asynchronous memory mover logic completes a physical move of the data from a first memory location in the first memory having a first real address to a second memory location in the second memory having a second real address. The data is transmitted via the connection mechanism connecting the two nodes, independently of the processor.

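    The asynchrony can be mimicked in plain C as a sketch: the "processor" posts a move request naming source and destination addresses and keeps working, while a separate "memory mover" thread completes the copy and raises a completion flag. This single-process model (POSIX threads, polling for completion) is only an analogue of the cross-node hardware mechanism; every identifier here is invented for illustration.

        #include <pthread.h>
        #include <stdio.h>
        #include <string.h>

        typedef struct {
            const void *src;       /* first effective address  */
            void       *dst;       /* second effective address */
            size_t      len;
            volatile int done;     /* completion flag polled by the "processor" */
        } amm_request;

        /* The "asynchronous memory mover": performs the physical move off the
         * critical path of the requesting thread. */
        static void *memory_mover(void *arg)
        {
            amm_request *r = (amm_request *)arg;
            memcpy(r->dst, r->src, r->len);
            r->done = 1;
            return NULL;
        }

        int main(void)
        {
            static char source[1 << 16] = "payload";
            static char target[1 << 16];
            amm_request req = { source, target, sizeof source, 0 };
            pthread_t mover;

            pthread_create(&mover, NULL, memory_mover, &req);   /* initiate the AMM */

            long busy = 0;                      /* the processor keeps doing other work */
            for (int i = 0; i < 1000; i++)
                busy += i;

            while (!req.done)                   /* poll for completion (a simplification) */
                ;
            pthread_join(mover, NULL);
            printf("moved \"%s\" while busy counter reached %ld\n", target, busy);
            return 0;
        }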

    Specifying an access hint for prefetching limited use data in a cache hierarchy
    90.
    Granted invention patent (Expired)

    Publication number: US08176254B2

    Publication date: 2012-05-08

    Application number: US12424681

    Filing date: 2009-04-16

    IPC class: G06F13/00

    Abstract: A system and method for specifying an access hint for prefetching limited use data. A processing unit receives a data cache block touch (DCBT) instruction having an access hint indicating to the processing unit that a program executing on the data processing system may soon access a cache block addressed within the DCBT instruction. The access hint is contained in a code point stored in a subfield of the DCBT instruction. In response to detecting that the code point is set to a specific value, the data addressed in the DCBT instruction is prefetched into an entry in the lower level cache. The entry may then be updated as a least recently used entry of a plurality of entries in the lower level cache. In response to a new cache block being fetched to the cache, the prefetched cache block is cast out of the cache.

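    A loose software analogue of the limited-use access hint: on POWER the hint is a code point in a subfield of the dcbt instruction, but the sketch below stands in for it with GCC/Clang's __builtin_prefetch, whose third argument expresses temporal locality (0 meaning the data need not be retained after use). The mapping to DCBT code points, the look-ahead distance, and all names are assumptions, not the patented encoding.

        #include <stdio.h>

        #define N 4096
        static double samples[N];

        /* Sum data that will be used only once: hint each block ahead of time
         * with low temporal locality, so the cache may treat it as least
         * recently used and cast it out as soon as new blocks arrive. */
        static double sum_once(void)
        {
            double sum = 0.0;
            for (int i = 0; i < N; i++) {
                if (i + 64 < N)
                    __builtin_prefetch(&samples[i + 64], 0, 0);  /* read, low locality */
                sum += samples[i];
            }
            return sum;
        }

        int main(void)
        {
            for (int i = 0; i < N; i++)
                samples[i] = i;
            printf("sum = %f\n", sum_once());
            return 0;
        }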