RETURN ADDRESS PREDICTION IN MULTITHREADED PROCESSORS
    1.
    Type: Invention application
    Legal status: In force

    Publication No.: US20120233442A1

    Publication date: 2012-09-13

    Application No.: US13046273

    Filing date: 2011-03-11

    IPC classification: G06F9/38

    Abstract: Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
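
The "dedicated portions" variant described in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented design: the class name, the fixed per-thread capacity, and the overwrite-oldest policy on overflow are all assumptions.

```python
class ReturnAddressPredictor:
    """Toy model: one dedicated return-address stack per thread (hypothetical)."""

    def __init__(self, num_threads, entries_per_thread):
        self.entries_per_thread = entries_per_thread
        # One dedicated portion of storage for each thread.
        self.stacks = [[] for _ in range(num_threads)]

    def push(self, thread_id, return_address):
        """Record a return address when `thread_id` executes a call."""
        stack = self.stacks[thread_id]
        if len(stack) == self.entries_per_thread:
            stack.pop(0)  # Assumed policy: overwrite the oldest entry when full.
        stack.append(return_address)

    def predict(self, thread_id):
        """Serve a request that carries the requesting thread's ID."""
        stack = self.stacks[thread_id]
        return stack.pop() if stack else None
```

Because each request names its thread, calls and returns from one thread never disturb the predictions held for another.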

    Return address prediction in multithreaded processors
    2.
    Type: Granted patent
    Legal status: In force

    Publication No.: US09213551B2

    Publication date: 2015-12-15

    Application No.: US13046273

    Filing date: 2011-03-11

    IPC classification: G06F9/42 G06F9/38 G06F9/30

    Abstract: Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries.
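
The "dynamically allocable entries" variant can be modeled as a shared pool whose entries are handed to whichever thread needs them. The sketch below is an assumption-laden illustration: the linked-list bookkeeping, the class name, and the drop-on-exhaustion policy are not taken from the patent.

```python
class SharedPoolPredictor:
    """Toy model: all threads draw return-address entries from one pool."""

    def __init__(self, num_entries):
        self.free = list(range(num_entries))  # indices of unallocated entries
        self.addr = [None] * num_entries      # stored return addresses
        self.prev = [None] * num_entries      # link to the thread's older entry
        self.top = {}                         # thread_id -> newest entry index

    def push(self, thread_id, return_address):
        if not self.free:
            return False  # Assumed policy: drop the address if the pool is full.
        idx = self.free.pop()
        self.addr[idx] = return_address
        self.prev[idx] = self.top.get(thread_id)
        self.top[thread_id] = idx
        return True

    def predict(self, thread_id):
        idx = self.top.get(thread_id)
        if idx is None:
            return None
        self.top[thread_id] = self.prev[idx]
        self.free.append(idx)  # The entry becomes available to any thread.
        return self.addr[idx]
```

A thread with deep call chains can use most of the pool while idle threads use none, at the cost of extra link state per entry.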

    Minimizing TLB comparison size
    3.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07937556B2

    Publication date: 2011-05-03

    Application No.: US12112150

    Filing date: 2008-04-30

    IPC classification: G06F12/10

    Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and, in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.
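
The idea of comparing a short identifier instead of the full tag can be sketched as follows. This is a simplified model under stated assumptions: the class and method names are hypothetical, the tag is a tuple of register values, and flushing every mapping (and the TLB) when identifiers run out is an assumed policy the abstract does not specify.

```python
class CompactTagTLB:
    """Toy TLB that matches on a short ID derived from a wide tag."""

    def __init__(self, id_bits):
        self.max_ids = 1 << id_bits
        self.tag_to_id = {}      # full tag -> short identifier
        self.current_tag = None
        self.current_id = None
        self.entries = {}        # (short_id, virtual_page) -> physical_page

    def set_tag(self, tag):
        """Called when the tag registers are written; remaps only on change."""
        if tag != self.current_tag:
            if tag not in self.tag_to_id:
                if len(self.tag_to_id) == self.max_ids:
                    # Assumed policy: out of IDs, so drop all mappings and
                    # the translations that were stored under them.
                    self.tag_to_id.clear()
                    self.entries.clear()
                self.tag_to_id[tag] = len(self.tag_to_id)
            self.current_tag = tag
            self.current_id = self.tag_to_id[tag]
        return self.current_id

    def insert(self, vpn, ppn):
        self.entries[(self.current_id, vpn)] = ppn

    def lookup(self, vpn):
        """Hit/miss is decided using the short ID, not the full tag."""
        return self.entries.get((self.current_id, vpn))
```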

    Minimizing TLB Comparison Size
    4.
    Type: Invention application
    Legal status: In force

    Publication No.: US20090327646A1

    Publication date: 2009-12-31

    Application No.: US12112150

    Filing date: 2008-04-30

    IPC classification: G06F12/10 G06F12/00

    Abstract: In one embodiment, a system comprises one or more registers configured to store a plurality of values that identify a virtual address space (collectively a tag), a translation lookaside buffer (TLB), and a control unit coupled to the TLB and the one or more registers. The control unit is configured to detect whether or not the tag has changed and, in response to a change in the tag, map the changed tag to an identifier having fewer bits than the total number of bits in the tag, and provide the current identifier to the TLB. The TLB is configured to detect a hit/miss in response to the identifier. A similar method is also contemplated.

    Multithreaded processor having a source processor core to subsequently delay continued processing of demap operation until responses are received from each of remaining processor cores
    5.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07454590B2

    Publication date: 2008-11-18

    Application No.: US11222614

    Filing date: 2005-09-09

    IPC classification: G06F12/08

    Abstract: In one embodiment, a processor comprises a plurality of processor cores and an interconnect to which the plurality of processor cores are coupled. Each of the plurality of processor cores comprises at least one translation lookaside buffer (TLB). A first processor core is configured to broadcast a demap command on the interconnect responsive to executing a demap operation. The demap command identifies one or more translations to be invalidated in the TLBs, and remaining processor cores are configured to invalidate the translations in the respective TLBs. The remaining processor cores transmit a response to the first processor core, and the first processor core is configured to delay continued processing subsequent to the demap operation until the responses are received from each of the remaining processor cores.
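
The broadcast-and-wait handshake can be illustrated with a synchronous toy model. The names `Core`, `handle_demap`, and `broadcast_demap` are hypothetical, and real hardware would exchange these messages asynchronously over the interconnect rather than through function calls.

```python
class Core:
    """Toy processor core with a private TLB."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}  # virtual page -> physical page

    def handle_demap(self, vpn):
        """Invalidate the translation, then respond to the source core."""
        self.tlb.pop(vpn, None)
        return self.core_id


def broadcast_demap(source, cores, vpn):
    """Return True once every remaining core has responded."""
    source.tlb.pop(vpn, None)
    remaining = [core for core in cores if core is not source]
    awaiting = {core.core_id for core in remaining}
    for core in remaining:
        awaiting.discard(core.handle_demap(vpn))
    # The source core may continue past the demap only when nothing is awaited.
    return len(awaiting) == 0
```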

    Hardware demapping of TLBs shared by multiple threads
    6.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07383415B2

    Publication date: 2008-06-03

    Application No.: US11222577

    Filing date: 2005-09-09

    IPC classification: G06F12/08

    Abstract: In one embodiment, a processor comprises at least one translation lookaside buffer (TLB) and a control unit coupled to the TLB. The control unit is configured to track whether or not at least one update to the TLB is pending for at least one of a plurality of strands. Each strand comprises hardware to support a different thread of a plurality of concurrently activatable threads in the processor. The strands share the TLB, and the control unit is configured to delay a demap operation issued from one of the strands responsive to the pending update, if any.
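
Why the demap is delayed can be seen in a small model: if a demap ran while another strand's update was still in flight, the late update could reinstall the translation that was just removed. The sketch below is an assumption, not the patented control logic; the class name and the queue of delayed demaps are illustrative.

```python
class SharedTLBControl:
    """Toy control unit for a TLB shared by several strands."""

    def __init__(self, num_strands):
        self.tlb = {}                        # virtual page -> physical page
        self.pending = [None] * num_strands  # in-flight update per strand
        self.delayed = []                    # demaps waiting for updates

    def start_update(self, strand, vpn, ppn):
        self.pending[strand] = (vpn, ppn)

    def complete_update(self, strand):
        vpn, ppn = self.pending[strand]
        self.tlb[vpn] = ppn
        self.pending[strand] = None
        if all(p is None for p in self.pending):
            # No updates remain, so the delayed demaps can now run.
            for delayed_vpn in self.delayed:
                self.tlb.pop(delayed_vpn, None)
            self.delayed.clear()

    def demap(self, strand, vpn):
        """Return True if the demap ran immediately, False if it was delayed."""
        if any(p is not None for p in self.pending):
            self.delayed.append(vpn)
            return False
        self.tlb.pop(vpn, None)
        return True
```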

    LOW-LATENCY BRANCH TARGET CACHE
    7.
    Type: Invention application
    Legal status: Under examination (published)

    Publication No.: US20120290821A1

    Publication date: 2012-11-15

    Application No.: US13105606

    Filing date: 2011-05-11

    IPC classification: G06F9/38

    Abstract: Techniques and structures are disclosed relating to a branch target cache (BTC) in a processor. In one embodiment, the BTC is usable to predict whether a control transfer instruction is to be taken, and, if applicable, a target address for the instruction. The BTC may operate in conjunction with a delayed branch predictor (DBP) that is more accurate but slower than the BTC. If the BTC indicates that a control transfer instruction is predicted to be taken, the processor begins to fetch instructions at the target address indicated by the BTC, but may discard those instructions if the DBP subsequently determines that the control transfer instruction was predicted incorrectly. Branch prediction information output from the BTC and the DBP may be used to update the branch target cache for subsequent predictions. In various embodiments, the BTC may simultaneously store entries for multiple processor threads, and may be fully associative.
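
The interplay between the fast BTC and the slower DBP can be sketched as below. Everything here is a hypothetical model: the dictionary stands in for a fully associative cache, `dbp_predict` is an assumed callback returning `(taken, target)`, and the update rule is one plausible choice rather than the one the application describes.

```python
class BranchTargetCache:
    """Toy fully associative BTC holding entries for several threads."""

    def __init__(self):
        self.entries = {}  # (thread_id, pc) -> predicted target

    def predict(self, thread_id, pc):
        # None means "predicted not taken".
        return self.entries.get((thread_id, pc))

    def update(self, thread_id, pc, taken, target):
        if taken:
            self.entries[(thread_id, pc)] = target
        else:
            self.entries.pop((thread_id, pc), None)


def fetch_step(btc, dbp_predict, thread_id, pc, fallthrough):
    """Fetch speculatively from the BTC, then let the DBP confirm or override."""
    fast_target = btc.predict(thread_id, pc)
    speculative_pc = fast_target if fast_target is not None else fallthrough
    taken, target = dbp_predict(thread_id, pc)  # slower, more accurate
    correct_pc = target if taken else fallthrough
    btc.update(thread_id, pc, taken, target)
    discarded = speculative_pc != correct_pc    # speculative fetch thrown away
    return correct_pc, discarded
```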

    INSTRUCTION SUPPORT FOR PERFORMING STREAM CIPHER
    8.
    Type: Invention application
    Legal status: Under examination (published)

    Publication No.: US20120216020A1

    Publication date: 2012-08-23

    Application No.: US13031571

    Filing date: 2011-02-21

    IPC classification: G06F9/30

    Abstract: Techniques relating to a processor that provides instruction-level support for a stream cipher are disclosed. In one embodiment, the processor supports a first instruction executable to perform an alpha multiplication, an alpha division, and an exclusive-OR operation using a result of the alpha multiplication and a result of the alpha division. In one embodiment, the processor supports a second instruction executable to perform a modular addition of a value R1 and a value S, and to perform a first exclusive-OR operation on a result of the modular addition and a value R2. In one embodiment, the processor supports a third instruction executable to perform a substitution-box (S-Box) operation on a value R1 to produce a value R2', and to perform a modular addition using a value R2 to produce a value R1'.
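
The data flow of the second instruction (modular addition followed by exclusive-OR) is simple enough to show directly. The 32-bit word size and the function name `add_xor` are assumptions made for illustration; the abstract does not state the operand width.

```python
MASK32 = 0xFFFFFFFF  # Assumption: operands are 32-bit words.


def add_xor(r1, s, r2):
    """Compute ((R1 + S) mod 2**32) XOR R2 in one step."""
    return ((r1 + s) & MASK32) ^ r2
```

Fusing the two operations into one instruction saves an instruction and an intermediate register compared with issuing the addition and the XOR separately.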

    BRANCH PREDICTION MECHANISM FOR PREDICTING INDIRECT BRANCH TARGETS
    9.
    Type: Invention application
    Legal status: Under examination (published)

    Publication No.: US20110078425A1

    Publication date: 2011-03-31

    Application No.: US12566847

    Filing date: 2009-09-25

    IPC classification: G06F9/38

    Abstract: A multithreaded microprocessor includes an instruction fetch unit that may fetch and maintain a plurality of instructions belonging to one or more threads and one or more execution units that may concurrently execute the one or more threads. The instruction fetch unit includes a target branch prediction unit that may provide a predicted branch target address in response to receiving an instruction fetch address of a current indirect branch instruction. The branch prediction unit includes a primary storage and a control unit. The storage includes a plurality of entries, and each entry may store a predicted branch target address corresponding to a previous indirect branch instruction. The control unit may generate an index value for accessing the storage using a portion of the instruction fetch address of the current indirect branch instruction, and branch direction history information associated with a currently executing thread of the one or more threads.
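
Index generation from fetch-address bits and branch history might look like the following. The XOR hash, the 2-bit alignment shift, and the class name are assumptions; the abstract only says the index uses a portion of the fetch address together with the thread's direction history.

```python
class IndirectTargetPredictor:
    """Toy indirect-branch target table indexed by address and history."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.table = [None] * (1 << index_bits)

    def index(self, fetch_address, history):
        # Assumed hash: drop 2 alignment bits, then XOR with the history bits.
        return ((fetch_address >> 2) ^ history) & self.mask

    def predict(self, fetch_address, history):
        return self.table[self.index(fetch_address, history)]

    def update(self, fetch_address, history, target):
        self.table[self.index(fetch_address, history)] = target
```

Mixing in the history lets the same indirect branch map to different entries, and therefore different targets, depending on the path that led to it.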

    Multiple-core processor with flexible mapping of processor cores to cache banks
    10.
    Type: Granted patent
    Legal status: In force

    Publication No.: US07685354B1

    Publication date: 2010-03-23

    Application No.: US11063792

    Filing date: 2005-02-23

    IPC classification: G06F12/00 G06F12/06

    Abstract: A multiple-core processor provides flexible mapping of processor cores to cache banks. In one embodiment, a processor may include a cache including a number of cache banks. The processor may further include a number of processor cores configured to access the cache banks, as well as core/bank mapping logic coupled to the cache banks and processor cores. The core/bank mapping logic may be configurable to map a cache bank select portion of a memory address specified by a given one of the processor cores to any one of the cache banks.
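
Configurable bank selection can be sketched as a small remapping table placed between the address bits and the banks. The class name, the 64-byte line size, and the single shared table are assumptions for illustration; the patent may use a different arrangement, such as per-core configuration.

```python
class CoreBankMapper:
    """Toy core/bank mapping logic with a configurable remap table."""

    def __init__(self, num_banks, line_bytes=64):
        self.num_banks = num_banks
        self.offset_bits = line_bytes.bit_length() - 1  # 64 bytes -> 6 bits
        self.remap = list(range(num_banks))             # default: identity

    def configure(self, remap):
        """Install a table sending each bank-select value to any bank."""
        if len(remap) != self.num_banks:
            raise ValueError("remap must have one entry per bank-select value")
        if any(not 0 <= bank < self.num_banks for bank in remap):
            raise ValueError("remap targets must be valid bank numbers")
        self.remap = list(remap)

    def bank_for(self, address):
        select = (address >> self.offset_bits) % self.num_banks
        return self.remap[select]
```

For example, `configure([0, 0, 1, 1])` on a four-bank cache steers all traffic to banks 0 and 1, which is one way to keep working when other banks are disabled.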
