Perceptron-based branch prediction mechanism for predicting conditional branch instructions on a multithreaded processor
    11.
    Granted invention patent
    Status: In force

    Publication No.: US08904156B2

    Publication date: 2014-12-02

    Application No.: US12578859

    Application date: 2009-10-14

    IPC classification: G06F9/38

    Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.

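
The index generation described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented design: the table size (1024 entries), the number of storages (4), the 64-byte cache line, the use of XOR as the "combining" operation, and the sum-and-threshold `predict` helper are all hypothetical choices. The abstract itself only says that address portions and history portions are combined.

```python
INDEX_BITS = 10                      # assumption: 1024 entries per storage
NUM_STORAGES = 4                     # assumption: four prediction-value tables
MASK = (1 << INDEX_BITS) - 1


def first_index(fetch_addr):
    """Combine two portions of the fetch address (XOR is an assumption)."""
    line_addr = fetch_addr >> 6      # assumption: 64-byte cache lines
    low = line_addr & MASK
    high = (line_addr >> INDEX_BITS) & MASK
    return low ^ high


def storage_indices(fetch_addr, history):
    """Return one index per storage.

    Storage 0 uses the address-only index. Storage k > 0 combines that
    index with the k-th INDEX_BITS-wide slice of the direction history.
    """
    base = first_index(fetch_addr)
    indices = [base]
    for k in range(1, NUM_STORAGES):
        history_slice = (history >> ((k - 1) * INDEX_BITS)) & MASK
        indices.append(base ^ history_slice)
    return indices


def predict(tables, fetch_addr, history):
    """Sum the selected prediction values; a non-negative sum means 'taken'.

    The sum-and-sign rule is the usual perceptron-predictor convention and
    is an assumption here, since the abstract does not spell it out.
    """
    idx = storage_indices(fetch_addr, history)
    total = sum(tables[k][i] for k, i in enumerate(idx))
    return total >= 0
```

Because every later index is derived from the first one, the address hash is computed once and each storage differs only in the history slice it mixes in.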

    PERCEPTRON-BASED BRANCH PREDICTION MECHANISM FOR PREDICTING CONDITIONAL BRANCH INSTRUCTIONS ON A MULTITHREADED PROCESSOR
    12.
    Invention patent application
    Status: In force

    Publication No.: US20110087866A1

    Publication date: 2011-04-14

    Application No.: US12578859

    Application date: 2009-10-14

    IPC classification: G06F9/38

    Abstract: A multithreaded microprocessor includes an instruction fetch unit including a perceptron-based conditional branch prediction unit configured to provide, for each of one or more concurrently executing threads, a direction branch prediction. The conditional branch prediction unit includes a plurality of storages each including a plurality of entries. Each entry may be configured to store one or more prediction values. Each prediction value of a given storage may correspond to at least one conditional branch instruction in a cache line. The conditional branch prediction unit may generate a separate index value for accessing each storage by generating a first index value for accessing a first storage by combining one or more portions of a received instruction fetch address, and generating each other index value for accessing the other storages by combining the first index value with a different portion of direction branch history information.


    Apparatus and method for implementing instruction support for the data encryption standard (DES) algorithm
    13.
    Granted invention patent
    Status: In force

    Publication No.: US08654970B2

    Publication date: 2014-02-18

    Application No.: US12414755

    Application date: 2009-03-31

    IPC classification: H04K1/00 H04L9/00

    Abstract: A processor including instruction support for implementing the Data Encryption Standard (DES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more DES instructions defined within the ISA. In addition, the DES instructions may be executable by the cryptographic unit to implement portions of a DES cipher that is compliant with Federal Information Processing Standards Publication 46-3 (FIPS 46-3). In response to receiving a DES key expansion instruction defined within the ISA, the cryptographic unit may generate one or more expanded cipher keys of the DES cipher key schedule from an input key.


    Apparatus and method for implementing instruction support for performing a cyclic redundancy check (CRC)
    14.
    Granted invention patent
    Status: In force

    Publication No.: US08417961B2

    Publication date: 2013-04-09

    Application No.: US12725243

    Application date: 2010-03-16

    IPC classification: G06F11/30

    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41. In some embodiments, the first instance of the CRC instruction specifies an initial value to be used in performing the first CRC operation, the set of data, and a storage location in which the cryptographic unit is configured to store the checksum value produced by the first CRC operation.

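
The generator polynomial 0x11EDC6F41 named in the abstract is the one commonly known as CRC-32C (Castagnoli). A minimal software sketch of a checksum over that polynomial is shown below. The bit-reflected processing order and the initial/final inversion follow the usual CRC-32C convention; they are assumptions here, since the abstract does not say how the hardware instruction treats them. The function name `crc32c` and its `init` parameter are illustrative only.

```python
# 0x11EDC6F41 is the 33-bit polynomial; dropping the implicit top bit gives
# 0x1EDC6F41, whose bit-reversed form (used for LSB-first processing) is:
POLY_REFLECTED = 0x82F63B78


def crc32c(data, init=0):
    """Bitwise CRC-32C of `data`, continuing from a previous checksum `init`."""
    crc = init ^ 0xFFFFFFFF              # undo the final inversion of `init`
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF              # final inversion
```

Passing the previous result as `init` mirrors the abstract's idea of an instruction that takes an initial value, so a long buffer can be processed in pieces: `crc32c(b"6789", crc32c(b"12345"))` gives the same value as `crc32c(b"123456789")`, which is the standard check value 0xE3069283.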

    Apparatus and method for local operand bypassing for cryptographic instructions
    15.
    Granted invention patent
    Status: In force

    Publication No.: US08356185B2

    Publication date: 2013-01-15

    Application No.: US12575832

    Application date: 2009-10-08

    IPC classification: G06F9/312 G06F21/00

    Abstract: A processor may include a hardware instruction fetch unit configured to issue instructions for execution, and a hardware functional unit configured to receive instructions for execution, where the instructions include cryptographic instruction(s) and non-cryptographic instruction(s). The functional unit may include a cryptographic execution pipeline configured to execute the cryptographic instructions with a corresponding cryptographic execution latency, and a non-cryptographic execution pipeline configured to execute the non-cryptographic instructions with a corresponding non-cryptographic execution latency that is longer than the cryptographic execution latency. The functional unit may further include a local bypass network configured to bypass results produced by the cryptographic execution pipeline to dependent cryptographic instructions executing within the cryptographic execution pipeline, such that each instruction within a sequence of dependent cryptographic instructions is executable with the cryptographic execution latency, and where the results of the cryptographic execution pipeline are not bypassed to any other functional unit within the processor.


    System and method to manage address translation requests
    16.
    Granted invention patent
    Status: In force

    Publication No.: US08301865B2

    Publication date: 2012-10-30

    Application No.: US12493941

    Application date: 2009-06-29

    IPC classification: G06F12/00 G06F9/26 G06F9/34

    CPC classification: G06F12/1027 G06F2212/684

    Abstract: A system and method for servicing translation lookaside buffer (TLB) misses may manage separate input and output pipelines within a memory management unit. A pending request queue (PRQ) in the input pipeline may include an instruction-related portion storing entries for instruction TLB (ITLB) misses and a data-related portion storing entries for potential or actual data TLB (DTLB) misses. A DTLB PRQ entry may be allocated to each load/store instruction selected from the pick queue. The system may select an ITLB- or DTLB-related entry for servicing dependent on prior PRQ entry selection(s). A corresponding entry may be held in a translation table entry return queue (TTERQ) in the output pipeline until a matching address translation is received from system memory. PRQ and/or TTERQ entries may be deallocated when a corresponding TLB miss is serviced. PRQ and/or TTERQ entries associated with a thread may be deallocated in response to a thread flush.

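
One way to picture a pending request queue with separate instruction and data portions is the toy model below. It is a sketch under assumptions, not the patented mechanism: the abstract only says selection depends on prior selections, so the simple alternation policy, the class name `PendingRequestQueue`, and the `(thread_id, virtual_address)` entry layout are all hypothetical.

```python
from collections import deque


class PendingRequestQueue:
    """Toy PRQ with an ITLB portion and a DTLB portion."""

    def __init__(self):
        self.portions = {"itlb": deque(), "dtlb": deque()}
        self.last = "dtlb"           # so the first contested pick goes to ITLB

    def allocate(self, kind, thread_id, vaddr):
        """Record a miss of the given kind ('itlb' or 'dtlb')."""
        self.portions[kind].append((thread_id, vaddr))

    def select(self):
        """Pick the next entry to service.

        When both portions hold entries, choose the portion that was NOT
        selected last time (assumed policy). Returns None when empty.
        """
        itlb, dtlb = self.portions["itlb"], self.portions["dtlb"]
        if itlb and dtlb:
            kind = "itlb" if self.last == "dtlb" else "dtlb"
        elif itlb:
            kind = "itlb"
        elif dtlb:
            kind = "dtlb"
        else:
            return None
        self.last = kind
        return kind, self.portions[kind].popleft()

    def flush_thread(self, thread_id):
        """Deallocate every entry that belongs to the flushed thread."""
        for kind, queue in self.portions.items():
            self.portions[kind] = deque(e for e in queue if e[0] != thread_id)
```

Alternating between the two portions keeps a burst of data-side misses from starving instruction fetch, which is one plausible reason to make selection depend on history.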

    APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE CAMELLIA CIPHER ALGORITHM
    17.
    Invention patent application
    Status: In force

    Publication No.: US20100250964A1

    Publication date: 2010-09-30

    Application No.: US12414831

    Application date: 2009-03-31

    IPC classification: G06F9/30

    Abstract: A processor including instruction support for implementing the Camellia block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more Camellia instructions defined within the ISA. In addition, the Camellia instructions may be executable by the cryptographic unit to implement portions of a Camellia cipher that is compliant with Internet Engineering Task Force (IETF) Request For Comments (RFC) 3713. In response to receiving a Camellia F( )-operation instruction defined within the ISA, the cryptographic unit may perform an F( ) operation, as defined by the Camellia cipher, upon a data input operand and a subkey operand, in which the data input operand and subkey operand may be specified by the Camellia F( )-operation instruction.


    Performance instrumentation in a fine grain multithreaded multicore processor
    18.
    Granted invention patent
    Status: In force

    Publication No.: US07702887B1

    Publication date: 2010-04-20

    Application No.: US10881032

    Application date: 2004-06-30

    IPC classification: G06F7/38

    Abstract: A method and mechanism for monitoring events in a processing system. A performance monitoring mechanism is configured to store a count of events in an event counter. Periodically, the count stored in the event counter is updated to a new count. If the new count equals a predetermined value, an indication that the count equals the predetermined value is conveyed. If the new count does not equal the predetermined value, but is within a given epsilon of the predetermined value and the occurrence of a corresponding event is detected, an indication that the count equals the predetermined value is conveyed. The mechanism is further configured to suppress event counts which correspond to mis-speculations.

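
The update-and-compare rule in the abstract can be modelled in a few lines. The sketch below is an assumption-laden illustration: the class name `EventCounter`, the way mis-speculated events are suppressed (subtracted from the batch before it is added), and the symmetric epsilon window are hypothetical choices that the abstract does not confirm.

```python
class EventCounter:
    """Toy model of a periodically updated performance counter."""

    def __init__(self, target, epsilon):
        self.count = 0
        self.target = target         # the "predetermined value"
        self.epsilon = epsilon       # how close counts as "near" the target

    def update(self, new_events, event_detected, misspeculated=0):
        """Apply one periodic update and report whether to signal.

        new_events     -- events accumulated since the last update
        event_detected -- True if a corresponding event is occurring now
        misspeculated  -- events to suppress (assumed: simply not counted)
        Returns True when the 'count equals target' indication is conveyed.
        """
        self.count += new_events - misspeculated
        if self.count == self.target:
            return True
        near = abs(self.target - self.count) <= self.epsilon
        return near and event_detected
```

The second condition covers the case where the stored count lags the true count by a few events: if the counter is within epsilon and another event is seen, the indication is raised without waiting for the next periodic update.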

    Apparatus and method for implementing a unified hash algorithm pipeline
    19.
    Granted invention patent
    Status: In force

    Publication No.: US07684563B1

    Publication date: 2010-03-23

    Application No.: US10968428

    Application date: 2004-10-19

    IPC classification: H04K1/00 H04L9/00 H04L9/28

    Abstract: An apparatus and method for implementing a unified hash algorithm pipeline. In one embodiment, a cryptographic unit may include hash logic configured to compute a hash value of a data block according to a hash algorithm, where the hash algorithm is dynamically selectable from a plurality of hash algorithms, and where the hash logic comprises a plurality of pipeline stages each configured to compute a portion of the hash algorithm. The cryptographic unit may further include a word buffer configured to store the data block during computing by the hash logic.


    Apparatus and method for fine-grained multithreading in a multipipelined processor core
    20.
    Granted invention patent
    Status: In force

    Publication No.: US07401206B2

    Publication date: 2008-07-15

    Application No.: US10880488

    Application date: 2004-06-30

    IPC classification: G06F9/34

    Abstract: An apparatus and method for fine-grained multithreading in a multipipelined processor core. According to one embodiment, a processor may include instruction fetch logic configured to assign a given one of a plurality of threads to a corresponding one of a plurality of thread groups, where each of the plurality of thread groups may comprise a subset of the plurality of threads, to issue a first instruction from one of the plurality of threads during one execution cycle, and to issue a second instruction from another one of the plurality of threads during a successive execution cycle. The processor may further include a plurality of execution units, each configured to execute instructions issued from a respective thread group.

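
The thread-group idea in the abstract can be illustrated with a small scheduling model. This is a sketch only: the static modulo assignment of threads to groups and the round-robin choice within each group are assumptions made for illustration, and the function names `assign_groups` and `issue_schedule` are hypothetical. The abstract requires only that successive cycles issue from different threads and that each execution unit serves its own group.

```python
from itertools import cycle


def assign_groups(thread_ids, num_groups):
    """Statically map each thread to one thread group (assumed: modulo)."""
    groups = [[] for _ in range(num_groups)]
    for position, tid in enumerate(thread_ids):
        groups[position % num_groups].append(tid)
    return groups


def issue_schedule(groups, num_cycles):
    """Return, for each cycle, the thread issuing in every group.

    Each group feeds its own execution unit and rotates through its
    threads, so consecutive cycles issue from different threads whenever
    a group holds at least two. Every group must be non-empty.
    """
    pickers = [cycle(group) for group in groups]
    return [[next(picker) for picker in pickers] for _ in range(num_cycles)]
```

With eight threads and two groups, `assign_groups(range(8), 2)` yields `[[0, 2, 4, 6], [1, 3, 5, 7]]`, and the first three cycles of `issue_schedule` on those groups are `[[0, 1], [2, 3], [4, 5]]`.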