1. APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR PERFORMING A CYCLIC REDUNDANCY CHECK (CRC)
    Patent application (status: active)

    Publication number: US20110231636A1
    Publication date: 2011-09-22
    Application number: US12725243
    Filing date: 2010-03-16

    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41. In some embodiments, the first instance of the CRC instruction specifies an initial value to be used in performing the first CRC operation, the set of data, and a storage location in which the cryptographic unit is configured to store the checksum value produced by the first CRC operation.
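The generator polynomial 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial. As a rough software model of what such an instruction computes, the sketch below uses the common bit-reflected formulation (reversed polynomial 0x82F63B78, register inverted with 0xFFFFFFFF before and after). The abstract does not say how the instruction treats the initial value, bit order, or inversion, so the `initial` parameter is an assumption: it accepts a previously returned checksum so that a long buffer can be processed in pieces.

```python
# CRC-32C polynomial 0x11EDC6F41; dropping the implicit x^32 term gives 0x1EDC6F41,
# whose bit-reversed form 0x82F63B78 is used by the LSB-first loop below.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_update(register: int, data: bytes) -> int:
    """Feed bytes through the raw (non-inverted) CRC register, LSB first."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                register = (register >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                register >>= 1
    return register


def crc32c(data: bytes, initial: int = 0) -> int:
    """Checksum of `data`; `initial` is an assumed chaining input (a prior result)."""
    register = initial ^ 0xFFFFFFFF          # undo the previous final inversion
    return crc32c_update(register, data) ^ 0xFFFFFFFF
```

Chaining works as expected under this convention: `crc32c(b"6789", crc32c(b"12345"))` equals `crc32c(b"123456789")`, which is 0xE3069283, the standard CRC-32C check value.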

2. METHODS AND MECHANISMS TO SUPPORT MULTIPLE FEATURES FOR A NUMBER OF OPCODES
    Patent application (status: active)

    Publication number: US20100257338A1
    Publication date: 2010-10-07
    Application number: US12420054
    Filing date: 2009-04-07
    CPC classification number: G06F9/30145 G06F9/30101 G06F9/30167 G06F9/45504

    Abstract: Systems and methods for efficient instruction support of multiple features for opcodes of an instruction set. A processor detects that a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines that the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
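The abstract describes a choice, made per instruction, between on-chip execution and a software fallback for an opcode that encodes several functions. The Python sketch below models only that control flow. The function codes, the capability mask `HW_CAPABILITY_MASK`, and `hw_execute` are hypothetical stand-ins (the "hardware" path simply calls `hashlib`); nothing here reflects the patent's actual instruction encoding.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical function codes that share a single "crypto" opcode.
FUNC_SHA1, FUNC_SHA256, FUNC_SHA512 = 0, 1, 2

# Assumed capability mask reported by the processor:
# bit n set means function n is implemented on-chip (here: SHA-256 only).
HW_CAPABILITY_MASK = 0b010


def hw_execute(func: int, data: bytes) -> bytes:
    """Stand-in for the on-chip unit; it only implements SHA-256."""
    if func != FUNC_SHA256:
        raise ValueError("function not implemented in hardware")
    return hashlib.sha256(data).digest()


# Software routines used when the hardware lacks the requested feature.
SOFTWARE_FALLBACK: Dict[int, Callable[[bytes], bytes]] = {
    FUNC_SHA1: lambda d: hashlib.sha1(d).digest(),
    FUNC_SHA256: lambda d: hashlib.sha256(d).digest(),
    FUNC_SHA512: lambda d: hashlib.sha512(d).digest(),
}


def execute_crypto(func: int, data: bytes) -> Tuple[str, bytes]:
    """Decode the function field, then run on-chip if supported, else in software."""
    if (HW_CAPABILITY_MASK >> func) & 1:
        return "hardware", hw_execute(func, data)
    return "software", SOFTWARE_FALLBACK[func](data)
```

Both paths return the same digest for a given function, so the program's result does not depend on which path ran; only performance differs.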

3. Missing store operation accelerator
    Granted patent (status: active)

    Publication number: US07757047B2
    Publication date: 2010-07-13
    Application number: US11271056
    Filing date: 2005-11-12
    CPC classification number: G06F12/0859

    Abstract: Maintaining a cache of indications of exclusively-owned coherence state for memory space units (e.g., cache line) allows reduction, if not elimination, of delay from missing store operations. In addition, the indications are maintained without corresponding data of the memory space unit, thus allowing representation of a large memory space with a relatively small missing store operation accelerator. With the missing store operation accelerator, a store operation, which misses in low-latency memory (e.g., L1 or L2 cache), proceeds as if the targeted memory space unit resides in the low-latency memory, if indicated in the missing store operation accelerator. When a store operation misses in low-latency memory and hits in the accelerator, a positive acknowledgement is transmitted to the writing processing unit allowing the store operation to proceed. An entry is allocated for the store operation, the store data is written into the allocated entry, and the target of the store operation is requested from memory. When a copy of the data at the requested memory space unit returns, the rest of the allocated entry is updated.
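To make the flow in the abstract concrete, here is a small behavioural model in Python. It assumes 64-byte lines, stores that stay within one line, and a per-byte valid vector for merging; the class and method names are illustrative, not taken from the patent. The key property is that `exclusive` holds only line numbers, not line data, which is what lets a small structure cover a large address range.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

LINE_SIZE = 64  # assumed cache-line size in bytes


@dataclass
class PendingEntry:
    """Allocated when a store misses in L1/L2 but hits in the accelerator."""
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))
    valid: List[bool] = field(default_factory=lambda: [False] * LINE_SIZE)


class MissingStoreAccelerator:
    def __init__(self) -> None:
        self.exclusive: Set[int] = set()  # line numbers only; no line data kept
        self.pending: Dict[int, PendingEntry] = {}
        self.fill_requests: List[int] = []

    def note_exclusive(self, line: int) -> None:
        """Remember that this core holds the line in an exclusively-owned state."""
        self.exclusive.add(line)

    def store_miss(self, addr: int, value: bytes) -> bool:
        """Handle a store that missed in L1/L2; True means an early positive ack."""
        line, offset = divmod(addr, LINE_SIZE)
        assert offset + len(value) <= LINE_SIZE, "sketch assumes no line crossing"
        if line not in self.exclusive:
            return False  # fall back to the normal (slow) miss path
        entry = self.pending.get(line)
        if entry is None:
            entry = self.pending[line] = PendingEntry()
            self.fill_requests.append(line)  # request the line from memory
        entry.data[offset:offset + len(value)] = value
        for i in range(offset, offset + len(value)):
            entry.valid[i] = True
        return True

    def fill_returned(self, line: int, memory_copy: bytes) -> bytes:
        """Fill in the bytes the stores did not write, keeping the store data."""
        entry = self.pending.pop(line)
        for i in range(LINE_SIZE):
            if not entry.valid[i]:
                entry.data[i] = memory_copy[i]
        return bytes(entry.data)
```

The store is acknowledged as soon as `store_miss` returns True; the memory request and the later merge in `fill_returned` happen off the store's critical path.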

4. Method and structure for pipelining of SIMD conditional moves
    Granted patent (status: active)

    Publication number: US07480787B1
    Publication date: 2009-01-20
    Application number: US11341001
    Filing date: 2006-01-27

    Abstract: A mask is first generated in a general-purpose integer register. The mask is generated by executing a single instruction multiple data (SIMD) instruction on a plurality of operands stored in a plurality of registers and by writing the result to the general-purpose integer register. Next, a conditional-move mask is generated in a register using the mask, and then the conditional-move mask is used in selecting operands from the plurality of operands to generate a result in another register.
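The three steps (SIMD compare into an integer register, expansion into a conditional-move mask, masked select) can be emulated with plain integers. This sketch assumes four 16-bit lanes packed into one word and a greater-than comparison; lane width, lane count, and the helper names are illustrative choices, and the bitwise select `(a & m) | (b & ~m)` stands in for whatever conditional-move instruction a real ISA would provide.

```python
from typing import List, Sequence

LANES = 4                          # assumed lane count
LANE_BITS = 16                     # assumed lane width
LANE_MASK = (1 << LANE_BITS) - 1


def simd_compare_gt_to_int(a: Sequence[int], b: Sequence[int]) -> int:
    """Step 1: per-lane a > b, packed one bit per lane into an integer register."""
    bits = 0
    for lane, (x, y) in enumerate(zip(a, b)):
        if x > y:
            bits |= 1 << lane
    return bits


def expand_to_cmov_mask(bits: int) -> int:
    """Step 2: widen each mask bit into an all-ones or all-zeros lane."""
    mask = 0
    for lane in range(LANES):
        if (bits >> lane) & 1:
            mask |= LANE_MASK << (lane * LANE_BITS)
    return mask


def pack(values: Sequence[int]) -> int:
    return sum((v & LANE_MASK) << (i * LANE_BITS) for i, v in enumerate(values))


def unpack(word: int) -> List[int]:
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def conditional_move(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Step 3: select a where the mask is set, otherwise b."""
    m = expand_to_cmov_mask(simd_compare_gt_to_int(a, b))
    return unpack((pack(a) & m) | (pack(b) & ~m))
```

For `a = [5, 1, 9, 2]` and `b = [3, 4, 7, 8]` the compare yields the bit pattern 0b0101 and the select produces `[5, 4, 9, 8]`, a branch-free per-lane maximum.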

5. Execution displacement read-write alias prediction
    Granted patent (status: active)

    Publication number: US07434031B1
    Publication date: 2008-10-07
    Application number: US10822390
    Filing date: 2004-04-12

    Abstract: RAW (read-after-write) aliasing can be predicted, and register bypassing applied, based at least in part on execution displacement alias prediction. Repeated aliasing between read and write operations (e.g., within a loop) can be reliably predicted based on the displacement between the aliasing operations. Performing register bypassing for operations predicted to alias facilitates faster RAW bypassing and mitigates the performance impact of aliasing read operations. The repeated aliasing between operations is tracked along with register information of the aliasing write operations. After a confidence threshold is exceeded, an instance of a read operation is predicted to alias with an instance of a write operation in accordance with the previously observed repeated aliasing. Based on the displacement between the instances of the operations, the register information of the write operation instance is used to bypass data to the read operation instance.
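A toy model of displacement-based alias prediction follows. It assumes that "displacement" is counted in dynamic instances of the same store PC, that there is one predictor entry per load PC, and that a confidence counter must exceed `THRESHOLD` before a prediction is made; all of these, and the class names, are assumptions for illustration. The prediction is the name of the register the store took its data from, which a pipeline would use to bypass the load.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import DefaultDict, Deque, Dict, Optional, Tuple

THRESHOLD = 1  # assumed: predict once confidence exceeds this value
HISTORY = 8    # assumed: store instances remembered per store PC


@dataclass
class AliasEntry:
    store_pc: int
    displacement: int  # how many instances back the aliasing store was
    confidence: int = 0


class DisplacementAliasPredictor:
    def __init__(self) -> None:
        self.table: Dict[int, AliasEntry] = {}  # keyed by load PC
        self.seq = 0                             # global program-order counter
        self.count: DefaultDict[int, int] = defaultdict(int)  # instances per store PC
        # Per store PC: (seq, instance number, address, source register).
        self.recent: DefaultDict[int, Deque[Tuple[int, int, int, str]]] = defaultdict(
            lambda: deque(maxlen=HISTORY))

    def store(self, pc: int, addr: int, src_reg: str) -> None:
        self.seq += 1
        self.count[pc] += 1
        self.recent[pc].append((self.seq, self.count[pc], addr, src_reg))

    def predict(self, load_pc: int) -> Optional[str]:
        """Register to bypass from, or None without a confident prediction."""
        entry = self.table.get(load_pc)
        if entry is None or entry.confidence <= THRESHOLD:
            return None
        target = self.count[entry.store_pc] - entry.displacement
        for _seq, n, _addr, reg in self.recent[entry.store_pc]:
            if n == target:
                return reg
        return None

    def train(self, load_pc: int, addr: int) -> None:
        """Once the load's address is known, record which store it aliased with."""
        hits = [(seq, pc, n) for pc, insts in self.recent.items()
                for seq, n, a, _reg in insts if a == addr]
        if not hits:
            self.table.pop(load_pc, None)
            return
        _seq, pc, n = max(hits)  # youngest aliasing store in program order
        displacement = self.count[pc] - n
        entry = self.table.get(load_pc)
        if entry and (entry.store_pc, entry.displacement) == (pc, displacement):
            entry.confidence += 1
        else:
            self.table[load_pc] = AliasEntry(pc, displacement)
```

In a loop where each iteration's load reads the location written by the previous iteration's store, the first few iterations only train the entry; once confidence exceeds the threshold, `predict` names the register used by the most recent store instance.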

6. Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces
    Patent application (status: active)

    Publication number: US20080222396A1
    Publication date: 2008-09-11
    Application number: US11684348
    Filing date: 2007-03-09
    CPC classification number: G06F21/71 G06F12/1027 G06F2212/683

    Abstract: In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Responsive to the request, a higher-privileged thread grants the user-privileged thread access to the hardware accelerator. Responsive to the grant of access, the user-privileged thread communicates one or more commands to the hardware accelerator without intervention by higher-privileged threads. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer-readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.
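The essential point of the abstract is that privileged code is entered once, to grant access, and every later command goes straight from the user thread to the accelerator. The sketch below models that split with a per-thread command queue standing in for a memory-based interface. `Supervisor`, `CommandQueue`, and the counter `entries` are hypothetical; a real system would map a device region or shared memory into the thread's address space rather than pass a Python object.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class CommandQueue:
    """Hypothetical memory-based interface: a queue mapped into the user thread."""
    owner: int
    commands: Deque[Callable[[], object]] = field(default_factory=deque)


class HardwareAccelerator:
    def __init__(self) -> None:
        self.results: List[object] = []

    def drain(self, queue: CommandQueue) -> None:
        """Consume commands straight from memory; no privileged code involved."""
        while queue.commands:
            self.results.append(queue.commands.popleft()())


class Supervisor:
    """Higher-privileged code, entered only when a thread asks for access."""

    def __init__(self) -> None:
        self.entries = 0  # counts privileged entries (e.g. traps or system calls)
        self.grants: Dict[int, CommandQueue] = {}

    def request_access(self, thread_id: int) -> CommandQueue:
        self.entries += 1
        if thread_id not in self.grants:
            self.grants[thread_id] = CommandQueue(owner=thread_id)
        return self.grants[thread_id]
```

A thread calls `request_access` once, then appends any number of commands to the returned queue; `Supervisor.entries` stays at 1 no matter how many commands the accelerator later processes.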

7. Efficient On-Chip Accelerator Interfaces to Reduce Software Overhead
    Patent application (status: active)

    Publication number: US20080222383A1
    Publication date: 2008-09-11
    Application number: US11684358
    Filing date: 2007-03-09

    Abstract: In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand, and to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and is configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. Also contemplated in various embodiments are a hardware accelerator, a processor coupled to the hardware accelerator, a method, and a computer-readable medium storing instructions which, when executed, implement a portion of the method.
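The unusual part of this abstract is that the TLB translates the store's data operand as well as its address, so a user thread can hand an accelerator a pointer that arrives already translated to a physical address. The Python model below assumes 8 KiB pages, a TLB preloaded from a dictionary, and a `translate_data` flag standing in for however the real instruction selects this behaviour (for example, a distinct opcode); those details are not given in the abstract.

```python
from typing import Dict, Tuple

PAGE_SHIFT = 13                    # assumed 8 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    def __init__(self, mappings: Dict[int, int]) -> None:
        self.entries = dict(mappings)  # virtual page number -> physical page number

    def translate(self, va: int) -> int:
        vpn, offset = va >> PAGE_SHIFT, va & PAGE_MASK
        if vpn not in self.entries:
            raise LookupError(f"TLB miss for VA {va:#x}")
        return (self.entries[vpn] << PAGE_SHIFT) | offset


def execute_store(tlb: TLB, store_va: int, data_operand: int,
                  translate_data: bool) -> Tuple[int, int]:
    """Return (physical target address, value written to the target)."""
    target_pa = tlb.translate(store_va)          # first physical address
    value = tlb.translate(data_operand) if translate_data else data_operand
    return target_pa, value                      # value is the second physical address when translated
```

With virtual page 1 mapped to physical page 0x80 and page 2 to 0x91, `execute_store(tlb, 0x2010, 0x4008, True)` returns `(0x100010, 0x122008)`: both the target and the pointer carried as data come out as physical addresses.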

8. Apparatus and method for implementing instruction support for the data encryption standard (DES) algorithm
    Granted patent (status: active)

    Publication number: US08654970B2
    Publication date: 2014-02-18
    Application number: US12414755
    Filing date: 2009-03-31

    Abstract: A processor including instruction support for implementing the Data Encryption Standard (DES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more DES instructions defined within the ISA. In addition, the DES instructions may be executable by the cryptographic unit to implement portions of a DES cipher that is compliant with Federal Information Processing Standards Publication 46-3 (FIPS 46-3). In response to receiving a DES key expansion instruction defined within the ISA, the cryptographic unit may generate one or more expanded cipher keys of the DES cipher key schedule from an input key.
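For reference, the DES key schedule that a key expansion instruction would accelerate is defined in FIPS 46-3: permuted choice 1 (PC-1) drops the parity bits, the two 28-bit halves are rotated left by 1 or 2 positions per round, and permuted choice 2 (PC-2) selects 48 bits for each of the 16 round keys. This Python sketch implements that standard schedule in software; how many round keys a single instruction produces per invocation is not specified in the abstract and is not modelled here.

```python
from typing import List, Sequence

# FIPS 46-3 tables; bit 1 is the most significant bit of the input.
PC1 = [57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18,
       10, 2, 59, 51, 43, 35, 27, 19, 11, 3, 60, 52, 44, 36,
       63, 55, 47, 39, 31, 23, 15, 7, 62, 54, 46, 38, 30, 22,
       14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4]
PC2 = [14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,
       23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2,
       41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
       44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32]
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]


def permute(value: int, table: Sequence[int], in_bits: int) -> int:
    """Build the output MSB first, taking input bit `pos` for each table entry."""
    out = 0
    for pos in table:
        out = (out << 1) | ((value >> (in_bits - pos)) & 1)
    return out


def rotl28(x: int, n: int) -> int:
    return ((x << n) | (x >> (28 - n))) & 0xFFFFFFF


def des_key_schedule(key: int) -> List[int]:
    """Expand a 64-bit DES key into sixteen 48-bit round keys."""
    cd = permute(key, PC1, 64)           # 56 bits; parity bits discarded
    c, d = cd >> 28, cd & 0xFFFFFFF
    round_keys = []
    for shift in SHIFTS:
        c, d = rotl28(c, shift), rotl28(d, shift)
        round_keys.append(permute((c << 28) | d, PC2, 56))
    return round_keys
```

With the widely used tutorial key 0x133457799BBCDFF1, the first round key is 0x1B02EFFC7072 and the sixteenth is 0xCB3D8B0E17F5; because the shifts total 28, the last round key is simply PC-2 applied to the unrotated halves.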

9. Apparatus and method for implementing instruction support for performing a cyclic redundancy check (CRC)
    Granted patent (status: active)

    Publication number: US08417961B2
    Publication date: 2013-04-09
    Application number: US12725243
    Filing date: 2010-03-16

    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41. In some embodiments, the first instance of the CRC instruction specifies an initial value to be used in performing the first CRC operation, the set of data, and a storage location in which the cryptographic unit is configured to store the checksum value produced by the first CRC operation.

10. Method for efficient generation of a Fletcher checksum using a single SIMD pipeline
    Granted patent (status: active)

    Publication number: US08112691B1
    Publication date: 2012-02-07
    Application number: US12079367
    Filing date: 2008-03-25
    CPC classification number: G06F9/3887 G06F11/1004 H03M13/096

    Abstract: The generation of Fletcher/Adler partial checksums is transformed from a space that requires integer multiplications and additions to a space that requires only integer additions and shifts on a processor with a single SIMD pipeline. This transformation permits the use of Fletcher/Adler checksums on processors where the performance of SIMD instructions is sub-optimal, on CMT processors that support a single SIMD pipeline, and on other processors that can be configured, by executing software, to implement SIMD operations for a single SIMD pipeline. Implementing the process with this transformation on a general-purpose computer system transforms that system into a special-purpose computer system that uses a single SIMD pipeline to generate a Fletcher/Adler checksum. Eliminating integer multiplications in the generation of the partial checksums results in a significant performance improvement.
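One way to see how multiplications can be replaced by additions and shifts: split the data across N lanes (one lane per SIMD element, N a power of two), run the usual Fletcher recurrence `A += d; B += A` in each lane, and then combine. For lane sums a_j and b_j the totals are A = Σ a_j and B = N·Σ b_j - Σ j·a_j, where the multiplication by N is a left shift and Σ j·a_j can be accumulated from suffix sums using additions only. This is an illustrative derivation under those assumptions, not necessarily the exact transformation claimed. Modular reduction (mod 255 or 65535 for Fletcher; mod 65521 with A starting at 1 for Adler-32) is left to the caller, which is valid because the identity holds over the integers.

```python
from typing import List, Sequence, Tuple


def fletcher_reference(data: Sequence[int]) -> Tuple[int, int]:
    """Scalar Fletcher sums without modular reduction."""
    a = b = 0
    for d in data:
        a += d
        b += a
    return a, b


def fletcher_lanes(data: Sequence[int], lanes: int = 8) -> Tuple[int, int]:
    """Same sums from `lanes` interleaved accumulators, using only adds and shifts."""
    assert lanes > 0 and lanes & (lanes - 1) == 0, "lane count assumed to be a power of two"
    shift = lanes.bit_length() - 1
    # Leading zeros change neither A nor B, so pad at the front.
    padded: List[int] = [0] * ((-len(data)) % lanes) + list(data)
    a_lane = [0] * lanes
    b_lane = [0] * lanes
    for start in range(0, len(padded), lanes):
        for j in range(lanes):           # one SIMD add per step in hardware
            a_lane[j] += padded[start + j]
            b_lane[j] += a_lane[j]
    # sum(j * a_lane[j]) computed as a sum of suffix sums: additions only.
    weighted = suffix = 0
    for j in range(lanes - 1, 0, -1):
        suffix += a_lane[j]
        weighted += suffix
    return sum(a_lane), (sum(b_lane) << shift) - weighted
```

For example, `fletcher_lanes([1, 2, 3, 4], 2)` gives `(10, 20)`, matching the scalar loop; taking each result modulo 255 then yields Fletcher-16 for byte input.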
