APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE DATA ENCRYPTION STANDARD (DES) ALGORITHM
    21.
    Patent application
    Status: Active

    Publication number: US20100246814A1

    Publication date: 2010-09-30

    Application number: US12414755

    Filing date: 2009-03-31

    Abstract: A processor including instruction support for implementing the Data Encryption Standard (DES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more DES instructions defined within the ISA. In addition, the DES instructions may be executable by the cryptographic unit to implement portions of a DES cipher that is compliant with Federal Information Processing Standards Publication 46-3 (FIPS 46-3). In response to receiving a DES key expansion instruction defined within the ISA, the cryptographic unit may generate one or more expanded cipher keys of the DES cipher key schedule from an input key.

    Efficient caching of stores in scalable chip multi-threaded systems
    22.
    Granted patent
    Status: Active

    Publication number: US07793044B1

    Publication date: 2010-09-07

    Application number: US11654150

    Filing date: 2007-01-16

    CPC classification number: G06F12/0811 G06F12/084

    Abstract: In accordance with one embodiment, an enhanced chip multiprocessor permits an L1 cache to request ownership of a data line from a shared L2 cache. A determination is made whether to deny or grant the request for ownership based on the sharing of the data line. In one embodiment, the sharing of the data line is determined from an enhanced L2 cache directory entry associated with the data line. If ownership of the data line is granted, the current data line is passed from the shared L2 to the requesting L1 cache and an associated enhanced L1 cache directory entry and the enhanced L2 cache directory entry are updated to reflect the L1 cache ownership of the data line. Consequently, updates of the data line by the L1 cache do not go through the shared L2 cache, thus reducing transaction pressure on the shared L2 cache.
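    The grant-or-deny decision described in this abstract can be illustrated with a small model. This is only a sketch: the class and method names are hypothetical, and the policy shown (grant ownership only when no other L1 cache holds the line) is an assumption, since the abstract says only that the decision is based on the sharing of the data line.

```python
class SharedL2DirectorySketch:
    """Toy model of an L2 directory that tracks which L1 caches hold each line."""

    def __init__(self):
        self.sharers = {}  # line address -> set of L1 ids holding a copy
        self.owner = {}    # line address -> L1 id that owns the line, if any

    def record_fill(self, l1_id, line):
        """Note that an L1 cache received a copy of the line."""
        self.sharers.setdefault(line, set()).add(l1_id)

    def request_ownership(self, l1_id, line):
        """Grant ownership only if no other L1 shares the line (assumed policy)."""
        others = self.sharers.get(line, set()) - {l1_id}
        if others or self.owner.get(line) not in (None, l1_id):
            return False  # line is shared or owned elsewhere: deny
        # Update the directory to reflect L1 ownership of the line.
        self.owner[line] = l1_id
        self.sharers.setdefault(line, set()).add(l1_id)
        return True
```

    Once ownership is granted in this model, later stores by the owning L1 would not need to consult the directory, which is the source of the reduced pressure on the shared L2.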

    Accelerating cryptographic hash computations
    23.
    Granted patent
    Status: Active

    Publication number: US07599489B1

    Publication date: 2009-10-06

    Application number: US10783859

    Filing date: 2004-02-19

    CPC classification number: H04L9/0643 H04L2209/30

    Abstract: Provided is an apparatus and method for accelerating cryptographic hash computations. For example, in a cryptographic hash computation such as SHA-1, multiple execution units in a processor can process loosely coupled data. Specifically, after preprocessing a message with a particular bit length and parsing the padded message into multiple blocks, a first execution unit can begin processing the blocks for a message schedule computation. While the first block is processed, the first execution unit produces a partial result for the computation of the compression function in the second execution unit. By simultaneously processing the blocks on multiple execution units, the cryptographic hash computation performance can improve.
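    The two loosely coupled stages named in this abstract, the message schedule and the compression function, can be separated in a plain SHA-1 implementation. The sketch below runs the stages one after the other on a single thread, so it shows only where the work could be split between execution units, not the parallel hardware itself. All function names are illustrative, and `hashlib` is used solely as a cross-check.

```python
import hashlib
import struct


def _rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def pad(message):
    """Append the 0x80 byte, zero padding, and the 64-bit message bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + struct.pack(">Q", len(message) * 8)


def message_schedule(block):
    """Stage 1: expand one 64-byte block into 80 schedule words."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress(state, w):
    """Stage 2: apply the 80-round compression function to the chaining state."""
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (_rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF
        a, b, c, d, e = temp, a, _rotl(b, 30), c, d
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))


def sha1_sketch(message):
    """Hash a message by running stage 1 for every block, then stage 2."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = pad(message)
    # Stage 1 depends only on message data, so it could run ahead of stage 2.
    schedules = [message_schedule(padded[i:i + 64]) for i in range(0, len(padded), 64)]
    for w in schedules:
        state = compress(state, w)
    return struct.pack(">5I", *state).hex()


def matches_reference(message):
    """Cross-check the sketch against the standard library implementation."""
    return sha1_sketch(message) == hashlib.sha1(message).hexdigest()
```

    Because the schedule for a block never reads the chaining state, only `compress` is serialized across blocks, which is what makes the data loosely coupled.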

    Efficient on-chip instruction and data caching for chip multiprocessors
    24.
    Granted patent
    Status: Active

    Publication number: US07543112B1

    Publication date: 2009-06-02

    Application number: US11472141

    Filing date: 2006-06-20

    CPC classification number: G06F12/0897 G06F12/084

    Abstract: The storage of a data line in one or more L1 caches and/or a shared L2 cache of a chip multiprocessor is dynamically optimized based on the sharing of the data line. In one embodiment, an enhanced L2 cache directory entry associated with the data line is generated in an L2 cache directory of the shared L2 cache. The enhanced L2 cache directory entry includes a cache mask indicating a storage state of the data line in the one or more L1 caches and the shared L2 cache. In some embodiments, where the data line is stored in the shared L2 cache only, a portion of the cache mask indicates a storage history of the data line in the one or more L1 caches.

    Hardware-based technique for improving the effectiveness of prefetching during scout mode
    25.
    Granted patent
    Status: Active

    Publication number: US07529911B1

    Publication date: 2009-05-05

    Application number: US11139866

    Filing date: 2005-05-26

    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. Upon encountering a non-data dependent stall condition, the system performs a checkpoint and commences execution of instructions in scout mode, wherein instructions are speculatively executed to prefetch future memory operations, but wherein results are not committed to the architectural state of a processor. When the system executes a load instruction during scout mode, if the load instruction causes a lower-level cache miss, the system allows the load instruction to access a higher-level cache. Next, the system places the load instruction and subsequent dependent instructions into a deferred queue, and resumes execution of the program in scout mode. If the load instruction ultimately causes a hit in the higher-level cache, the system replays the load instruction and subsequent dependent instructions in the deferred queue, whereby the value retrieved from the higher-level cache can help in generating prefetches during scout mode.

    Prefetch prediction
    26.
    Granted patent
    Status: Active

    Publication number: US07434004B1

    Publication date: 2008-10-07

    Application number: US10870010

    Filing date: 2004-06-17

    CPC classification number: G06F12/0862 G06F2212/6024 G06F2212/6026

    Abstract: Predicting prefetch data sources for read operations that trigger runahead execution eliminates the latency penalties of missing read operations that typically are not addressed by runahead execution mechanisms. Read operations that most likely trigger runahead execution are identified. The code unit that includes those triggering read operations is modified so that the code unit branches to a prefetch predictor. The prefetch predictor observes sequence patterns of data sources of triggering read operations and develops prefetch predictions based on the observed data source sequence patterns. After a prefetch prediction gains reliability, the prefetch predictor supplies a predicted data source to a prefetcher coincident with triggering of runahead execution.
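    A predictor that learns sequence patterns and supplies a prediction only once it is reliable can be sketched as a table with confidence counters. The structure below is an assumption made for illustration (a table keyed by the previous data source, with a fixed confidence threshold); the abstract does not describe the predictor's internal organization, and all names are hypothetical.

```python
class PrefetchPredictorSketch:
    """Learns which data source tends to follow the previous one."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # confirmations needed before predicting
        self.last = None            # most recently observed data source
        self.table = {}             # previous source -> [next source, confidence]

    def observe(self, source):
        """Record the data source of a triggering read operation."""
        if self.last is not None:
            entry = self.table.get(self.last)
            if entry is not None and entry[0] == source:
                entry[1] += 1                        # pattern confirmed again
            else:
                self.table[self.last] = [source, 1]  # new or changed pattern
        self.last = source

    def predict(self):
        """Return the predicted next source, or None until it is reliable."""
        entry = self.table.get(self.last)
        if entry is not None and entry[1] >= self.threshold:
            return entry[0]
        return None
```

    In this model the value returned by `predict` stands in for the predicted data source that would be handed to the prefetcher when runahead execution is triggered.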

    Software-accessible hardware support for determining set membership
    27.
    Granted patent
    Status: Active

    Publication number: US08788766B2

    Publication date: 2014-07-22

    Application number: US12708376

    Filing date: 2010-02-18

    CPC classification number: G06F9/30021 G06F9/30018

    Abstract: A method and processor supporting architected instructions for tracking and determining set membership, such as by implementing Bloom filters, are disclosed. The apparatus includes storage arrays (e.g., registers) and an execution core configured to store an indication that a given value is a member of a set, including by executing an architected instruction having an operand specifying the given value, wherein executing comprises applying a hash function to the value to determine an index into one of the storage arrays and setting a bit of the storage array corresponding to the index. An architected query instruction is later executed to determine if a query value is not a member of the set, including by applying the hash function to the query value to determine an index into the storage array and determining whether a bit at the index of the storage array is set.
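    The insert and query operations in this abstract follow the usual Bloom filter pattern, which can be modeled in software as follows. This is a sketch under stated assumptions: each storage array is modeled as a Python integer, the per-array hash is derived from SHA-256 purely for convenience, and the class and method names are hypothetical rather than taken from the patent.

```python
import hashlib


class BloomFilterSketch:
    """Software model of set-membership tracking with hashed bit arrays."""

    def __init__(self, num_arrays=3, bits_per_array=64):
        self.num_arrays = num_arrays
        self.bits_per_array = bits_per_array
        self.arrays = [0] * num_arrays  # each int models one storage register

    def _index(self, array_id, value):
        """Hash the value to a bit index within the given storage array."""
        digest = hashlib.sha256(f"{array_id}:{value}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.bits_per_array

    def insert(self, value):
        """Model of the insert instruction: set one bit per storage array."""
        for a in range(self.num_arrays):
            self.arrays[a] |= 1 << self._index(a, value)

    def definitely_absent(self, value):
        """Model of the query instruction: True if any indexed bit is clear."""
        return any(
            not (self.arrays[a] >> self._index(a, value)) & 1
            for a in range(self.num_arrays)
        )
```

    A query can prove that a value is not in the set, but a result of `False` from `definitely_absent` means only that the value may be a member, since different values can hash to the same bits.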

    Methods and mechanisms to support multiple features for a number of opcodes
    28.
    Granted patent
    Status: Active

    Publication number: US08195923B2

    Publication date: 2012-06-05

    Application number: US12420054

    Filing date: 2009-04-07

    CPC classification number: G06F9/30145 G06F9/30101 G06F9/30167 G06F9/45504

    Abstract: Systems and methods for efficient instruction support of multiple features for opcodes of an instruction set. A processor detects that a fetched instruction of a computer program comprises an opcode corresponding to a plurality of functions. Each function corresponds to a different type of operation. The processor determines that the received instruction corresponds to a feature requested by the computer program, such as a cryptographic algorithm. A determination is made as to whether hardware support exists for the feature. If hardware support exists for the feature, the instruction is executed on-chip by the hardware. Otherwise, software performs the operation corresponding to the instruction.
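    The hardware-or-software decision in this abstract amounts to a dispatch on the requested feature. The following sketch models that flow only; the opcode value, the feature names, and the handler functions are all hypothetical placeholders, and the "hardware" path is simulated by an ordinary function.

```python
SHARED_OPCODE = 0x36               # hypothetical opcode shared by several features
HARDWARE_FEATURES = {"xor_fold"}   # hypothetical set of features this chip implements


def xor_fold(data):
    """XOR all bytes together (stands in for an on-chip operation)."""
    result = 0
    for b in data:
        result ^= b
    return result


def byte_sum(data):
    """Sum all bytes modulo 256 (stands in for a software-emulated operation)."""
    return sum(data) & 0xFF


HANDLERS = {"xor_fold": xor_fold, "byte_sum": byte_sum}


def execute(opcode, feature, data):
    """Run the requested feature, reporting whether hardware or software ran it."""
    if opcode != SHARED_OPCODE:
        raise ValueError("opcode is not the shared multi-feature opcode")
    if feature not in HANDLERS:
        raise ValueError(f"unknown feature: {feature}")
    path = "hardware" if feature in HARDWARE_FEATURES else "software"
    return path, HANDLERS[feature](data)
```

    For example, `execute(0x36, "byte_sum", b"\x01\x03")` takes the software path in this model because `byte_sum` is not listed in `HARDWARE_FEATURES`.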

    APPARATUS AND METHOD FOR IMPLEMENTING INSTRUCTION SUPPORT FOR THE ADVANCED ENCRYPTION STANDARD (AES) ALGORITHM
    29.
    Patent application
    Status: Under examination (published)

    Publication number: US20100250965A1

    Publication date: 2010-09-30

    Application number: US12414852

    Filing date: 2009-03-31

    CPC classification number: G06F21/602 G06F9/30007 G06F9/3895

    Abstract: A processor including instruction support for implementing the Advanced Encryption Standard (AES) block cipher algorithm may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit that may receive instructions for execution. The instructions include one or more AES instructions defined within the ISA. In addition, the AES instructions may be executable by the cryptographic unit to implement portions of an AES cipher that is compliant with Federal Information Processing Standards Publication 197 (FIPS 197). In response to receiving a first AES encryption round instruction defined within the ISA, the cryptographic unit may perform an encryption round of the AES cipher on a first group of columns of cipher state having a plurality of rows and columns. A maximum number of columns included in the first group may be fewer than all of the columns of the cipher state.

    Method for efficient generation of a fletcher checksum using a single SIMD pipeline
    30.
    Granted patent
    Status: Active

    Publication number: US08453035B1

    Publication date: 2013-05-28

    Application number: US13344380

    Filing date: 2012-01-05

    CPC classification number: G06F9/3887 G06F11/1004 H03M13/096

    Abstract: The generation of Fletcher/Adler partial checksums is transformed from a space that requires integer multiplications and additions to a space that requires only integer additions and shifts on a processor capable of a single SIMD pipeline. This transformation permits the use of Fletcher/Adler checksums on processors where the performance of SIMD instructions is sub-optimal, on CMT processors that support a single SIMD pipeline, as well as on other processors that can be configured by executing software to implement SIMD operations for a single SIMD pipeline. The implementation of the process with this transformation on a general-purpose computer system transforms that general-purpose computer system into a special-purpose computer system that uses a single SIMD pipeline to generate a Fletcher/Adler checksum. The elimination of integer multiplications in the generation of the partial checksums results in a significant improvement in performance.
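    The idea of trading multiplications for additions and shifts can be shown on a scalar Fletcher-16 checksum. The sketch below is not the patented SIMD method, whose details the abstract does not give; it only demonstrates, under that caveat, that the position-weighted sum (which needs multiplications) equals a running sum of prefix sums, and that reduction modulo 255 can be done with shifts and adds. Function names are illustrative.

```python
def fletcher16_mul(data):
    """Reference form: the second sum weights each byte by its position."""
    n = len(data)
    sum1 = sum(data) % 255
    sum2 = sum((n - i) * b for i, b in enumerate(data)) % 255
    return (sum2 << 8) | sum1


def fold255(x):
    """Reduce modulo 255 using only shifts, masks, and adds (256 mod 255 == 1)."""
    while x > 0xFF:
        x = (x & 0xFF) + (x >> 8)
    return 0 if x == 0xFF else x


def fletcher16_add(data):
    """Addition-only form: accumulate prefix sums instead of multiplying."""
    sum1 = 0
    sum2 = 0
    for b in data:
        sum1 += b      # running byte sum
        sum2 += sum1   # sum of prefix sums equals the weighted sum
    return (fold255(sum2) << 8) | fold255(sum1)
```

    Both functions return `0xC8F0` for the input `b"abcde"`, the commonly quoted Fletcher-16 test value, and they agree on any other input because the two forms are algebraically identical.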
