MULTITHREADED STATIC TIMING ANALYSIS
    1.
    Type: Patent application
    Status: In force

    Publication No.: US20090106717A1

    Publication date: 2009-04-23

    Application No.: US11876688

    Filing date: 2007-10-22

    IPC classification: G06F17/50

    CPC classification: G06F17/5031

    Abstract: A method and apparatus for executing a multithreaded algorithm to provide static timing analysis of a chip design includes analyzing a chip design to identify various components and nodes associated with the components. A node tree is built with a plurality of nodes. The node tree identifies groups of nodes that are available in different levels. A size of node grouping for a current level is determined by looking up the node tree. Testing data for parallel processing of different sizes of node groupings using varied thread counts is compiled. An optimum thread count for the current level based on the size of node grouping in the node tree is identified from compiled testing data. Dynamic parallel processing of nodes in the current level is performed using the number of threads identified by the optimum thread count. An acceptable design of the chip is determined by the dynamic parallel processing.
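
    The per-level thread selection described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the table `TIMING_TABLE`, its numbers, `DEFAULT_THREADS`, and the function names are all hypothetical stand-ins for the "compiled testing data" and the node tree.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical compiled testing data: (largest group size, best thread count).
# The numbers are invented for illustration only.
TIMING_TABLE = [(8, 1), (64, 2), (512, 4)]
DEFAULT_THREADS = 8  # assumed fallback for groups larger than any table entry


def optimum_thread_count(group_size):
    """Look up the thread count for a level containing `group_size` nodes."""
    for max_size, threads in TIMING_TABLE:
        if group_size <= max_size:
            return threads
    return DEFAULT_THREADS


def analyze_levels(levels, process_node):
    """Process each level of the node tree with its own thread count.

    `levels` is a list of lists of nodes. Levels run one after another,
    while the nodes inside a level are processed in parallel.
    """
    results = []
    for nodes in levels:
        threads = optimum_thread_count(len(nodes))
        with ThreadPoolExecutor(max_workers=threads) as pool:
            results.append(list(pool.map(process_node, nodes)))
    return results
```

    For example, `analyze_levels([[1, 2], [3]], lambda n: n * 2)` processes a two-node level and then a one-node level, each with the thread count the table suggests for its size.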

    Method for trace collection
    2.
    Type: Granted patent
    Status: In force

    Publication No.: US07478371B1

    Publication date: 2009-01-13

    Application No.: US10690056

    Filing date: 2003-10-20

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/44

    Abstract: A method is provided for obtaining data to be used in evaluating performance of a computer processor. More specifically, the method provides for efficiently obtaining traces from an application program for use in a simulation of a computer processor. The method uses both an original code defining the application program and an instrumented version of the original code (“instrumented code”). The method includes apportioning a total time of execution of the application program between the original code and the instrumented code. Transition of execution between the original and instrumented codes is conducted either through modification of function calls or through consultation with a mapping of instruction address correspondences between the original and instrumented codes.

    Vector operations for compressing selected vector elements
    3.
    Type: Granted patent
    Status: In force

    Publication No.: US09280342B2

    Publication date: 2016-03-08

    Application No.: US13187132

    Filing date: 2011-07-20

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/30

    Abstract: A processor, method, and medium for using vector operations to compress selected elements of a vector. An input vector is compared to a criteria vector, and then a subset of the plurality of elements of the input vector is selected based on the comparison. A permutation vector is generated based on the locations of the selected elements and then the permutation vector is used to permute the selected elements of the input vector to an output vector. The selected elements of the input vector are stored in contiguous locations in the leftmost elements of the output vector. Then, the output vector is stored to memory and a pointer to the memory location is incremented by the number of selected elements.
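
    The compare, permute, store, and advance steps in this abstract can be modelled in scalar Python as below. This is only a sketch under assumptions: the comparison is taken to be "greater than", the padding value is zero, and `compress_selected`, `out`, and `pos` are hypothetical names; real hardware would do each step with a vector instruction.

```python
def compress_selected(input_vec, criteria_vec, out, pos):
    """Append the selected elements of `input_vec` to `out` starting at `pos`.

    Returns the advanced position (pointer) after the store.
    """
    # Element-wise comparison; "selected" is assumed to mean input > criteria.
    mask = [x > c for x, c in zip(input_vec, criteria_vec)]
    # Permutation vector: source indices of the selected elements, in order.
    perm = [i for i, keep in enumerate(mask) if keep]
    # Permute the selected elements into the leftmost output positions.
    output_vec = [input_vec[i] for i in perm]
    output_vec += [0] * (len(input_vec) - len(perm))  # unused lanes (padding)
    # Store the whole output vector, then advance only by the selected count,
    # so the next store overwrites the padding lanes.
    out[pos:pos + len(output_vec)] = output_vec
    return pos + len(perm)
```

    Calling it twice on 4-element vectors with a criteria vector of all 3s, first on `[5, 1, 7, 2]` and then on `[4, 9, 0, 8]`, leaves the selected values packed contiguously at the front of `out`.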

    Acceleration of string comparisons using vector instructions
    4.
    Type: Granted patent
    Status: In force

    Publication No.: US09009447B2

    Publication date: 2015-04-14

    Application No.: US13185244

    Filing date: 2011-07-18

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/30

    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.
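
    The loop structure in this abstract can be illustrated with a scalar model. The sketch below assumes an 8-byte vector width and C-style strings with no embedded NUL bytes; `VECTOR_WIDTH`, `load_chunk`, `compare_chunk`, and `strings_equal` are hypothetical names, and `compare_chunk` merely stands in for the single vector instruction.

```python
VECTOR_WIDTH = 8  # assumed number of byte elements per vector


def load_chunk(s: bytes, offset: int) -> bytes:
    """Load VECTOR_WIDTH bytes, padding with NUL past the end of the string."""
    chunk = s[offset:offset + VECTOR_WIDTH]
    return chunk + b"\0" * (VECTOR_WIDTH - len(chunk))


def compare_chunk(a: bytes, b: bytes):
    """Model of the single instruction: index of the first lane that holds
    an inequality or the null character, or None if no lane does."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y or x == 0:
            return i
    return None


def strings_equal(s1: bytes, s2: bytes) -> bool:
    offset = 0
    while True:
        a, b = load_chunk(s1, offset), load_chunk(s2, offset)
        hit = compare_chunk(a, b)
        if hit is not None:
            # Further check: the loop stopped on a shared terminator
            # (equal strings) rather than on a mismatch.
            return a[hit] == b[hit]
        offset += VECTOR_WIDTH  # load the next portions of the strings
```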

    USING HARDWARE SUPPORT TO REDUCE SYNCHRONIZATION COSTS IN MULTITHREADED APPLICATIONS
    5.
    Type: Patent application
    Status: In force

    Publication No.: US20090300643A1

    Publication date: 2009-12-03

    Application No.: US12127509

    Filing date: 2008-05-27

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/46

    Abstract: A processor configured to synchronize threads in multithreaded applications. The processor includes first and second registers. The processor stores a first bitmask in the first register and a second bitmask in the second register. For each bitmask, each bit corresponds with one of multiple threads. A given bit in the first bitmask indicates the corresponding thread has been assigned to execute a portion of a unit of work. A corresponding bit in the second bitmask indicates the corresponding thread has completed execution of its assigned portion of the unit of work. The processor receives updates to the second bitmask in the second register and provides an indication that the unit of work has been completed in response to detecting that for each bit in the first bitmask that corresponds to a thread that is assigned work, a corresponding bit in the second bitmask indicates its corresponding thread has completed its assigned work.
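
    The two-bitmask completion check can be mimicked in software as below. This is an illustrative sketch only: the abstract describes hardware registers, whereas here a lock and a `threading.Event` stand in for them, and the class and method names (`WorkUnitTracker`, `assign`, `complete`) are hypothetical.

```python
import threading


class WorkUnitTracker:
    """Software model of the 'assigned' and 'completed' bitmask registers."""

    def __init__(self):
        self.assigned = 0    # first bitmask: bit i set => thread i has work
        self.completed = 0   # second bitmask: bit i set => thread i is done
        self._lock = threading.Lock()  # stand-in for atomic register updates
        self.done = threading.Event()  # the "unit of work completed" signal

    def assign(self, thread_id):
        with self._lock:
            self.assigned |= 1 << thread_id

    def complete(self, thread_id):
        with self._lock:
            self.completed |= 1 << thread_id
            # Done when every assigned bit also appears in the completed mask.
            if self.assigned & self.completed == self.assigned:
                self.done.set()
```

    A coordinating thread would call `assign` for each worker, start the workers, and then block on `tracker.done.wait()` instead of joining each worker in turn.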

    MAXIMIZING ENCODINGS OF VERSION CONTROL BITS FOR MEMORY CORRUPTION DETECTION
    6.
    Type: Patent application
    Status: In force

    Publication No.: US20130036332A1

    Publication date: 2013-02-07

    Application No.: US13198904

    Filing date: 2011-08-05

    IPC classification: G06F11/14

    Abstract: Systems and methods for maximizing a number of available states for a version number used for memory corruption detection. A physical memory may be a DRAM comprising a plurality of regions. Version numbers associated with data structures allocated in the physical memory may be generated so that version numbers of adjacent data structures in a virtual address space are different. A reserved set and an available set of version numbers are associated with each one of the plurality of regions. A version number in a reserved set of a given region may be in an available set of another region. The processor detects no memory corruption error in response to at least determining a version number stored in a memory location in a first region identified by a memory access operation is also in a reserved set associated with the first region.

    ACCELERATION OF STRING COMPARISONS USING VECTOR INSTRUCTIONS
    7.
    Type: Patent application
    Status: In force

    Publication No.: US20130024653A1

    Publication date: 2013-01-24

    Application No.: US13185244

    Filing date: 2011-07-18

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/312 G06F15/76

    Abstract: A processor, method, and medium for using vector instructions to perform string comparisons. A single instruction compares the elements of two vectors and simultaneously checks for the null character. If an inequality or the null character is found, then the string comparison loop terminates, and a further check is performed to determine if the strings are equal. If all elements are equal and the null character is not found, then another iteration of the string comparison loop is executed. The vectors are loaded with the next portions of the strings, and then the next comparison is performed. The loop continues until either an inequality or the null character is found.

    Using register readiness to facilitate value prediction
    8.
    Type: Granted patent
    Status: In force

    Publication No.: US07539851B2

    Publication date: 2009-05-26

    Application No.: US11437478

    Filing date: 2006-05-18

    Applicant: Darryl J. Gove

    Inventor: Darryl J. Gove

    IPC classification: G06F9/00 G06F9/45

    Abstract: One embodiment of the present invention provides a system for using register readiness to facilitate value prediction. The system starts by loading a previously computed result for a function to a destination register for the function from a lookup table. The system then checks the destination register for the function by using a Branch-Register-Not-Ready (BRNR) instruction to check the readiness of the destination register. If the destination register is ready, the system uses the previously computed result in the destination register as the result of the function. Loading the value from the lookup table in this way avoids unnecessarily calculating the result of the function when that result has previously been computed.

    Prefetch prediction
    9.
    Type: Granted patent
    Status: In force

    Publication No.: US07434004B1

    Publication date: 2008-10-07

    Application No.: US10870010

    Filing date: 2004-06-17

    IPC classification: G06F12/00

    Abstract: Predicting prefetch data sources for runahead execution triggering read operations eliminates the latency penalties of missing read operations that typically are not addressed by runahead execution mechanisms. Read operations that most likely trigger runahead execution are identified. The code unit that includes those triggering read operations is modified so that the code unit branches to a prefetch predictor. The prefetch predictor observes sequence patterns of data sources of triggering read operations and develops prefetch predictions based on the observed data source sequence patterns. After a prefetch prediction gains reliability, the prefetch predictor supplies a predicted data source to a prefetcher coincident with triggering of runahead execution.

    Instructions to set and read memory version information
    10.
    Type: Granted patent
    Status: In force

    Publication No.: US08751736B2

    Publication date: 2014-06-10

    Application No.: US13196514

    Filing date: 2011-08-02

    IPC classification: G06F12/02

    Abstract: Systems and methods for providing additional instructions for supporting efficient memory corruption detection in a processor. A physical memory may be a DRAM with a spare bank of memory reserved for a hardware failover mechanism. Version numbers associated with data structures allocated in the memory may be generated so that version numbers of adjacent data structures are different. A processor determines that a fetched instruction is a memory access instruction corresponding to a first data structure within the memory. For instructions that are not a version update instruction, the processor compares the first version number and second version number stored in a location in the memory indicated by the generated address and flags an error if there is a mismatch. For version update instructions, the processor performs a memory access operation on the second version number with no comparison check.
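
    The distinction between checked accesses and unchecked version updates can be sketched in software as follows. This is a loose model under assumptions, not the processor design: one version number is kept per memory cell, the "first version number" is passed explicitly with each access, and `VersionedMemory`, `set_version`, `load`, `store`, and `MemoryCorruptionError` are hypothetical names.

```python
class MemoryCorruptionError(Exception):
    """Raised when the access version and the stored version differ."""


class VersionedMemory:
    def __init__(self, size):
        self.data = [0] * size
        self.versions = [0] * size  # stored ("second") version per location

    def set_version(self, addr, length, version):
        """Version update: writes the stored version with no comparison."""
        for a in range(addr, addr + length):
            self.versions[a] = version

    def _check(self, addr, version):
        # Ordinary accesses compare the supplied version to the stored one.
        if self.versions[addr] != version:
            raise MemoryCorruptionError(
                f"version mismatch at address {addr}: "
                f"got {version}, stored {self.versions[addr]}"
            )

    def load(self, addr, version):
        self._check(addr, version)
        return self.data[addr]

    def store(self, addr, version, value):
        self._check(addr, version)
        self.data[addr] = value
```

    Giving two adjacent 4-cell structures versions 1 and 2 means a store that runs one cell past the first structure, still carrying version 1, raises `MemoryCorruptionError` rather than silently overwriting its neighbour.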
