Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces
    1.
    Patent application
    Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces (status: in force)

    Publication No.: US20080222396A1

    Publication date: 2008-09-11

    Application No.: US11684348

    Filing date: 2007-03-09

    IPC classification: G06F9/50

    Abstract: In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.

    Low overhead access to shared on-chip hardware accelerator with memory-based interfaces
    2.
    Granted patent
    Low overhead access to shared on-chip hardware accelerator with memory-based interfaces (status: in force)

    Publication No.: US07809895B2

    Publication date: 2010-10-05

    Application No.: US11684348

    Filing date: 2007-03-09

    IPC classification: G06F12/14 G06F13/00 G06F15/82

    Abstract: In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.

    Apparatus and method for profiling system events in a fine grain multi-threaded multi-core processor
    3.
    Granted patent
    Apparatus and method for profiling system events in a fine grain multi-threaded multi-core processor (status: in force)

    Publication No.: US08762951B1

    Publication date: 2014-06-24

    Application No.: US11689359

    Filing date: 2007-03-21

    IPC classification: G06F9/44

    Abstract: A system and method for profiling runtime system events of a computer system may include associating a data source type with detected system events. The system events may be detected dependent on information included in a reply message received by a processor in response to a data request or other transaction request message. The reply message may include information characterizing a source type of a source of data included in the reply message. The source type information may indicate that the source is remote or local; that it is a shared or a private storage location; that the data is supplied via a cache-to-cache transfer; or that the data is sourced from a coherency domain other than that of the requesting process. Instructions, events, messages, and replies may be sampled, and extended address information corresponding to the samples may be stored in an event set database for performance analysis.

    Efficient On-Chip Accelerator Interfaces to Reduce Software Overhead
    4.
    Patent application
    Efficient On-Chip Accelerator Interfaces to Reduce Software Overhead (status: in force)

    Publication No.: US20080222383A1

    Publication date: 2008-09-11

    Application No.: US11684358

    Filing date: 2007-03-09

    IPC classification: G06F9/34

    Abstract: In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as are a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instructions which, when executed, implement a portion of the method.

    Efficient on-chip accelerator interfaces to reduce software overhead
    5.
    Granted patent
    Efficient on-chip accelerator interfaces to reduce software overhead (status: in force)

    Publication No.: US07827383B2

    Publication date: 2010-11-02

    Application No.: US11684358

    Filing date: 2007-03-09

    IPC classification: G06F9/34 G06F12/08

    Abstract: In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand; and the execution circuitry is configured to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as are a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instructions which, when executed, implement a portion of the method.

    Method and apparatus for resolving multiple branches
    6.
    Granted patent
    Method and apparatus for resolving multiple branches (status: lapsed)

    Publication No.: US06256729B1

    Publication date: 2001-07-03

    Application No.: US09004971

    Filing date: 1998-01-09

    IPC classification: G06F15/00

    CPC classification: G06F9/3861 G06F9/3806

    Abstract: A method for repairing a pipeline in response to a branch instruction having a branch includes the steps of providing a branch repair table having a plurality of entries; allocating an entry in the branch repair table for the branch instruction; storing a target address, a fall-through address, and repair information in the entry in the branch repair table; processing the branch instruction to determine whether the branch was taken; and repairing the pipeline in response to the repair information and the fall-through address in the entry in the branch repair table when the branch was not taken.
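
The allocate, store, resolve, and repair steps in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `RepairEntry`, `BranchRepairTable`, `allocate`, and `resolve` are hypothetical, the repair information is modeled as an opaque dictionary, and the branch is assumed to have been predicted taken, so a repair is needed only when it resolves as not taken.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RepairEntry:
    target: int          # address fetched if the branch is taken
    fall_through: int    # address of the instruction after the branch
    repair_info: dict    # opaque state needed to repair the pipeline (assumed shape)


class BranchRepairTable:
    """Toy model of a fixed-size branch repair table (hypothetical names)."""

    def __init__(self, size: int):
        self.slots = [None] * size

    def allocate(self, target: int, fall_through: int, repair_info: dict) -> int:
        # Allocate the first free entry for a newly fetched branch.
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = RepairEntry(target, fall_through, repair_info)
                return index
        raise RuntimeError("branch repair table full")

    def resolve(self, index: int, taken: bool) -> Optional[Tuple[int, dict]]:
        # Free the entry once the branch outcome is known.
        entry = self.slots[index]
        self.slots[index] = None
        if taken:
            return None  # assumed prediction (taken) was correct: nothing to repair
        # Not taken: redirect fetch to the fall-through address and hand back
        # the stored repair information.
        return entry.fall_through, entry.repair_info


table = BranchRepairTable(size=4)
slot = table.allocate(target=0x2000, fall_through=0x1004, repair_info={"rename_map": 7})
repair = table.resolve(slot, taken=False)
```

In this model `repair` holds the fall-through address and the saved repair information; a taken branch would return `None` and simply release the entry.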

    Method and apparatus for branch target prediction
    7.
    Granted patent
    Method and apparatus for branch target prediction (status: lapsed)

    Publication No.: US5938761A

    Publication date: 1999-08-17

    Application No.: US976826

    Filing date: 1997-11-24

    IPC classification: G06F9/38 G06F9/32

    CPC classification: G06F9/3806

    Abstract: One embodiment of the present invention provides a method and an apparatus for predicting the target of a branch instruction. This method and apparatus operate by using a translation lookaside buffer (TLB) to store page numbers for predicted branch target addresses. In this embodiment, a branch target address table stores a small index to a location in the translation lookaside buffer, and this index is used to retrieve a page number from the location in the translation lookaside buffer. This page number is used as the page number portion of a predicted branch target address. Thus, a small index into a translation lookaside buffer can be stored in a predicted branch target address table instead of a larger page number for the predicted branch target address. This technique effectively reduces the size of a predicted branch target table by eliminating much of the space that is presently wasted storing redundant page numbers. Another embodiment maintains coherence between the branch target address table and the translation lookaside buffer. This makes it possible to detect a miss in the translation lookaside buffer at least one cycle earlier by examining the branch target address table.
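
The space saving described here comes from storing a short TLB slot index plus a page offset in place of a full page number. The following is a minimal software sketch of that idea, not the patented hardware: the class names `TinyTLB` and `BranchTargetTable`, the 13-bit page offset, and the dictionary-based table are all assumptions made for illustration.

```python
PAGE_BITS = 13  # assumed page-offset width (8 KiB pages); not taken from the abstract


class TinyTLB:
    """Toy TLB that only tracks the page number held in each slot."""

    def __init__(self, entries: int):
        self.pages = [None] * entries

    def install(self, slot: int, page_number: int) -> None:
        self.pages[slot] = page_number

    def page_at(self, slot: int):
        return self.pages[slot]


class BranchTargetTable:
    """Stores (TLB slot index, page offset) instead of a full target address."""

    def __init__(self, tlb: TinyTLB):
        self.tlb = tlb
        self.entries = {}  # branch PC -> (tlb_slot, page_offset)

    def record(self, branch_pc: int, tlb_slot: int, target: int) -> None:
        offset = target & ((1 << PAGE_BITS) - 1)
        self.entries[branch_pc] = (tlb_slot, offset)

    def predict(self, branch_pc: int):
        entry = self.entries.get(branch_pc)
        if entry is None:
            return None  # no prediction recorded for this branch
        slot, offset = entry
        page = self.tlb.page_at(slot)
        if page is None:
            return None  # the referenced TLB slot is empty
        # Rebuild the predicted target from the TLB page number and the offset.
        return (page << PAGE_BITS) | offset


tlb = TinyTLB(entries=4)
tlb.install(slot=2, page_number=0x1234)
btt = BranchTargetTable(tlb)
btt.record(branch_pc=0x400, tlb_slot=2, target=(0x1234 << PAGE_BITS) | 0x10)
predicted = btt.predict(0x400)
```

With a 4-entry TLB, each table entry needs only a 2-bit slot index in place of the page number, which is the reduction the abstract refers to.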

    Cache memory array which stores two-way set associative data
    8.
    Granted patent
    Cache memory array which stores two-way set associative data (status: lapsed)

    Publication No.: US5854761A

    Publication date: 1998-12-29

    Application No.: US883544

    Filing date: 1997-06-26

    IPC classification: G06F12/08 G11C7/00 G11C7/10

    Abstract: A cache memory array stores two-way set associative data. An odd set data bank stores odd number sets of the two-way set associative data, where the two ways of each odd number set are aligned horizontally within the odd set data bank. An even set data bank stores even number sets of the two-way set associative data, where the two ways of each even number set are aligned horizontally within the even set data bank. Also, the odd set data bank is aligned horizontally with the even set data bank such that each odd number set is aligned horizontally with a next even number set. The horizontally aligned ways are interleaved for data path width reduction. Set and way selection circuits extract lines of data from the array. The array may be structurally implemented by single-ported RAM cells.

    Using windowed register file to checkpoint register state
    9.
    Patent application
    Using windowed register file to checkpoint register state (status: under examination, published)

    Publication No.: US20080016325A1

    Publication date: 2008-01-17

    Application No.: US11484970

    Filing date: 2006-07-12

    IPC classification: G06F9/30

    Abstract: In one embodiment, a processor comprises a core configured to execute instructions; a register file comprising a plurality of storage locations; and a window management unit. The window management unit is configured to operate the plurality of storage locations as a plurality of windows, wherein register addresses encoded into the instructions identify storage locations among a subset of the plurality of storage locations that are within a current window. Additionally, the window management unit is configured to allocate a second window in response to a predetermined event. One of the current window and the second window serves as a checkpoint of register state, and the other one of the current window and the second window is updated in response to instructions processed subsequent to the checkpoint. The checkpoint may be restored if the speculative execution results are discarded.
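
The checkpoint scheme in this abstract can be modeled in a few lines. The sketch below is an illustration under assumptions, not the claimed design: the class `WindowedRegisterFile` and its method names are hypothetical, the old window is the one kept as the checkpoint, and the new window is seeded by copying register values, which real hardware may well avoid.

```python
class WindowedRegisterFile:
    """Toy register file whose storage is split into equal-sized windows."""

    def __init__(self, num_windows: int, regs_per_window: int):
        self.windows = [[0] * regs_per_window for _ in range(num_windows)]
        self.current = 0        # window addressed by instruction register numbers
        self.checkpoint = None  # window holding the saved state, if any

    def read(self, reg: int) -> int:
        return self.windows[self.current][reg]

    def write(self, reg: int, value: int) -> None:
        self.windows[self.current][reg] = value

    def take_checkpoint(self) -> None:
        # Allocate a second window; the old one becomes the checkpoint.
        # Copying the values is an assumption of this sketch.
        new = (self.current + 1) % len(self.windows)
        self.windows[new] = list(self.windows[self.current])
        self.checkpoint = self.current
        self.current = new

    def rollback(self) -> None:
        # Discard speculative results by switching back to the checkpoint window.
        self.current = self.checkpoint
        self.checkpoint = None

    def commit(self) -> None:
        # Speculation succeeded: keep the new window and drop the checkpoint.
        self.checkpoint = None


rf = WindowedRegisterFile(num_windows=4, regs_per_window=8)
rf.write(3, 7)
rf.take_checkpoint()
rf.write(3, 99)            # speculative update lands in the new window
speculative_value = rf.read(3)
rf.rollback()
restored_value = rf.read(3)
```

After `rollback`, register 3 reads its pre-checkpoint value again because the speculative write only touched the second window.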
