Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces
    1.
    Invention Application (In Force)

    Publication No.: US20080222396A1

    Publication Date: 2008-09-11

    Application No.: US11684348

    Filing Date: 2007-03-09

    IPC Class: G06F9/50

    Abstract: In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the request. One or more commands are communicated to the hardware accelerator by the user-privileged thread, without intervention by higher-privileged threads, responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.

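The grant-then-direct-access flow described in the abstract can be sketched in C. This is only an illustrative model, not the patented implementation: the queue layout, names, and depth are assumptions standing in for a memory-based command interface that the privileged grant would map into the user thread's address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memory-based command interface: in hardware this region
 * would be mapped into the user thread's address space by the grant. */
#define QUEUE_DEPTH 8

typedef struct {
    uint32_t opcode;      /* task for the accelerator to perform */
    uint64_t src, dst;    /* operand addresses                   */
} accel_cmd_t;

typedef struct {
    bool        granted;            /* set only by privileged code  */
    unsigned    head, tail;         /* ring-buffer cursors          */
    accel_cmd_t ring[QUEUE_DEPTH];  /* command ring in plain memory */
} accel_queue_t;

/* Higher-privileged thread: handles the one-time access request. */
bool privileged_grant_access(accel_queue_t *q) {
    memset(q, 0, sizeof(*q));
    q->granted = true;
    return q->granted;
}

/* User-privileged thread: after the grant, commands are queued with
 * ordinary stores -- no trap into higher-privileged code. */
bool user_submit(accel_queue_t *q, accel_cmd_t cmd) {
    if (!q->granted)
        return false;                       /* must request access first */
    unsigned next = (q->tail + 1) % QUEUE_DEPTH;
    if (next == q->head)
        return false;                       /* queue full */
    q->ring[q->tail] = cmd;                 /* plain store to memory */
    q->tail = next;
    return true;
}

/* Accelerator side: how many commands are waiting in the shared queue. */
int accel_pending(const accel_queue_t *q) {
    return (int)((q->tail + QUEUE_DEPTH - q->head) % QUEUE_DEPTH);
}
```

The point the sketch makes is that once `granted` is set, `user_submit` touches only ordinary memory, which is where the "low overhead" of the title comes from.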

Low overhead access to shared on-chip hardware accelerator with memory-based interfaces
    2.
    Invention Grant (In Force)

    Publication No.: US07809895B2

    Publication Date: 2010-10-05

    Application No.: US11684348

    Filing Date: 2007-03-09

    IPC Class: G06F12/14 G06F13/00 G06F15/82

    Abstract: In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the request. One or more commands are communicated to the hardware accelerator by the user-privileged thread, without intervention by higher-privileged threads, responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprising instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as are a hardware accelerator and a processor coupled to the hardware accelerator.


Efficient On-Chip Accelerator Interfaces to Reduce Software Overhead
    3.
    Invention Application (In Force)

    Publication No.: US20080222383A1

    Publication Date: 2008-09-11

    Application No.: US11684358

    Filing Date: 2007-03-09

    IPC Class: G06F9/34

    Abstract: In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand, and to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as are a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instructions which, when executed, implement a portion of the method.

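The unusual mechanism here, a store whose data operand is itself translated by the TLB, can be modeled in a few lines of C. This is a toy software model under our own assumptions (page size, table shape, function names), not the hardware described in the claims.

```c
#include <assert.h>
#include <stdint.h>

/* Toy software TLB: a few virtual-to-physical page mappings.
 * Page size and entries are illustrative, not from the patent. */
#define PAGE_SHIFT  12
#define TLB_ENTRIES 4

typedef struct { uint64_t vpn, pfn; int valid; } tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES] = {
    { 0x1, 0xaaa, 1 },
    { 0x2, 0xbbb, 1 },
};

/* Translate one virtual address; returns 0 on a TLB miss. */
uint64_t tlb_translate(uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].pfn << PAGE_SHIFT) |
                   (va & ((1u << PAGE_SHIFT) - 1));
    return 0;
}

/* The mechanism in the abstract: a store's virtual address AND its
 * data operand (itself a virtual address) both pass through the TLB,
 * yielding two physical addresses from a single store instruction. */
typedef struct { uint64_t pa_target, pa_data; } translated_store_t;

translated_store_t execute_store(uint64_t va, uint64_t data_operand) {
    translated_store_t t;
    t.pa_target = tlb_translate(va);           /* first physical address  */
    t.pa_data   = tlb_translate(data_operand); /* second physical address */
    return t;
}
```

One plausible reading of why this matters: software can hand an accelerator a physical pointer with a single store, without a separate system call to translate it.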

Efficient on-chip accelerator interfaces to reduce software overhead
    4.
    Invention Grant (In Force)

    Publication No.: US07827383B2

    Publication Date: 2010-11-02

    Application No.: US11684358

    Filing Date: 2007-03-09

    IPC Class: G06F9/34 G06F12/08

    Abstract: In one embodiment, a processor comprises execution circuitry and a translation lookaside buffer (TLB) coupled to the execution circuitry. The execution circuitry is configured to execute a store instruction having a data operand, and to generate a virtual address as part of executing the store instruction. The TLB is coupled to receive the virtual address and configured to translate the virtual address to a first physical address. Additionally, the TLB is coupled to receive the data operand and to translate the data operand to a second physical address. A hardware accelerator is also contemplated in various embodiments, as are a processor coupled to the hardware accelerator, a method, and a computer readable medium storing instructions which, when executed, implement a portion of the method.


Missing store operation accelerator
    5.
    Invention Grant (In Force)

    Publication No.: US07757047B2

    Publication Date: 2010-07-13

    Application No.: US11271056

    Filing Date: 2005-11-12

    IPC Class: G06F13/16

    CPC Class: G06F12/0859

    Abstract: Maintaining a cache of indications of exclusively-owned coherence state for memory space units (e.g., cache lines) allows reduction, if not elimination, of the delay from missing store operations. In addition, the indications are maintained without the corresponding data of the memory space unit, allowing a relatively small missing store operation accelerator to represent a large memory space. With the missing store operation accelerator, a store operation that misses in low-latency memory (e.g., L1 or L2 cache) proceeds as if the targeted memory space unit resides in the low-latency memory, provided the unit is indicated in the accelerator. When a store operation misses in low-latency memory and hits in the accelerator, a positive acknowledgement is transmitted to the writing processing unit, allowing the store operation to proceed. An entry is allocated for the store operation, the store data is written into the allocated entry, and the target of the store operation is requested from memory. When a copy of the data at the requested memory space unit returns, the rest of the allocated entry is updated.

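The key structural idea, a tag-only cache of "exclusively owned" indications with no data array, can be sketched as follows. Sizes, the replacement policy, and all names are our assumptions; the sketch only shows the hit/ack decision, not the entry allocation and later fill that the abstract also describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the missing-store accelerator: a small structure holding
 * only INDICATIONS that a line is exclusively owned -- no data -- so
 * it can cover a large memory space with little storage. */
#define ACCEL_LINES 16

typedef struct {
    uint64_t tag[ACCEL_LINES];   /* line addresses known exclusive */
    bool     valid[ACCEL_LINES];
} store_accel_t;

bool accel_hit(const store_accel_t *a, uint64_t line) {
    for (int i = 0; i < ACCEL_LINES; i++)
        if (a->valid[i] && a->tag[i] == line)
            return true;
    return false;
}

void accel_insert(store_accel_t *a, uint64_t line) {
    for (int i = 0; i < ACCEL_LINES; i++)
        if (!a->valid[i]) { a->valid[i] = true; a->tag[i] = line; return; }
    a->tag[0] = line;            /* trivial replacement policy */
}

/* A store that misses the low-latency caches gets a positive ack
 * (store proceeds; an entry is allocated and the data written) only
 * if the accelerator indicates the line is exclusively owned. */
bool handle_store_miss(const store_accel_t *a, uint64_t line) {
    return accel_hit(a, line);   /* true => ack writer immediately */
}
```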

Execution displacement read-write alias prediction
    6.
    Invention Grant (In Force)

    Publication No.: US07434031B1

    Publication Date: 2008-10-07

    Application No.: US10822390

    Filing Date: 2004-04-12

    IPC Class: G06F9/30 G06F9/40 G06F15/00

    Abstract: RAW aliasing can be predicted, with register bypassing, based at least in part on execution displacement alias prediction. Repeated aliasing between read and write operations (e.g., within a loop) can be reliably predicted based on the displacement between the aliasing operations. Performing register bypassing for operations predicted to alias facilitates faster RAW bypassing and mitigates the performance impact of aliasing read operations. The repeated aliasing between operations is tracked along with register information of the aliasing write operations. After a confidence threshold is exceeded, an instance of a read operation is predicted to alias with an instance of a write operation in accordance with the previously observed repeated aliasing. Based on the displacement between the instances of the operations, the register information of the write operation instance is used to bypass data to the read operation instance.

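The track-then-predict loop in the abstract resembles a standard confidence-counter predictor, which can be sketched in C. The threshold value, single-entry table, and names are our assumptions; real hardware would key a table by PC pairs and displacement.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of displacement alias prediction: repeated read-after-write
 * aliasing between a store PC and a load PC is tracked together with
 * the store's register; once a confidence threshold is exceeded, the
 * load is predicted to alias and the register is bypassed to it.
 * Threshold and single-entry table are illustrative. */
#define CONF_THRESHOLD 3

typedef struct {
    uint64_t store_pc, load_pc; /* the repeatedly aliasing pair     */
    int      store_reg;         /* register info of the write       */
    int      confidence;        /* observed repeats of the aliasing */
} alias_entry_t;

/* Record one observed store->load alias (e.g., inside a loop). */
void observe_alias(alias_entry_t *e, uint64_t spc, uint64_t lpc, int reg) {
    if (e->store_pc == spc && e->load_pc == lpc) {
        e->confidence++;
    } else {
        e->store_pc  = spc;
        e->load_pc   = lpc;
        e->store_reg = reg;
        e->confidence = 1;      /* new pair: restart the count */
    }
}

/* Predict: bypass the store's register to the load only after the
 * confidence threshold is exceeded. Returns the register to bypass
 * from, or -1 for no prediction. */
int predict_bypass(const alias_entry_t *e, uint64_t spc, uint64_t lpc) {
    if (e->store_pc == spc && e->load_pc == lpc &&
        e->confidence > CONF_THRESHOLD)
        return e->store_reg;
    return -1;
}
```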

Software-based technique for improving the effectiveness of prefetching during scout mode
    7.
    Invention Grant (In Force)

    Publication No.: US07373482B1

    Publication Date: 2008-05-13

    Application No.: US11139708

    Filing Date: 2005-05-26

    IPC Class: G06F9/30 G06F9/40 G06F15/00

    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. During operation, the system executes program instructions in a normal-execution mode. Upon encountering a condition that causes the processor to enter scout mode, the system performs a checkpoint and commences execution of instructions in scout mode, wherein the instructions are speculatively executed to prefetch future memory operations, but results are not committed to the architectural state of the processor. During execution of a load instruction in scout mode, if the load instruction is a special load instruction and causes a lower-level cache miss, the system waits for data to be returned from a higher-level cache before resuming execution of subsequent instructions in scout mode, instead of disregarding the result of the load instruction and immediately resuming execution in scout mode. In this way, the data returned from the higher-level cache can help in generating addresses for subsequent prefetches during scout mode.

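The policy, and why waiting for the value pays off, can be illustrated with a small C model. This is our own illustration under stated assumptions: the enum, function names, and the pointer-chasing example are not from the patent; they just show that a skipped load yields no dependent prefetch addresses while a waited-for value yields a chain of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SKIP_RESULT, WAIT_FOR_L2 } scout_action_t;

/* Policy from the abstract: only compiler-marked "special" loads that
 * miss the lower-level cache wait for the higher-level cache. */
scout_action_t scout_load_policy(bool special_load, bool lower_miss) {
    return (special_load && lower_miss) ? WAIT_FOR_L2 : SKIP_RESULT;
}

/* Why waiting helps: with the loaded value in hand, scout mode can
 * generate addresses for dependent prefetches. Pointer chasing is the
 * illustrative case here: each cell holds the index of the next cell.
 * Returns how many prefetch addresses could be generated. */
int gen_dependent_prefetches(const uint64_t *heap, size_t len,
                             uint64_t first, scout_action_t act,
                             uint64_t *out, int max) {
    if (act == SKIP_RESULT)
        return 0;                  /* value discarded: no addresses */
    int n = 0;
    uint64_t cur = first;
    while (n < max && cur < len) {
        out[n++] = cur;            /* prefetch this address         */
        cur = heap[cur];           /* loaded value is the next one  */
    }
    return n;
}
```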

Efficient caching of stores in scalable chip multi-threaded systems
    8.
    Invention Grant (In Force)

    Publication No.: US07793044B1

    Publication Date: 2010-09-07

    Application No.: US11654150

    Filing Date: 2007-01-16

    IPC Class: G06F13/00 G06F13/28

    CPC Class: G06F12/0811 G06F12/084

    Abstract: In accordance with one embodiment, an enhanced chip multiprocessor permits an L1 cache to request ownership of a data line from a shared L2 cache. A determination is made whether to deny or grant the request for ownership based on the sharing of the data line. In one embodiment, the sharing of the data line is determined from an enhanced L2 cache directory entry associated with the data line. If ownership of the data line is granted, the current data line is passed from the shared L2 to the requesting L1 cache, and the associated enhanced L1 cache directory entry and the enhanced L2 cache directory entry are updated to reflect the L1 cache's ownership of the data line. Consequently, updates of the data line by the L1 cache do not go through the shared L2 cache, reducing transaction pressure on the shared L2 cache.

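The grant/deny decision driven by the directory's sharing information can be sketched in C. The mask encoding, field names, and the exact grant condition (deny whenever any other L1 shares the line) are our assumptions for illustration; the patent leaves the precise policy to its embodiments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the ownership decision: the enhanced L2 directory entry
 * records which L1 caches share a line; ownership is granted to a
 * requesting L1 only when no OTHER L1 holds the line, so subsequent
 * updates by the owner can skip the shared L2 entirely. */
typedef struct {
    uint16_t sharer_mask;  /* bit i set => L1 cache i holds the line */
    int      owner;        /* owning L1 index, or -1 if L2-owned     */
} l2_dir_entry_t;

bool request_ownership(l2_dir_entry_t *e, int requester) {
    uint16_t others = e->sharer_mask & (uint16_t)~(1u << requester);
    if (others != 0)
        return false;              /* line is shared: deny the request */
    e->sharer_mask |= (uint16_t)(1u << requester);
    e->owner = requester;          /* grant: updates now bypass the L2 */
    return true;
}
```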

Efficient on-chip instruction and data caching for chip multiprocessors
    9.
    Invention Grant (In Force)

    Publication No.: US07543112B1

    Publication Date: 2009-06-02

    Application No.: US11472141

    Filing Date: 2006-06-20

    IPC Class: G06F13/00 G06F13/28

    CPC Class: G06F12/0897 G06F12/084

    Abstract: The storage of a data line in one or more L1 caches and/or a shared L2 cache of a chip multiprocessor is dynamically optimized based on the sharing of the data line. In one embodiment, an enhanced L2 cache directory entry associated with the data line is generated in an L2 cache directory of the shared L2 cache. The enhanced L2 cache directory entry includes a cache mask indicating the storage state of the data line in the one or more L1 caches and the shared L2 cache. In some embodiments, where the data line is stored in the shared L2 cache only, a portion of the cache mask indicates a storage history of the data line in the one or more L1 caches.

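One way to picture the cache mask is as a bitfield with a present bit per L1, a bit for the shared L2, and, for L2-only lines, a history portion remembering which L1s previously held the line. The bit layout below is entirely our invention for illustration; the patent does not specify this encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative cache-mask encoding (NOT the patent's):
 * bits 0..3  : line currently stored in L1 cache i
 * bit  4     : line stored in the shared L2
 * bits 5..8  : history -- line was previously stored in L1 cache i */
#define NUM_L1     4
#define L2_BIT     (1u << NUM_L1)
#define HIST_SHIFT (NUM_L1 + 1)

typedef uint16_t cache_mask_t;

cache_mask_t set_l1(cache_mask_t m, int i) { return m | (cache_mask_t)(1u << i); }
cache_mask_t set_l2(cache_mask_t m)        { return m | (cache_mask_t)L2_BIT; }

bool in_l1(cache_mask_t m, int i) { return (m >> i) & 1u; }

/* L2-only: stored in the shared L2 with no current L1 copies. */
bool l2_only(cache_mask_t m) {
    return (m & L2_BIT) && !(m & ((1u << NUM_L1) - 1));
}

/* Eviction from an L1: the present bit clears, but the history
 * portion remembers the residence, which is what lets an L2-only
 * entry still describe where the line used to live. */
cache_mask_t evict_l1(cache_mask_t m, int i) {
    m &= (cache_mask_t)~(1u << i);
    return m | (cache_mask_t)(1u << (HIST_SHIFT + i));
}

bool was_in_l1(cache_mask_t m, int i) { return (m >> (HIST_SHIFT + i)) & 1u; }
```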

Hardware-based technique for improving the effectiveness of prefetching during scout mode
    10.
    Invention Grant (In Force)

    Publication No.: US07529911B1

    Publication Date: 2009-05-05

    Application No.: US11139866

    Filing Date: 2005-05-26

    IPC Class: G06F9/30 G06F9/40 G06F15/00

    Abstract: One embodiment of the present invention provides a system that improves the effectiveness of prefetching during execution of instructions in scout mode. Upon encountering a non-data-dependent stall condition, the system performs a checkpoint and commences execution of instructions in scout mode, wherein instructions are speculatively executed to prefetch future memory operations, but results are not committed to the architectural state of a processor. When the system executes a load instruction during scout mode, if the load instruction causes a lower-level cache miss, the system allows the load instruction to access a higher-level cache. Next, the system places the load instruction and subsequent dependent instructions into a deferred queue, and resumes execution of the program in scout mode. If the load instruction ultimately causes a hit in the higher-level cache, the system replays the load instruction and subsequent dependent instructions in the deferred queue, whereby the value retrieved from the higher-level cache can help in generating prefetches during scout mode.

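The deferred-queue mechanism, which distinguishes this hardware variant from the software technique in entry 7, can be sketched in C. Queue depth, names, and the drop-on-miss behavior are our simplifying assumptions; the sketch tracks only how many deferred instructions get replayed, not their actual re-execution.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the hardware-based variant: a scout-mode load that misses
 * the lower-level cache is sent on to the higher-level cache, and the
 * load plus its dependent instructions are parked in a deferred queue
 * while scouting continues. */
#define DQ_DEPTH 8

typedef struct {
    uint64_t pc[DQ_DEPTH];  /* deferred instruction addresses */
    int      count;
} deferred_queue_t;

void defer(deferred_queue_t *q, uint64_t pc) {
    if (q->count < DQ_DEPTH)
        q->pc[q->count++] = pc;
}

/* On a higher-level cache hit, everything deferred behind the load is
 * replayed so the returned value can feed later prefetch addresses;
 * on a miss, the deferred work is simply dropped. Returns the number
 * of instructions replayed. */
int on_l2_response(deferred_queue_t *q, bool l2_hit) {
    int replayed = l2_hit ? q->count : 0;
    q->count = 0;           /* queue drains either way */
    return replayed;
}
```

The contrast with entry 7 is that here scout execution continues past the load instead of stalling, at the cost of the queue hardware.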