MICROPROCESSOR HAVING AT LEAST ONE APPLICATION SPECIFIC FUNCTIONAL UNIT AND METHOD TO DESIGN SAME
    1.
    Invention application
    Status: under examination (published)

    Publication No.: US20110055521A1

    Publication date: 2011-03-03

    Application No.: US12311177

    Filing date: 2007-09-24

    IPC classification: G06F9/30

    Abstract: Customisable embedded processors available on the market allow designers to speed up the execution of applications by using Application-specific Functional Units (AFUs) that implement Instruction-Set Extensions (ISEs). Furthermore, techniques for automatic ISE identification have been improving; many algorithms have been proposed for choosing, given the application's source code, the best ISEs under various constraints. Read and write ports between the AFUs and the processor register file are an expensive asset, fixed in the micro-architecture (some processors allow only two read ports and one write port), and yet a large number of inputs and outputs to and from the AFUs exposes high speedup. Here we present a solution to the limited number of register file ports: serialising register file access and thereby supporting multi-cycle reads and writes. It is innovative for two reasons: (1) it exploits and brings forward the progress in ISE identification under constraint, and (2) it combines register file access serialisation with pipelining in order to obtain the best global solution. Our method consists of scheduling graphs (corresponding to ISEs) under input/output constraint
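
The effect of serialising register file access can be illustrated with a small cycle-count model. This is a minimal sketch under stated assumptions, not the method claimed in the application: every operation is assumed to take one cycle, operands are read in the order given, and pipelining is ignored. The function name `schedule_ise` and the dictionary representation of the ISE graph are hypothetical.

```python
def schedule_ise(ops, inputs, outputs, read_ports=2, write_ports=1):
    """Return the cycle in which the last result is written back.

    ops: dict mapping each operation name to the names of its operands,
         listed in topological order (hypothetical ISE graph encoding).
    inputs: register operands, in the order they are read.
    outputs: names of the operations whose results go back to the register file.
    """
    ready = {}  # name -> cycle at the end of which the value is available

    # Serialised reads: at most `read_ports` operands arrive per cycle.
    for i, name in enumerate(inputs):
        ready[name] = i // read_ports + 1

    # Each operation starts once all its operands are available
    # and takes one cycle (assumption).
    for op, operands in ops.items():
        ready[op] = max(ready[x] for x in operands) + 1

    # Serialised writes: at most `write_ports` results leave per cycle,
    # earliest-ready results first.
    cycle, used = 0, 0
    for out in sorted(outputs, key=lambda o: ready[o]):
        earliest = ready[out] + 1
        if earliest > cycle:
            cycle, used = earliest, 0      # wait until the value exists
        elif used == write_ports:
            cycle, used = cycle + 1, 0     # all ports busy, spill to next cycle
        used += 1
    return cycle


# A four-input, two-output graph: t1 = a op b, t2 = c op d, t3 = t1 op t2.
ops = {"t1": ["a", "b"], "t2": ["c", "d"], "t3": ["t1", "t2"]}
print(schedule_ise(ops, ["a", "b", "c", "d"], ["t2", "t3"]))
print(schedule_ise(ops, ["a", "b", "c", "d"], ["t2", "t3"],
                   read_ports=4, write_ports=2))
```

With two read ports and one write port the model finishes in cycle 5; with four read ports and two write ports it finishes in cycle 4, so in this toy case serialisation costs a single cycle while keeping the register file unchanged.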

    Automated instruction-set extension
    2.
    Granted invention patent
    Status: lapsed

    Publication No.: US07685587B2

    Publication date: 2010-03-23

    Application No.: US10716907

    Filing date: 2003-11-19

    IPC classification: G06F9/44

    CPC classification: G06F8/433

    Abstract: Commercial data processors are available that include a capability of extending their instruction set for a specified application, i.e. of introducing customized functional units in the interest of enhanced processing performance. For such processors there is a need for automatically forming the extensions from high-level application code. A technique is described for selecting maximal-speedup convex subgraphs of the application dataflow graph under micro-architectural constraints.
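
The convexity requirement mentioned in the abstract can be made concrete with a short check. A subgraph of a dataflow graph is convex when no path between two of its nodes passes through a node outside it, so the whole subgraph can be issued as one instruction. The following is a minimal sketch of that test only, not of the patented selection technique; the function names and the successor-list encoding of the graph are assumptions.

```python
def reachable(adj, start):
    """Return every node reachable from `start` by following `adj` edges."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_convex(succ, subset):
    """Check whether `subset` is a convex subgraph of the DAG `succ`.

    succ: dict mapping each node to a list of its successors.
    """
    subset = set(subset)

    # Build predecessor lists so the graph can also be walked backwards.
    pred = {}
    for node, outs in succ.items():
        for out in outs:
            pred.setdefault(out, []).append(node)

    below = reachable(succ, subset)   # descendants of the subset
    above = reachable(pred, subset)   # ancestors of the subset

    # An outside node that is both a descendant and an ancestor of the
    # subset lies on a path that leaves the subset and re-enters it.
    return not ((below & above) - subset)


# Dataflow graph a -> b -> c with a shortcut a -> c.
dfg = {"a": ["b", "c"], "b": ["c"], "c": []}
print(is_convex(dfg, {"a", "b"}))   # True
print(is_convex(dfg, {"a", "c"}))   # False: the path a -> b -> c leaves the set
```

A selection algorithm would combine a test like this with the micro-architectural constraints (for example the number of register inputs and outputs) and a speedup estimate for each candidate subgraph.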

    Virtual memory window with dynamic prefetching support
    3.
    Invention application
    Status: in force

    Publication No.: US20100005272A1

    Publication date: 2010-01-07

    Application No.: US11578830

    Filing date: 2005-04-19

    IPC classification: G06F13/28 G06F12/02

    CPC classification: G06F12/1081 G06F9/3877

    Abstract: Reconfigurable Systems-on-Chip (RSoCs) on the market consist of full-fledged processors and large Field-Programmable Gate Arrays (FPGAs). The latter can be used to implement the system glue logic, various peripherals, and application-specific coprocessors. Using FPGAs for application-specific coprocessors offers speedup potential, but it is less common in practice because of the complexity of interfacing the software application with the coprocessor. In the present application, we present a virtualisation layer consisting of an operating system extension and a hardware component. It lowers the complexity of interfacing and increases portability, while it also allows the coprocessor to access the user virtual memory through a virtual memory window. The burden of moving data between processor and coprocessor is shifted from the programmer to the operating system.
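
The idea of a coprocessor reaching user memory through a window can be illustrated with a toy software model. This is a sketch under loud assumptions, not the mechanism described in the application: the class `MemoryWindow`, the 4096-byte page size, the two-slot window, and the least-recently-used replacement policy are all hypothetical, and prefetching is not modelled. A `bytearray` stands in for user virtual memory, and the `_load` method plays the role of the operating system extension that moves pages in and out of the window.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size


class MemoryWindow:
    """Toy model of a small window onto a larger user memory."""

    def __init__(self, user_memory, num_slots=2):
        self.user_memory = user_memory   # bytearray standing in for user memory
        self.num_slots = num_slots       # pages the window can hold at once
        self.slots = OrderedDict()       # page number -> page copy, LRU order
        self.faults = 0                  # times the "OS" had to bring in a page

    def _load(self, page):
        """Make sure `page` is in the window, evicting the LRU page if full."""
        if page in self.slots:
            self.slots.move_to_end(page)  # mark as most recently used
            return
        self.faults += 1
        if len(self.slots) == self.num_slots:
            old_page, data = self.slots.popitem(last=False)
            start = old_page * PAGE_SIZE
            self.user_memory[start:start + len(data)] = data  # write back
        start = page * PAGE_SIZE
        self.slots[page] = bytearray(self.user_memory[start:start + PAGE_SIZE])

    def read(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        self._load(page)
        return self.slots[page][offset]

    def write(self, vaddr, value):
        page, offset = divmod(vaddr, PAGE_SIZE)
        self._load(page)
        self.slots[page][offset] = value


memory = bytearray(3 * PAGE_SIZE)
memory[5] = 7
window = MemoryWindow(memory)
print(window.read(5))             # value fetched through the window
window.write(PAGE_SIZE + 1, 9)    # touches a second page
window.read(2 * PAGE_SIZE)        # third page: evicts the least recently used
print(window.faults)
```

The code using the window only issues `read` and `write` calls on virtual addresses; all data movement happens inside `_load`, which mirrors the abstract's point that the burden shifts from the programmer to the operating system.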

    Automatic identification of application-specific functional units with architecturally visible storage
    4.
    Granted invention patent
    Status: in force

    Publication No.: US08166467B2

    Publication date: 2012-04-24

    Application No.: US11651988

    Filing date: 2007-01-11

    IPC classification: G06F9/45

    CPC classification: G06F8/4442

    Abstract: Instruction Set Extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities, but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding Application-specific Functional Units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques, and achieve an average speedup of 2.8× over pure software execution. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.

    Automatic identification of application-specific functional units with architecturally visible storage
    5.
    Invention application
    Status: in force

    Publication No.: US20070162900A1

    Publication date: 2007-07-12

    Application No.: US11651988

    Filing date: 2007-01-11

    IPC classification: G06F9/45

    CPC classification: G06F8/4442

    Abstract: Instruction Set Extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities, but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding Application-specific Functional Units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques, and achieve an average speedup of 2.8× over pure software execution. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption.

    Virtual memory window with dynamic prefetching support
    6.
    Granted invention patent
    Status: in force

    Publication No.: US08185696B2

    Publication date: 2012-05-22

    Application No.: US11578830

    Filing date: 2005-04-19

    IPC classification: G06F12/08

    CPC classification: G06F12/1081 G06F9/3877

    Abstract: Reconfigurable Systems-on-Chip (RSoCs) on the market consist of full-fledged processors and large Field-Programmable Gate Arrays (FPGAs). The latter can be used to implement the system glue logic, various peripherals, and application-specific coprocessors. Using FPGAs for application-specific coprocessors offers speedup potential, but it is less common in practice because of the complexity of interfacing the software application with the coprocessor. In the present application, we present a virtualisation layer consisting of an operating system extension and a hardware component. It lowers the complexity of interfacing and increases portability, while it also allows the coprocessor to access the user virtual memory through a virtual memory window. The burden of moving data between processor and coprocessor is shifted from the programmer to the operating system.

    Automated instruction-set extension
    7.
    Invention application
    Status: lapsed

    Publication No.: US20070162902A1

    Publication date: 2007-07-12

    Application No.: US10716907

    Filing date: 2003-11-19

    IPC classification: G06F9/45

    CPC classification: G06F8/433

    Abstract: Commercial data processors are available that include a capability of extending their instruction set for a specified application, i.e. of introducing customized functional units in the interest of enhanced processing performance. For such processors there is a need for automatically forming the extensions from high-level application code. A technique is described for selecting maximal-speedup convex subgraphs of the application dataflow graph under micro-architectural constraints.
