Method and apparatus for compressing VLIW instruction and sharing subinstructions
    1.
    Granted invention patent (legal status: in force)

    Publication No.: US07409530B2

    Publication date: 2008-08-05

    Application No.: US11015717

    Filing date: 2004-12-17

    IPC classification: G06F9/00

    Abstract: A VLIW instruction format is introduced that has a set of control bits identifying subinstruction sharing conditions. At compilation, the VLIW instruction is analyzed to identify subinstruction sharing opportunities, and these opportunities are encoded in the control bits of the instruction. Before the instruction is moved into the instruction cache, it is compressed into the new format to delete selected redundant occurrences of a subinstruction. Specifically, where a subinstruction is to be shared by corresponding functional processing units of respective clusters, the subinstruction need appear in the instruction only once; the redundant occurrence is deleted. The control bits are decoded at instruction parsing time to route a shared subinstruction to the associated functional processing units.
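    The compile-time compression step can be sketched as follows. This is a minimal illustration, not the patented encoding: it assumes a hypothetical two-cluster machine with the same number of functional-unit slots per cluster, subinstructions represented as strings, and one control bit per slot meaning "cluster 1 shares cluster 0's subinstruction in this slot". The names `CompressedVLIW` and `compress` are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompressedVLIW:
    share_bits: int       # bit i set: cluster 1, slot i shares cluster 0, slot i
    cluster0: List[str]   # every cluster-0 subinstruction is kept
    cluster1: List[str]   # only the cluster-1 subinstructions that are not shared


def compress(cluster0: List[str], cluster1: List[str]) -> CompressedVLIW:
    """Compile-time pass: find sharing opportunities and drop redundant copies."""
    if len(cluster0) != len(cluster1):
        raise ValueError("clusters must have the same number of slots")
    share_bits = 0
    kept: List[str] = []
    for slot, (sub0, sub1) in enumerate(zip(cluster0, cluster1)):
        if sub0 == sub1:
            share_bits |= 1 << slot   # encode the opportunity in the control bits
        else:
            kept.append(sub1)         # a distinct subinstruction must stay
    return CompressedVLIW(share_bits, list(cluster0), kept)


# Slots 0 and 2 match, so only the cluster-1 "sub" survives.
packed = compress(["add r1,r2,r3", "mul r4,r5,r6", "ld r7,0(r8)"],
                  ["add r1,r2,r3", "sub r4,r5,r6", "ld r7,0(r8)"])
print(bin(packed.share_bits), packed.cluster1)  # 0b101 ['sub r4,r5,r6']
```

    Because each cluster has its own registers, an identical subinstruction such as `add r1,r2,r3` still operates on different data in each cluster, which is why a single copy can serve both units.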


    Method and apparatus for compressing VLIW instruction and sharing subinstructions
    2.
    Granted invention patent (legal status: in force)

    Publication No.: US06859870B1

    Publication date: 2005-02-22

    Application No.: US09519695

    Filing date: 2000-03-07

    Abstract: A VLIW instruction format is introduced that has a set of control bits identifying subinstruction sharing conditions. At compilation, the VLIW instruction is analyzed to identify subinstruction sharing opportunities, and these opportunities are encoded in the control bits of the instruction. Before the instruction is moved into the instruction cache, it is compressed into the new format to delete selected redundant occurrences of a subinstruction. Specifically, where a subinstruction is to be shared by corresponding functional processing units of respective clusters, the subinstruction need appear in the instruction only once; the redundant occurrence is deleted. The control bits are decoded at instruction parsing time to route a shared subinstruction to the associated functional processing units.
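    The parse-time side, where control bits route a shared subinstruction to both functional units, might look like the sketch below. It assumes the same hypothetical format as a simple illustration (two clusters, one share bit per slot, unshared cluster-1 subinstructions stored in slot order); `route_subinstructions` is an invented name, and real hardware would do this with multiplexers rather than a loop.

```python
from typing import List


def route_subinstructions(share_bits: int, cluster0: List[str],
                          cluster1_kept: List[str]) -> List[List[str]]:
    """Parse-time expansion of a compressed two-cluster VLIW instruction.

    Bit i of share_bits set: slot i of cluster 1 receives the same
    subinstruction as slot i of cluster 0. Bit clear: the next unshared
    subinstruction is taken, in order, from cluster1_kept.
    """
    remaining = iter(cluster1_kept)
    cluster1: List[str] = []
    for slot, sub in enumerate(cluster0):
        if (share_bits >> slot) & 1:
            cluster1.append(sub)  # route the shared subinstruction to both units
        else:
            nxt = next(remaining, None)
            if nxt is None:
                raise ValueError("control bits require more unshared subinstructions")
            cluster1.append(nxt)
    if next(remaining, None) is not None:
        raise ValueError("unused subinstructions left after decoding")
    return [list(cluster0), cluster1]


slots = route_subinstructions(0b101, ["add r1,r2,r3", "mul r4,r5,r6", "ld r7,0(r8)"],
                              ["sub r4,r5,r6"])
print(slots[1])  # ['add r1,r2,r3', 'sub r4,r5,r6', 'ld r7,0(r8)']
```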


    Operand queues for streaming data: A processor register file extension
    3.
    Granted invention patent (legal status: in force)

    Publication No.: US06782470B1

    Publication date: 2004-08-24

    Application No.: US09706899

    Filing date: 2000-11-06

    IPC classification: G06F9/34

    Abstract: The register file of a processor includes embedded operand queues. The partitioning of the register file into registers and operand queues is defined dynamically by a computer program. The programmer determines the trade-off between the number and size of the operand queues and the number of registers used by the program, and partitions a portion of the registers into one or more operand queues. A given queue occupies a consecutive set of registers, although multiple queues need not occupy consecutive registers. An additional address bit distinguishes operand queue addresses from register addresses. Queue state logic tracks status information for each queue, including a head pointer, a tail pointer, a start address, an end address and a number-of-vacancies value. The program sets the location and depth of a given operand queue within the register file.
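    A behavioural sketch of the queue state logic, assuming a hypothetical 32-entry register file where bit 5 of an operand address is the extra "this is a queue" bit and the low bits then name a queue. Writes go to the register selected by the head pointer and reads come from the tail pointer, mirroring the convention stated in the transposable-register-file abstract of entry 6; the abstract here does not say which pointer is which. `RegisterFile`, `QueueState` and `QUEUE_BIT` are invented names.

```python
from dataclasses import dataclass
from typing import Dict, List

QUEUE_BIT = 1 << 5  # hypothetical extra address bit above a 5-bit register number


@dataclass
class QueueState:
    start: int      # first register of the queue (inclusive)
    end: int        # last register of the queue (inclusive)
    head: int       # register that receives the next write
    tail: int       # register that supplies the next read
    vacancies: int  # free entries remaining


class RegisterFile:
    def __init__(self, num_regs: int = 32) -> None:
        self.regs: List[int] = [0] * num_regs
        self.queues: Dict[int, QueueState] = {}

    def define_queue(self, qid: int, start: int, end: int) -> None:
        """Program-defined partition: registers start..end become queue qid."""
        if not 0 <= start <= end < len(self.regs):
            raise ValueError("queue must lie inside the register file")
        for q in self.queues.values():
            if start <= q.end and q.start <= end:
                raise ValueError("queues may not overlap")
        self.queues[qid] = QueueState(start, end, start, start, end - start + 1)

    @staticmethod
    def _next(q: QueueState, ptr: int) -> int:
        return q.start if ptr == q.end else ptr + 1  # wrap within the queue's registers

    def write(self, addr: int, value: int) -> None:
        if addr & QUEUE_BIT:                  # operand-queue address
            q = self.queues[addr & ~QUEUE_BIT]
            if q.vacancies == 0:
                raise OverflowError("operand queue full")
            self.regs[q.head] = value
            q.head = self._next(q, q.head)
            q.vacancies -= 1
        else:                                 # ordinary register address
            self.regs[addr] = value

    def read(self, addr: int) -> int:
        if addr & QUEUE_BIT:
            q = self.queues[addr & ~QUEUE_BIT]
            if q.vacancies == q.end - q.start + 1:
                raise IndexError("operand queue empty")
            value = self.regs[q.tail]
            q.tail = self._next(q, q.tail)
            q.vacancies += 1
            return value
        return self.regs[addr]


rf = RegisterFile()
rf.define_queue(0, 8, 11)          # registers r8..r11 form a 4-deep queue
for v in (10, 20, 30):
    rf.write(QUEUE_BIT | 0, v)
print(rf.read(QUEUE_BIT | 0), rf.queues[0].vacancies)  # 10 2
```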


    Multimedia instruction set for wide data paths
    4.
    Granted invention patent (legal status: in force)

    Publication No.: US06675286B1

    Publication date: 2004-01-06

    Application No.: US09561406

    Filing date: 2000-04-27

    IPC classification: G06F9/302

    Abstract: Partitioned sigma instructions are provided in which processor capacity is effectively distributed among multiple sigma operations executed concurrently. Special registers are included for aligning data on memory word boundaries, reducing the packing overhead of supplying long data words to multimedia instructions that shift data sequences over multiple iterations. Extended partitioned arithmetic instructions are provided to improve precision and avoid accumulated carry-over errors. Partitioned formatting instructions, including partitioned interleave, partitioned compress, and partitioned interleave-and-compress, pack subwords in an order that is effective for subsequent partitioned operations.
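    To make "partitioned formatting" concrete, here is a sketch of one plausible partitioned interleave on a 64-bit word of 8-bit subwords. The abstract does not give the operand ordering, so this assumes a low-half interleave (a0, b0, a1, b1, ...), the same pattern as unpack-low instructions in common SIMD instruction sets; the function names are hypothetical.

```python
from typing import List


def split_subwords(word: int, width: int, count: int) -> List[int]:
    """Unpack `count` subwords of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


def pack_subwords(subwords: List[int], width: int) -> int:
    """Inverse of split_subwords."""
    mask = (1 << width) - 1
    word = 0
    for i, sub in enumerate(subwords):
        word |= (sub & mask) << (i * width)
    return word


def partitioned_interleave_low(a: int, b: int, width: int = 8, word_bits: int = 64) -> int:
    """Interleave the low-half subwords of a and b: a0, b0, a1, b1, ..."""
    count = word_bits // width
    sa, sb = split_subwords(a, width, count), split_subwords(b, width, count)
    out: List[int] = []
    for i in range(count // 2):
        out += [sa[i], sb[i]]
    return pack_subwords(out, width)


print(hex(partitioned_interleave_low(0x0706050403020100, 0x1716151413121110)))
# 0x1303120211011000
```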


    Processor with register file accessible by row column to achieve data array transposition
    6.
    Granted invention patent (legal status: in force)

    Publication No.: US06804771B1

    Publication date: 2004-10-12

    Application No.: US09626263

    Filing date: 2000-07-25

    IPC classification: G06F12/00

    Abstract: A processor includes a transposable register file. The register file allows normal row-wise access to data and also allows transposed column-wise access to data stored in a column spanning the registers of the register file. In transposed access mode, a data operand is accessed in a given partition of each of n registers: one register stores the operand's first partition, an adjacent register stores its second partition, and so forth for each of the operand's n partitions. A queue-based transposable register file is also implemented. It includes a head pointer and a tail pointer and has a virtual register. Data written into the virtual register is written into the register selected by the head pointer, and data read from the virtual register is read from the register selected by the tail pointer.
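    The row-versus-column access can be modelled as below, assuming hypothetical 32-bit registers split into four 8-bit partitions (n = 4). A transposed access to column `col` starting at register `base` touches partition `col` of registers `base` through `base + n - 1`, with the operand's first partition in `base`. Writing a 4x4 byte matrix row by row and reading it back column by column yields its transpose. The queue-based variant is not modelled; `TransposableRegisterFile` is an invented name.

```python
class TransposableRegisterFile:
    """Registers of `parts` partitions, each `width` bits wide."""

    def __init__(self, num_regs: int = 16, parts: int = 4, width: int = 8) -> None:
        self.parts, self.width = parts, width
        self.mask = (1 << width) - 1
        self.regs = [0] * num_regs

    def write_row(self, r: int, value: int) -> None:
        self.regs[r] = value & ((1 << (self.parts * self.width)) - 1)  # normal access

    def read_row(self, r: int) -> int:
        return self.regs[r]

    def write_column(self, base: int, col: int, value: int) -> None:
        """Transposed write: partition k of value goes to partition `col` of register base+k."""
        shift = col * self.width
        for k in range(self.parts):
            sub = (value >> (k * self.width)) & self.mask
            self.regs[base + k] = (self.regs[base + k] & ~(self.mask << shift)) | (sub << shift)

    def read_column(self, base: int, col: int) -> int:
        """Transposed read: gather partition `col` of registers base..base+parts-1."""
        shift = col * self.width
        value = 0
        for k in range(self.parts):
            value |= ((self.regs[base + k] >> shift) & self.mask) << (k * self.width)
        return value


rf = TransposableRegisterFile()
for r, row in enumerate([0x03020100, 0x13121110, 0x23222120, 0x33323130]):
    rf.write_row(r, row)          # a 4x4 byte matrix, one row per register
print(hex(rf.read_column(0, 1)))  # 0x31211101: column 1 = row 1 of the transpose
```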


    Method and apparatus for processing compressed VLIW subinstruction opcodes
    7.
    Granted invention patent (legal status: in force)

    Publication No.: US06779101B1

    Publication date: 2004-08-17

    Application No.: US09520754

    Filing date: 2000-03-07

    IPC classification: G06F15/00

    Abstract: An area of on-chip memory is allocated to store one or more tables of commonly used opcodes. The normal opcode in the instruction is replaced with a shorter code giving an index into the table, so the instruction is compressed. For a VLIW architecture, in which an instruction includes multiple subinstructions (multiple opcodes), the instruction loading bandwidth is substantially reduced. Preferably, the opcode table is dynamically loaded: different tasks are programmed with respective tables of opcodes to be stored in the opcode table, and the respective table is loaded on a task switch. A smaller, dynamic opcode table provides effective opcode selection with low table-loading overhead.
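    A small model of the opcode table, assuming opcodes are represented as mnemonic strings and the table holds a hypothetical 16 entries, so each index needs 4 bits. `load` stands for the per-task reload on a task switch. How the patent handles an opcode absent from the table is not stated in the abstract; this sketch simply raises `KeyError`. `OpcodeTable` is an invented name.

```python
from typing import Dict, List, Sequence


class OpcodeTable:
    """On-chip table of commonly used opcodes, reloaded on each task switch."""

    def __init__(self, capacity: int = 16) -> None:
        self.capacity = capacity
        self.entries: List[str] = []
        self.lookup: Dict[str, int] = {}

    @property
    def index_bits(self) -> int:
        return max(1, (self.capacity - 1).bit_length())  # bits needed per table index

    def load(self, task_opcodes: Sequence[str]) -> None:
        """Task switch: replace the table contents with this task's opcodes."""
        if len(task_opcodes) > self.capacity:
            raise ValueError("task's opcode table exceeds on-chip capacity")
        self.entries = list(task_opcodes)
        self.lookup = {op: i for i, op in enumerate(self.entries)}

    def compress(self, opcodes: Sequence[str]) -> List[int]:
        # Raises KeyError for an opcode missing from the table; a real encoding
        # would need some fallback for such opcodes (not modelled here).
        return [self.lookup[op] for op in opcodes]

    def expand(self, indices: Sequence[int]) -> List[str]:
        return [self.entries[i] for i in indices]


table = OpcodeTable(capacity=16)
table.load(["add", "mul", "ld", "st"])
idx = table.compress(["ld", "add", "mul"])       # one VLIW word's opcodes
print(idx, table.index_bits, table.expand(idx))  # [2, 0, 1] 4 ['ld', 'add', 'mul']
```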


    Template data transfer coprocessor
    8.
    Granted invention patent (legal status: lapsed)

    Publication No.: US06785743B1

    Publication date: 2004-08-31

    Application No.: US09533047

    Filing date: 2000-03-22

    IPC classification: G06F9/34

    Abstract: The template data transfer coprocessor (TDTP) offloads block data transfer operations from a mediaprocessor. A uni-block template, a program-guided template, an indirect template and a queue-based template are described. The TDTP includes a template interpreter that employs an event-driven control mechanism to set up a template and compute block information for each template. Using these templates substantially reduces the programming involved in defining block data transfers for video and image processing algorithms.
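    As an illustration of the kind of block information a uni-block template lets a coprocessor compute on its own, the sketch below expands one two-dimensional block into per-row bursts and tiles an image into blocks in raster order. It assumes 1 byte per pixel and a template made of a base address, row pitch, block width and block height; these fields and the names `UniBlockTemplate`, `row_transfers` and `tile_image` are hypothetical, not the TDTP's actual template format.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class UniBlockTemplate:
    base: int          # byte address of the block's top-left pixel
    pitch: int         # bytes between vertically adjacent pixels (image row pitch)
    block_width: int   # bytes per block row
    block_height: int  # rows in the block


def row_transfers(t: UniBlockTemplate) -> Iterator[Tuple[int, int]]:
    """Expand one template into (start_address, byte_count) bursts, one per row."""
    for row in range(t.block_height):
        yield t.base + row * t.pitch, t.block_width


def tile_image(base: int, pitch: int, width: int, height: int,
               bw: int, bh: int) -> Iterator[UniBlockTemplate]:
    """Cover a width x height image (1 byte per pixel) with bw x bh blocks in raster order."""
    for y in range(0, height, bh):
        for x in range(0, width, bw):
            yield UniBlockTemplate(base + y * pitch + x, pitch,
                                   min(bw, width - x), min(bh, height - y))


print(list(row_transfers(UniBlockTemplate(1000, 640, 8, 3))))
# [(1000, 8), (1640, 8), (2280, 8)]
```

    Edge blocks are clipped with `min`, so an image whose size is not a multiple of the block size still gets covered exactly once.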


    Program-directed cache prefetching for media processors
    9.
    Granted invention patent (legal status: in force)

    Publication No.: US07234040B2

    Publication date: 2007-06-19

    Application No.: US10895232

    Filing date: 2004-07-20

    IPC classification: G06F9/26

    Abstract: Data are prefetched into a cache from a prefetch region of memory, based on a program instruction reference and on compile-time information that indicates the bounds of the prefetch region, the size of a prefetch block, and the location of the prefetch block. If the program reference address lies within the prefetch region, an offset distance is used to determine the address of the prefetch block. Prefetching is performed either from a contiguous one-dimensional prefetch region or from an embedded multi-dimensional prefetch region; the prefetch block address is determined in one dimension or in multiple dimensions, respectively. Program-directed prefetching is implemented by a media processor or by a separate processing component in communication with the media processor. The primary components include a program-directed prefetch controller, a cache, a function unit, and a memory. Preferably, region registers store the compile-time information, and the prefetched data are stored in a cache prefetch buffer.
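    The prefetch-address computation for both region shapes can be sketched as follows. The field layout of the region registers is an assumption made for the example: a 1-D region is `[start, end)` with a block size and a byte offset distance; a 2-D region is a `rows x cols` window embedded in a larger array with row pitch `pitch`, with separate offsets and block sizes per dimension. Returning `None` when the reference or target falls outside the region is also an assumption. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region1D:
    start: int       # first byte of the prefetch region
    end: int         # one past the last byte
    block_size: int  # bytes per prefetch block
    offset: int      # prefetch distance ahead of the reference, in bytes


def prefetch_block_1d(r: Region1D, addr: int) -> Optional[int]:
    """Address of the block to prefetch, or None if nothing should be prefetched."""
    if not r.start <= addr < r.end:
        return None                      # reference lies outside the region
    target = addr + r.offset
    if target >= r.end:
        return None                      # would run past the region
    return target - (target - r.start) % r.block_size  # align to a block boundary


@dataclass
class Region2D:
    base: int        # address of the region's top-left byte
    pitch: int       # bytes per row of the enclosing array
    rows: int        # region height in rows
    cols: int        # region width in bytes (cols <= pitch)
    block_rows: int
    block_cols: int
    row_offset: int  # prefetch distance in rows
    col_offset: int  # prefetch distance in bytes


def prefetch_block_2d(r: Region2D, addr: int) -> Optional[int]:
    rel = addr - r.base
    if rel < 0:
        return None
    row, col = divmod(rel, r.pitch)      # position within the enclosing array
    if row >= r.rows or col >= r.cols:
        return None                      # outside the embedded region
    trow, tcol = row + r.row_offset, col + r.col_offset
    if not (0 <= trow < r.rows and 0 <= tcol < r.cols):
        return None
    trow -= trow % r.block_rows          # align in each dimension separately
    tcol -= tcol % r.block_cols
    return r.base + trow * r.pitch + tcol


print(hex(prefetch_block_1d(Region1D(0x1000, 0x2000, 64, 256), 0x1010)))  # 0x1100
```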


    Multi-ported memory having pipelined data banks
    10.
    Granted invention patent (legal status: in force)

    Publication No.: US06732247B2

    Publication date: 2004-05-04

    Application No.: US09764250

    Filing date: 2001-01-17

    IPC classification: G06F13/00

    Abstract: Multi-ported pipelined memory is located on a processor die, serving as an addressable on-chip memory for efficiently processing streaming data. The memory sustains multiple wide memory accesses per cycle, is clocked synchronously with the rest of the processor, and stores a significant portion of an image. It bypasses the register file, providing data directly to the processor's functional units. The memory includes multiple memory banks, which permit multiple memory accesses per cycle. The banks are connected in pipelined fashion to pipeline registers placed at regular intervals on a global bus. The memory sustains multiple transactions per cycle at a greater memory density than a multi-ported static memory such as a register file.
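    A toy cycle model shows why multiple banks allow several accesses per cycle and how pipeline registers along the bus add latency. Everything quantitative here is an assumption for illustration: word-interleaved bank mapping, 8 banks, 4 ports, strictly in-order issue with at most one access per bank per cycle, and one extra cycle of latency for each pipeline register between the port and the bank (one register every 2 banks). `bank_of` and `schedule` are invented names.

```python
from typing import List, Tuple


def bank_of(addr: int, num_banks: int, word_bytes: int = 8) -> int:
    """Low-order interleaving: consecutive words map to consecutive banks."""
    return (addr // word_bytes) % num_banks


def schedule(addrs: List[int], num_banks: int = 8, ports: int = 4,
             banks_per_stage: int = 2, word_bytes: int = 8) -> List[Tuple[int, int, int]]:
    """Return (address, issue_cycle, ready_cycle) for requests issued in order.

    Each cycle issues up to `ports` requests, at most one per bank; a request
    that would exceed the ports or hit a busy bank starts the next cycle.
    Latency grows by one cycle per pipeline register between port and bank.
    """
    result: List[Tuple[int, int, int]] = []
    cycle, issued = 0, 0
    busy: set = set()
    for addr in addrs:
        bank = bank_of(addr, num_banks, word_bytes)
        if issued == ports or bank in busy:
            cycle, issued, busy = cycle + 1, 0, set()
        busy.add(bank)
        issued += 1
        latency = 1 + bank // banks_per_stage
        result.append((addr, cycle, cycle + latency))
    return result


print(schedule([0, 8, 16, 24, 64]))
# [(0, 0, 1), (8, 0, 1), (16, 0, 2), (24, 0, 2), (64, 1, 2)]
```

    Sequential streaming addresses spread across banks and fill all ports each cycle, while two addresses that map to the same bank (here 0 and 64) cannot be served in the same cycle.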
