MIXED-WIDTH SIMD OPERATIONS HAVING EVEN-ELEMENT AND ODD-ELEMENT OPERATIONS USING REGISTER PAIR FOR WIDE DATA ELEMENTS
    13.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2017014892A1

    Publication date: 2017-01-26

    Application No.: PCT/US2016/038487

    Filing date: 2016-06-21

    Abstract: Systems and methods relate to a mixed-width single instruction multiple data (SIMD) instruction which has at least a source vector operand comprising data elements of a first bit-width and a destination vector operand comprising data elements of a second bit-width, wherein the second bit-width is either half of or twice the first bit-width. Correspondingly, one of the source or destination vector operands is expressed as a pair of registers, a first register and a second register. The other vector operand is expressed as a single register. Data elements of the first register correspond to even-numbered data elements of the other vector operand expressed as a single register, and data elements of the second register correspond to odd-numbered data elements of the other vector operand expressed as a single register.
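
The even/odd register-pair layout in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `widen_to_pair` and `narrow_from_pair`, the squaring operation, and the use of truncation for narrowing are all assumptions chosen for illustration. Registers are modelled as plain lists.

```python
def widen_to_pair(src, op=lambda x: x * x):
    """Widening case: one source register, a register pair as destination.

    Results for even-numbered source elements go to the first register,
    results for odd-numbered source elements go to the second register.
    The operation `op` is a placeholder (assumption), not from the source.
    """
    even_reg = [op(x) for x in src[0::2]]
    odd_reg = [op(x) for x in src[1::2]]
    return even_reg, odd_reg


def narrow_from_pair(even_reg, odd_reg, out_bits=8):
    """Narrowing case: a register pair as source, one destination register.

    Elements of the first register land in even-numbered destination slots,
    elements of the second register in odd-numbered slots. Truncation to
    `out_bits` is an assumed narrowing rule.
    """
    if len(even_reg) != len(odd_reg):
        raise ValueError("both registers of the pair must hold the same number of elements")
    mask = (1 << out_bits) - 1
    dst = [0] * (len(even_reg) + len(odd_reg))
    dst[0::2] = [x & mask for x in even_reg]
    dst[1::2] = [x & mask for x in odd_reg]
    return dst
```

In this model each wide result stays in the same lane position as the narrow element it came from, which is presumably why the even/odd split is attractive compared with a low-half/high-half split.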

    PARALLELIZATION OF SCALAR OPERATIONS BY VECTOR PROCESSORS USING DATA-INDEXED ACCUMULATORS IN VECTOR REGISTER FILES, AND RELATED CIRCUITS, METHODS, AND COMPUTER-READABLE MEDIA
    14.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2016014213A1

    Publication date: 2016-01-28

    Application No.: PCT/US2015/038013

    Filing date: 2015-06-26

    Abstract: Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the vector register identified by the register index, and a scalar operation is performed based on the register value and the accumulator index.
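
A histogram is one plausible use of data-indexed accumulators, and the sketch below models it in Python. Everything specific here is an assumption for illustration: the register-file dimensions, the `divmod` mapping from data value to (register index, accumulator index), and the increment as the scalar operation. The loop is sequential, so it does not model the multiple write ports described in the abstract.

```python
NUM_REGS = 4        # assumed number of vector registers
ACCS_PER_REG = 8    # assumed number of accumulators per register


def histogram(values):
    """Count occurrences of each value using data-indexed accumulators."""
    regs = [[0] * ACCS_PER_REG for _ in range(NUM_REGS)]
    for v in values:
        if not 0 <= v < NUM_REGS * ACCS_PER_REG:
            raise ValueError("value out of range for the register file")
        # The data value itself selects the register and the accumulator.
        reg_idx, acc_idx = divmod(v, ACCS_PER_REG)
        reg_value = regs[reg_idx]     # retrieve the register value
        reg_value[acc_idx] += 1       # scalar operation on one accumulator
    return regs
```

For example, the value 9 maps to register 1, accumulator 1 under this mapping, so two occurrences of 9 leave `regs[1][1] == 2`.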

    FREEING PHYSICAL REGISTERS IN A MICROPROCESSOR
    15.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2015142435A1

    Publication date: 2015-09-24

    Application No.: PCT/US2015/014541

    Filing date: 2015-02-05

    Abstract: Physical register scrubbing in computer microprocessors. Most instructions in a computer program produce some output value that is destined for one or more architected registers. These architected destination registers are renamed, in the processor pipeline, to physical registers in order to improve performance by exposing more instruction level parallelism to the processor. In one aspect, a method comprises identifying, in a reorder buffer, a first instruction and a second instruction, without intervening potential pipeline flushers, that write to the same architected destination register, in order to free the physical register corresponding to the older of the two instructions.
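
The reorder-buffer scan described in this abstract can be sketched as follows. This is a simplified model under stated assumptions, not the patented method: `RobEntry` and `find_freeable` are hypothetical names, the buffer is a list ordered oldest to youngest, and a potential flusher conservatively resets tracking and is not itself tracked as a writer. The sketch also ignores instructions that still need to read the older value.

```python
from collections import namedtuple

# Hypothetical reorder-buffer entry: architected destination register,
# the physical register it was renamed to, and whether the instruction
# could flush the pipeline (e.g. a branch or a faulting instruction).
RobEntry = namedtuple("RobEntry", "dest_arch dest_phys may_flush", defaults=(False,))


def find_freeable(rob):
    """Return physical registers of older writers that can be freed early."""
    last_writer = {}   # architected register -> physical register of latest writer
    freeable = []
    for entry in rob:  # oldest first
        if entry.may_flush:
            # A potential flusher ends every candidate pair seen so far.
            last_writer.clear()
            continue
        if entry.dest_arch is None:
            continue
        if entry.dest_arch in last_writer:
            # Same architected destination, no flusher in between:
            # the older instruction's physical register is a candidate.
            freeable.append(last_writer[entry.dest_arch])
        last_writer[entry.dest_arch] = entry.dest_phys
    return freeable
```

With writers of `r1` renamed to physical registers 10 and then 12 and no flusher between them, the scan reports register 10; a pair separated by a potential flusher is left alone.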

    A DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING SEGMENTED OPERATIONS
    16.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2015118299A1

    Publication date: 2015-08-13

    Application No.: PCT/GB2015/050132

    Filing date: 2015-01-21

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method are provided for performing segmented operations. The data processing apparatus comprises a vector register store for storing vector operands, and vector processing circuitry providing N lanes of parallel processing, and arranged to perform a segmented operation on up to N data elements provided by a specified vector operand, each data element being allocated to one of the N lanes. The up to N data elements form a plurality of segments, and performance of the segmented operation comprises performing a separate operation on the data elements of each segment, the separate operation involving interaction between the lanes containing the data elements of the associated segment. Predicate generation circuitry is responsive to a compute descriptor instruction specifying an input vector operand comprising a plurality of segment descriptors, to generate per-lane predicate information used by the vector processing circuitry when performing the segmented operation to maintain a boundary between each of the plurality of segments. As a result, interaction between lanes containing data elements from different segments is prevented. This allows very effective utilisation of the lanes of parallel processing within the vector processing circuitry to be achieved.
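
A segmented prefix sum gives a small, concrete picture of per-lane predicates keeping segments apart. The Python sketch below is illustrative only: it assumes that each segment descriptor is simply a segment length, that the predicate is a per-lane "segment starts here" flag, and that the segmented operation is a running sum. The names `compute_predicates` and `segmented_prefix_sum` are hypothetical.

```python
def compute_predicates(segment_lengths, num_lanes):
    """Turn segment lengths into per-lane 'segment start' flags."""
    if sum(segment_lengths) > num_lanes:
        raise ValueError("segments do not fit in the available lanes")
    starts = [False] * num_lanes
    lane = 0
    for length in segment_lengths:
        if length <= 0:
            raise ValueError("segment lengths must be positive")
        starts[lane] = True
        lane += length
    return starts


def segmented_prefix_sum(data, starts):
    """Running sum that restarts wherever the predicate marks a boundary."""
    out = []
    running = 0
    for x, is_start in zip(data, starts):
        # The predicate stops values from one segment leaking into the next.
        running = x if is_start else running + x
        out.append(running)
    return out
```

With eight lanes and segment lengths 3, 2 and 3, the input `[1, 2, 3, 4, 5, 6, 7, 8]` produces `[1, 3, 6, 4, 9, 6, 13, 21]`: all eight lanes do useful work, yet no sum crosses a segment boundary.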

    DEBUGGING NON-DETERMINISTIC EMBEDDED SYSTEMS
    17.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2015061022A8

    Publication date: 2015-07-16

    Application No.: PCT/US2014/058999

    Filing date: 2014-10-03

    Abstract: An embedded device includes a processor executing instructions from module(s) in a code memory. The instructions specify: reading data from two non-deterministic registers (NDRs) of different types, compressing the data using respective, different compression algorithms, and storing the compressed data in a nonvolatile medium. A method of enabling debug tracing in a computer program product (CPP) includes locating instructions in the CPP that read NDRs, determining types of the NDRs, and adding instruction(s) to the CPP to compress the values read using compression algorithms corresponding to the respective NDR types. An emulator in a computer-readable medium receives emulation-target instructions (ETIs) and compressed NDR data, and emulates an execution sequence of the ETIs by determining NDR-reading instructions, determining a type of the NDR read by each, decompressing a portion of the NDR data using a type-specific decompressor, and updating emulated-machine state based on the decompressed portion.
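
The idea of pairing each non-deterministic register (NDR) type with its own compression algorithm can be sketched in Python. The abstract does not name the register types or the algorithms, so the choices below are assumptions: a monotonically increasing "timer" register is delta-encoded, and a rarely changing "gpio" register is run-length encoded. The `COMPRESSORS` table and all function names are hypothetical.

```python
def delta_encode(values):
    """Store differences between successive reads (suits a counting timer)."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Store (value, run length) pairs (suits a rarely changing input pin)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]


# Assumed mapping from NDR type to (compressor, decompressor).
COMPRESSORS = {
    "timer": (delta_encode, delta_decode),
    "gpio": (rle_encode, rle_decode),
}


def record(ndr_type, reads):
    """Device side: compress the values read from one NDR."""
    return COMPRESSORS[ndr_type][0](reads)


def replay(ndr_type, compressed):
    """Emulator side: recover the values in the order they were read."""
    return COMPRESSORS[ndr_type][1](compressed)
```

An emulator replaying the trace would call `replay` with the same type that the device used in `record`, then feed the recovered values to the emulated NDR-reading instructions in order.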

    METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR SUB-BYTE DECOMPRESSION FUNCTIONALITY
    18.
    Invention application
    Status: Under examination (published)

    Publication No.: WO2015017870A1

    Publication date: 2015-02-05

    Application No.: PCT/US2014/055540

    Filing date: 2014-09-13

    Abstract: Methods, apparatus, instructions and logic provide SIMD vector sub-byte decompression functionality. Embodiments include shuffling a first and second byte into the least significant portion of a first vector element, and a third and fourth byte into the most significant portion. Processing continues shuffling a fifth and sixth byte into the least significant portion of a second vector element, and a seventh and eighth byte into the most significant portion. Then by shifting the first vector element by a first shift count and the second vector element by a second shift count, sub-byte elements are aligned to the least significant bits of their respective bytes. Processors then shuffle a byte from each of the shifted vector elements' least significant portions into byte positions of a destination vector element, and from each of the shifted vector elements' most significant portions into byte positions of another destination vector element.
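
The shuffle-then-shift pattern in this abstract has a simple scalar analogue, shown below as a Python sketch. It is not the SIMD instruction sequence from the patent: it assumes elements of 1 to 8 bits packed back to back, least significant bit first, and the function name `unpack_sub_byte` is hypothetical. For each element it gathers the two bytes that may contain it (the "shuffle"), shifts the element down to bit 0 (the "shift"), and keeps the low bits.

```python
def unpack_sub_byte(packed, bits, count):
    """Expand `count` packed `bits`-wide elements into one integer each."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    mask = (1 << bits) - 1
    out = []
    for i in range(count):
        byte_idx, shift = divmod(i * bits, 8)
        lo = packed[byte_idx]
        # An element may straddle a byte boundary, so fetch the next byte too.
        hi = packed[byte_idx + 1] if byte_idx + 1 < len(packed) else 0
        window = lo | (hi << 8)                # 16-bit window holding the element
        out.append((window >> shift) & mask)   # align to bit 0, drop neighbours
    return out
```

Because the shift is at most 7 and an element is at most 8 bits wide, a 16-bit window always contains the whole element, which matches the abstract's use of byte pairs per vector element.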

