MASKING ROW OR COLUMN POSITIONS FOR MATRIX PROCESSING

    Publication number: US20230214236A1

    Publication date: 2023-07-06

    Application number: US17998221

    Application date: 2021-05-13

    Applicant: Arm Limited

    CPC classification number: G06F9/448 G06F17/16

    Abstract: An apparatus comprises matrix processing circuitry to perform a matrix processing operation on first and second input operands to generate a result matrix, where the result matrix is a two-dimensional matrix; operand storage circuitry to store information for forming the first and second input operands for the matrix processing circuitry; and masking circuitry to perform a masking operation to mask at least part of the matrix processing operation or the information stored to the operand storage circuitry based on masking state data indicative of one or more masked row or column positions to be treated as representing a masking value. This is useful for improving performance of two-dimensional convolution operations, as the masking can be used to mask out selected rows or columns when performing the 2D convolution as a series of 1×1 convolution operations applied to different kernel positions.
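
    Example: the idea of building a 2D convolution from masked 1×1 convolutions can be sketched in plain Python. This is a minimal software illustration under stated assumptions, not the claimed circuitry: the function names, the nested-list data layout, zero as the masking value, and zero-padded "same" output are all choices made here for illustration. Each kernel position becomes one pass over the flattened image, and rows whose shifted source pixel would fall outside the image are masked.

```python
def conv2d_direct(x, w):
    """Reference zero-padded 'same' convolution.

    x: H x W x Cin nested lists; w: K x K x Cin x Cout nested lists.
    """
    H, W, K = len(x), len(x[0]), len(w)
    cin, cout, r = len(w[0][0]), len(w[0][0][0]), K // 2
    out = [[[0] * cout for _ in range(W)] for _ in range(H)]
    for y in range(H):
        for col in range(W):
            for ky in range(K):
                for kx in range(K):
                    sy, sx = y + ky - r, col + kx - r
                    if 0 <= sy < H and 0 <= sx < W:
                        for c in range(cin):
                            for o in range(cout):
                                out[y][col][o] += x[sy][sx][c] * w[ky][kx][c][o]
    return out


def conv2d_masked_1x1(x, w):
    """Same result, computed as one masked 1x1 convolution per kernel position."""
    H, W, K = len(x), len(x[0]), len(w)
    cin, cout, r = len(w[0][0]), len(w[0][0][0]), K // 2
    rows = [pixel for line in x for pixel in line]   # (H*W) x Cin matrix
    out = [[0] * cout for _ in range(H * W)]         # (H*W) x Cout result matrix
    for ky in range(K):
        for kx in range(K):
            dy, dx = ky - r, kx - r
            for pos in range(H * W):
                y, col = divmod(pos, W)
                # Masked row position: the shifted source would wrap to another
                # image row or leave the image, so it contributes zero.
                if not (0 <= y + dy < H and 0 <= col + dx < W):
                    continue
                src = rows[pos + dy * W + dx]
                for o in range(cout):
                    out[pos][o] += sum(src[c] * w[ky][kx][c][o] for c in range(cin))
    return [out[i * W:(i + 1) * W] for i in range(H)]
```

    Without the mask, a simple shift of the flattened image would pull in pixels from the neighbouring row at the left and right edges, which is why selected row positions have to be treated as zero for each kernel position.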

    DATA STRUCTURE PROCESSING

    Type: Invention application

    Publication number: US20210042115A1

    Publication date: 2021-02-11

    Application number: US16531208

    Application date: 2019-08-05

    Applicant: Arm Limited

    Abstract: An apparatus comprises: an instruction decoder and processing circuitry. In response to a data structure processing instruction specifying at least one input data structure identifier and an output data structure identifier, the instruction decoder controls the processing circuitry to perform a processing operation on at least one input data structure to generate an output data structure. Each input/output data structure comprises an arrangement of data corresponding to a plurality of memory addresses. The apparatus comprises two or more sets of one or more data structure metadata registers, each set associated with a corresponding data structure identifier and designated to store address-indicating metadata for identifying the memory addresses for the data structure identified by the corresponding data structure identifier.

    A DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING LOCK-PROTECTED PROCESSING OPERATIONS FOR MULTIPLE THREADS

    Publication number: US20170139757A1

    Publication date: 2017-05-18

    Application number: US15322882

    Application date: 2015-05-19

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus and method are provided for executing a plurality of threads. Processing circuitry performs processing operations required by the plurality of threads, the processing operations including a lock-protected processing operation with which a lock is associated, where the lock needs to be acquired before the processing circuitry performs the lock-protected processing operation. Baton maintenance circuitry is used to maintain a baton in association with the plurality of threads, the baton forming a proxy for the lock, and the baton maintenance circuitry being configured to allocate the baton between the threads. Via communication between the processing circuitry and the baton maintenance circuitry, once the lock has been acquired for one of the threads, the processing circuitry performs the lock-protected processing operation for multiple threads before the lock is released, with the baton maintenance circuitry identifying a current thread amongst the multiple threads for which the lock-protected processing operation is to be performed by allocating the baton to that current thread. The baton can hence be passed from one thread to the next, without needing to release and re-acquire the lock. This provides a significant performance improvement when performing lock-protected processing operations across multiple threads.
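
    Example: the baton mechanism can be sketched as a single-threaded simulation. This is a hedged illustration only: `CountingLock`, `run_with_baton`, and the in-order baton allocation are hypothetical names and policies chosen here, not taken from the patent. The sketch shows the lock being acquired once while the critical section runs for every waiting thread in turn.

```python
class CountingLock:
    """Stand-in for the real lock; counts how often it is acquired."""

    def __init__(self):
        self.acquisitions = 0
        self.held = False

    def acquire(self):
        assert not self.held, "lock already held"
        self.held = True
        self.acquisitions += 1

    def release(self):
        assert self.held, "lock not held"
        self.held = False


def run_with_baton(lock, thread_ids, critical_section):
    """Acquire the lock once, pass the baton from thread to thread, release once.

    The baton order (the order of thread_ids) is an assumption of this sketch.
    Returns the order in which threads held the baton.
    """
    order = []
    if not thread_ids:
        return order
    lock.acquire()
    try:
        for baton_holder in thread_ids:
            # Only the current baton holder runs the lock-protected operation.
            critical_section(baton_holder)
            order.append(baton_holder)
    finally:
        lock.release()
    return order
```

    With four threads, a conventional scheme would acquire and release the lock four times; here `lock.acquisitions` ends at 1.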

    VECTOR PROCESSING

    Type: Invention publication

    Status: Pending (published)

    Publication number: US20230273792A1

    Publication date: 2023-08-31

    Application number: US18006813

    Application date: 2021-07-08

    Applicant: ARM LIMITED

    CPC classification number: G06F9/3016 G06F9/30036

    Abstract: An apparatus comprising: an instruction decoder to decode processing instructions; one or more first registers; first processing circuitry to execute the decoded processing instructions in a first processing mode and configured to execute the decoded processing instructions using the one or more first registers; and control circuitry to execute the decoded processing instructions in a second processing mode using one or more second registers; the instruction decoder being configured to decode processing instructions selected from a first instruction set and a second instruction set in the second processing mode, in which one or both of the first and second instruction sets comprises at least one unique instruction set; the instruction decoder being configured to decode one or more mode change instructions to change between the first and second processing modes; and the first processing circuitry being configured to change the current processing mode between the first and second processing modes in response to executing a mode change instruction.

    DATA STRUCTURE RELINQUISHING

    Type: Invention application

    Publication number: US20210042114A1

    Publication date: 2021-02-11

    Application number: US16531206

    Application date: 2019-08-05

    Applicant: Arm Limited

    Abstract: A data processing apparatus is provided comprising: a plurality of storage circuits to store data. Execution circuitry performs one or more operations using the storage circuits in response to instructions. The instructions include a relinquish instruction. The execution circuitry responds to the relinquish instruction by indicating that at least one of the plurality of storage circuits is an unused storage circuit and the execution circuitry affects execution of future instructions based on the unused storage circuit after executing the relinquish instruction.

    SHARED RESOURCES IN A DATA PROCESSING APPARATUS FOR EXECUTING A PLURALITY OF THREADS

    Publication number: US20170286107A1

    Publication date: 2017-10-05

    Application number: US15505714

    Application date: 2015-07-28

    Applicant: ARM LIMITED

    Abstract: A data processing apparatus (100) executes threads and includes a general program counter (PC) (120) identifying an instruction to be executed for at least a subset of the threads. Each thread has a thread PC (184). The subset of threads has at least one lock parameter (188, 500-504) for tracking exclusive access to shared resources. In response to a first instruction executed for a thread, the processor (160) modifies the at least one lock parameter (188, 500-504) to indicate that the thread has gained exclusive access to the shared resource. In response to a second instruction, the processor modifies the at least one lock parameter (188, 500-504) to indicate that the thread no longer has exclusive access. A selector (110) selects one of the subset of threads based on the at least one lock parameter (188, 500-504) and sets the general PC (120) to the thread PC (184) of the selected thread.
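
    Example: the selector behaviour can be sketched as follows. This is a rough illustration under assumptions: the abstract does not say how the lock parameter is encoded or how ties are broken, so the per-thread `lock_count` field, the `lock`/`unlock` helpers, and the "lowest PC wins" tie-break are hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    pc: int               # thread program counter
    lock_count: int = 0   # hypothetical lock parameter: > 0 means exclusive access held


def lock(thread):
    """Models the 'first instruction': the thread gains exclusive access."""
    thread.lock_count += 1


def unlock(thread):
    """Models the 'second instruction': the thread gives up exclusive access."""
    thread.lock_count -= 1


def select_general_pc(threads):
    """Pick the thread that drives the general PC.

    Assumed policy: a thread holding the lock is preferred so that it can
    reach its unlock; otherwise the thread with the lowest PC is chosen.
    """
    chosen = max(threads, key=lambda t: (t.lock_count, -t.pc))
    return chosen.pc
```

    Preferring the lock holder lets it make progress toward releasing the shared resource even when other threads sit at lower program counter values.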

    REGISTER-BASED MATRIX MULTIPLICATION

    Publication number: US20220291923A1

    Publication date: 2022-09-15

    Application number: US17678221

    Application date: 2022-02-23

    Applicant: Arm Limited

    Abstract: Techniques for performing matrix multiplication in a data processing apparatus are disclosed, comprising apparatuses, matrix multiply instructions, methods of operating the apparatuses, and virtual machine implementations. Registers, each register for storing at least four data elements, are referenced by a matrix multiply instruction and in response to the matrix multiply instruction a matrix multiply operation is carried out. First and second matrices of data elements are extracted from first and second source registers, and plural dot product operations, acting on respective rows of the first matrix and respective columns of the second matrix are performed to generate a square matrix of result data elements, which is applied to a destination register. A higher computation density for a given number of register operands is achieved with respect to vector-by-element techniques.
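
    Example: the register-based operation can be modelled with flat lists standing in for registers. This is a sketch under assumptions: the function name `matmul_registers`, the row-major element layout of both source registers, and the choice to accumulate into the destination are illustrative and not confirmed by the abstract.

```python
def matmul_registers(dst, src1, src2, n=2):
    """Model of a matrix multiply on registers holding n*n elements each.

    src1 and src2 are read as n x n row-major matrices (assumed layout).
    Each result element is the dot product of a row of the first matrix
    with a column of the second, accumulated into dst (assumed behaviour).
    Returns the new destination register contents.
    """
    assert len(dst) == len(src1) == len(src2) == n * n
    result = list(dst)
    for i in range(n):
        row = src1[i * n:(i + 1) * n]
        for j in range(n):
            column = src2[j::n]   # every n-th element starting at j
            result[i * n + j] += sum(a * b for a, b in zip(row, column))
    return result
```

    For n = 2, two four-element source registers yield four dot products (eight multiplications) from a single instruction, which illustrates the computation density the abstract refers to.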

    PREFETCH STRATEGY CONTROL

    Type: Invention application

    Status: Pending (published)

    Publication number: US20150121038A1

    Publication date: 2015-04-30

    Application number: US14061837

    Application date: 2013-10-24

    Applicant: ARM LIMITED

    CPC classification number: G06F9/3455 G06F9/383 G06F9/3851 G06F9/3887

    Abstract: A single instruction multiple thread (SIMT) processor 2 includes execution circuitry 6, prefetch circuitry 12 and prefetch strategy selection circuitry 14. The prefetch strategy selection circuitry serves to detect one or more characteristics of a stream of program instructions that are being executed to identify whether or not a given data access instruction within a program will be executed a plurality of times. The prefetch strategy to use is selected from a plurality of selectable prefetch strategies in dependence upon the detection of such characteristics.
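
    Example: one way to picture the selection step is a check for whether a load sits inside a loop. This is a speculative sketch: the abstract does not name the characteristics detected or the strategies available, so the backward-branch test, the `(pc, opcode, target)` instruction tuples, and the two strategy names are all hypothetical.

```python
def load_repeats(program, load_pc):
    """Return True if a backward branch encloses the load at load_pc.

    program is a list of (pc, opcode, branch_target) tuples; a branch whose
    target is at or before the load and whose own pc is at or after it is
    taken as evidence that the load will execute more than once.
    """
    return any(
        op == "branch" and target is not None and target <= load_pc <= pc
        for pc, op, target in program
    )


def select_strategy(program, load_pc):
    """Choose between two hypothetical prefetch strategies."""
    if load_repeats(program, load_pc):
        return "stride_within_thread"    # hypothetical: predict from repeated executions
    return "stride_across_threads"       # hypothetical: predict from neighbouring threads
```

    For a program with loads at addresses 0 and 8 and a branch at 12 targeting 8, only the load at 8 is treated as repeating.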

    SCHEDULING PROGRAM INSTRUCTIONS WITH A RUNNER-UP EXECUTION POSITION

    Type: Invention application

    Status: Granted

    Publication number: US20150100768A1

    Publication date: 2015-04-09

    Application number: US14048141

    Application date: 2013-10-08

    Applicant: ARM LIMITED

    Abstract: A single instruction multiple thread (SIMT) processor 2 includes scheduling circuitry 8 for calculating a next scheduled execution point for execution circuits 4 which execute respective threads corresponding to a common program. In addition to calculating the next scheduled execution point, the scheduling circuitry determines a runner-up execution point which would have been determined as the next scheduled execution point if the threads which actually correspond to the next scheduled execution point had been removed from consideration. This runner-up execution point is used to identify points of re-convergence within the program flow and as part of the operation of a static branch predictor 10.
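
    Example: the runner-up calculation can be sketched as below. The abstract does not state the scheduling policy, so this sketch assumes the next execution point is the lowest program counter among the threads; the function name `schedule` is likewise hypothetical.

```python
def schedule(thread_pcs):
    """Return (next_point, runner_up) for a list of per-thread program counters.

    Assumed policy: the next scheduled execution point is the lowest PC.
    The runner-up is the point that would have won if every thread at the
    next point were removed from consideration; it is None when all threads
    share the same PC.
    """
    if not thread_pcs:
        raise ValueError("at least one thread is required")
    next_point = min(thread_pcs)
    remaining = [pc for pc in thread_pcs if pc != next_point]
    runner_up = min(remaining) if remaining else None
    return next_point, runner_up
```

    For thread PCs `[8, 4, 4, 12]` this returns `(4, 8)`: the two threads at 4 run next, and 8 is the point at which they could rejoin another thread.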

    APPARATUS AND METHOD FOR MAPPING ARCHITECTURAL REGISTERS TO PHYSICAL REGISTERS

    Type: Invention application

    Status: Pending (published)

    Publication number: US20140164742A1

    Publication date: 2014-06-12

    Application number: US13927552

    Application date: 2013-06-26

    Applicant: ARM Limited

    Abstract: An apparatus and method are provided for performing register renaming. Available register identifying circuitry is provided to identify which physical registers form a pool of physical registers available to be mapped by register renaming circuitry to an architectural register specified by an instruction to be executed. Configuration data whose value is modified during operation of the processing circuitry is stored such that, when the configuration data has a first value, the configuration data identifies at least one architectural register of the architectural register set which does not require mapping to a physical register by the register renaming circuitry. The register identifying circuitry is arranged to reference the modified data value, such that when the configuration data has the first value, the number of physical registers in the pool is increased due to the reduction in the number of architectural registers which require mapping to physical registers.
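
    Example: the effect of the configuration data on the renaming pool can be shown with simple arithmetic. This sketch rests on assumptions not stated in the abstract: `FIRST_VALUE` is a hypothetical encoding, each mapped architectural register is assumed to occupy exactly one physical register, and the function names are illustrative.

```python
FIRST_VALUE = 1  # hypothetical encoding of the "first value" of the configuration data


def registers_needing_mapping(arch_regs, config_value, optional_regs):
    """Architectural registers that must be mapped to a physical register.

    optional_regs are the registers that the configuration data marks as
    not requiring a mapping when it holds FIRST_VALUE.
    """
    if config_value == FIRST_VALUE:
        return [r for r in arch_regs if r not in optional_regs]
    return list(arch_regs)


def pool_size(num_physical, arch_regs, config_value, optional_regs):
    """Physical registers left available for renaming.

    Assumes one physical register is reserved per mapped architectural register.
    """
    mapped = registers_needing_mapping(arch_regs, config_value, optional_regs)
    return num_physical - len(mapped)
```

    With 24 physical registers and 16 architectural registers, of which 8 are marked optional, the pool grows from 8 to 16 registers when the configuration data takes the first value.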
