RUNTIME CONFIGURABLE REGISTER FILES FOR ARTIFICIAL INTELLIGENCE WORKLOADS

    Publication No.: WO2023091258A1

    Publication date: 2023-05-25

    Application No.: PCT/US2022/046732

    Filing date: 2022-10-14

    Abstract: There is disclosed a system and method of performing an artificial intelligence (AI) inference, including: programming an AI accelerator circuit to solve an AI problem with a plurality of layer-specific register file (RF) size allocations, wherein the AI accelerator circuit comprises processing elements (PEs) with respective associated RFs, wherein the RFs individually are divided into K sub-banks of size B bytes, wherein B and K are integers, and wherein the RFs include circuitry to individually allocate a sub-bank to one of input feature (IF), output feature (OF), or filter weight (FL), and wherein programming the plurality of layer-specific RF size allocations comprises accounting for sparse data within the layer; and causing the AI accelerator circuit to execute the AI problem, including applying the layer-specific RF size allocations at run-time.
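
The per-layer allocation described in this abstract can be illustrated with a minimal Python sketch. It assumes sparse data is stored compressed, so only the non-zero fraction occupies RF space, and that spare sub-banks go to whichever data type is tightest. The function names, the `density` parameter, and the tie-breaking rule are hypothetical and do not come from the application.

```python
from math import ceil

def allocate_subbanks(k, b, need_bytes):
    """Assign each of k sub-banks (b bytes each) to IF, OF, or FL."""
    # Minimum sub-banks per data type; every type gets at least one.
    banks = {name: max(1, ceil(n / b)) for name, n in need_bytes.items()}
    if sum(banks.values()) > k:
        raise ValueError("layer does not fit in the register file")
    # Hand spare sub-banks to the type with the most bytes per sub-bank.
    for _ in range(k - sum(banks.values())):
        neediest = max(banks, key=lambda t: need_bytes[t] / banks[t])
        banks[neediest] += 1
    return banks

def layer_allocation(k, b, dense_bytes, density):
    """Scale each dense footprint by its density (assumed compressed storage)."""
    need = {t: ceil(dense_bytes[t] * density.get(t, 1.0)) for t in dense_bytes}
    return allocate_subbanks(k, b, need)

# One layer: K = 8 sub-banks of B = 16 bytes, filter weights 25% dense.
layer = layer_allocation(8, 16, {"IF": 64, "OF": 32, "FL": 64}, {"FL": 0.25})
```

In this example the dense layer would need ten sub-banks and would not fit, while the sparse filter weights shrink to one sub-bank and free space for the input features.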

    LOCK-FREE DATA AGGREGATION ON DISTRIBUTED SYSTEMS

    Publication No.: WO2023087975A1

    Publication date: 2023-05-25

    Application No.: PCT/CN2022/124455

    Filing date: 2022-10-10

    Abstract: Aspects of the invention include distributed systems and methods that provide data aggregation in a fast, lock-free manner that satisfies four tenets of data aggregation simultaneously: bounded staleness, monotonic reads, operational independence, and low read/write overhead. A non-limiting example computer-implemented method includes producing new data at a first producer node of a distributed system having two or more producer nodes, a consumer node, and a global view. The global view includes a hierarchical binary tree of cell addresses. Each producer node includes a local view of the global view. Responsive to producing the new data, the local view of the first producer node is updated with the new data and a local timestamp and, if the local timestamp is greater than a global timestamp, a pointer in a cell address of the global view is flipped to point to the new data.
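
The update rule in this abstract (update the local view, then flip the global pointer only if the local timestamp is newer) can be modelled as follows. This is a single-threaded sketch: a real lock-free system would perform the compare and flip as one atomic compare-and-swap, and the hierarchical binary tree of cell addresses is flattened here to a dictionary. The class and method names are hypothetical.

```python
class GlobalView:
    def __init__(self):
        # cell address -> (timestamp, reference to data)
        self.cells = {}

    def flip_if_newer(self, cell, timestamp, data):
        """Point the cell at the new data only if it is newer than what is there."""
        current_ts = self.cells.get(cell, (-1, None))[0]
        if timestamp > current_ts:
            self.cells[cell] = (timestamp, data)  # would be a CAS in hardware
            return True
        return False


class Producer:
    def __init__(self, global_view):
        self.global_view = global_view
        self.local = {}  # this producer's local view of the global view

    def produce(self, cell, timestamp, data):
        # The local view is always updated; the global view only moves forward.
        self.local[cell] = (timestamp, data)
        return self.global_view.flip_if_newer(cell, timestamp, data)


view = GlobalView()
first, second = Producer(view), Producer(view)
flipped_new = first.produce("a", 5, "x")
flipped_stale = second.produce("a", 3, "y")
```

Because a stale write never replaces a newer pointer, a consumer reading the same cell twice never sees time go backwards, which is the monotonic-reads property the abstract lists.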

    TECHNIQUE FOR HANDLING SEALED CAPABILITIES
    Invention application

    Publication No.: WO2023067295A1

    Publication date: 2023-04-27

    Application No.: PCT/GB2022/052321

    Filing date: 2022-09-14

    Applicant: ARM LIMITED

    Abstract: An apparatus and method are described for handling sealed capabilities. The apparatus has processing circuitry to perform processing operations during which access requests to memory are generated, wherein the processing circuitry is arranged to generate memory addresses for the access requests using capabilities that identify constraining information. Checking circuitry then determines whether a given access request whose memory address is generated using a given capability is permitted based on the constraining information identified by that given capability, and based on a level of trust associated with the given access request. Each capability has a capability level of trust associated therewith, and the level of trust associated with the given access request is dependent on both a current mode level of trust associated with a current mode of operation of the processing circuitry, and the capability level of trust of the given capability. At least one of the capabilities is settable as a sealed capability, and the apparatus further comprises sealed capability handling circuitry to prevent the processing circuitry performing at least one processing operation using a given sealed capability when the current mode level of trust is a lower level of trust than the capability level of trust of the given sealed capability.
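
The two checks in this abstract can be sketched in Python. The abstract says only that the request's level of trust depends on both the mode and the capability; taking the minimum of the two is an assumption made here for illustration, as are the `required_trust` parameter and all names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    base: int             # constraining information: lowest permitted address
    limit: int            # constraining information: one past the highest address
    trust: int            # capability level of trust
    sealed: bool = False

def access_permitted(cap, address, mode_trust, required_trust):
    """Check bounds and the trust level of the access request."""
    request_trust = min(mode_trust, cap.trust)  # assumption: weakest level wins
    in_bounds = cap.base <= address < cap.limit
    return in_bounds and request_trust >= required_trust

def may_operate_on_sealed(cap, mode_trust):
    """Block the operation when a less trusted mode handles a sealed capability."""
    return not (cap.sealed and mode_trust < cap.trust)

cap = Capability(base=0x1000, limit=0x2000, trust=2, sealed=True)
```

With this capability, `may_operate_on_sealed(cap, 1)` is refused because the current mode is less trusted than the sealed capability, while `may_operate_on_sealed(cap, 2)` is allowed.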

    LOAD LATENCY AMELIORATION USING BUNCH BUFFERS

    Publication No.: WO2023064230A1

    Publication date: 2023-04-20

    Application No.: PCT/US2022/046210

    Filing date: 2022-10-11

    Applicant: ASCENIUM, INC.

    Inventor: FOLEY, Peter

    Abstract: Techniques for task processing based on load latency amelioration using bunch buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. Sets of control word bits are loaded into buffers. Each buffer is associated with and coupled to a unique compute element within the array of compute elements. The sets of control word bits provide operational control for the compute element with which it is associated. Operations are executed within the array of elements. The operations are based on a selected set of control word bits which comprise a control word bunch.

    VECTOR READ/WRITE METHOD, VECTOR REGISTER SYSTEM, DEVICE, AND MEDIUM

    Publication No.: WO2023056743A1

    Publication date: 2023-04-13

    Application No.: PCT/CN2022/089897

    Filing date: 2022-04-28

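    Abstract: The present application discloses a vector read/write method, a vector register system, a device, and a medium. When a vector write instruction is obtained, a vector register controller converts the to-be-written vector address space into a to-be-written vector register file bit address and, for a non-standard vector, a non-standard vector conversion unit converts it into a to-be-written non-standard vector before the write is performed, so that vector data of any format can be stored. When a vector read instruction is obtained, the vector register controller converts the to-be-read vector address space into a to-be-read vector register file bit address according to the width to be read and the length to be read, and then performs the read, so that vector data of any format can be read. The vector register system can thus perform vector reads and vector writes simultaneously, output vector data of any format, and store vector data of any format, thereby supporting vector operations on more non-standard vectors.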
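
The idea of addressing a vector register file at bit granularity, so that elements of any width can be stored and read back, can be sketched in Python. The whole register file is modelled as one integer, and the class name, method names, and bounds check are hypothetical; the application's controller and conversion unit are not modelled.

```python
class BitRegisterFile:
    def __init__(self, total_bits):
        self.total_bits = total_bits
        self.bits = 0  # the whole register file held in one Python int

    def _check(self, bit_addr, width, length):
        if bit_addr < 0 or bit_addr + width * length > self.total_bits:
            raise IndexError("vector does not fit in the register file")

    def write(self, bit_addr, width, values):
        """Store len(values) elements of `width` bits starting at bit_addr."""
        self._check(bit_addr, width, len(values))
        mask = (1 << width) - 1
        for i, value in enumerate(values):
            shift = bit_addr + i * width
            # Clear the element's bit field, then drop in the new value.
            self.bits = (self.bits & ~(mask << shift)) | ((value & mask) << shift)

    def read(self, bit_addr, width, length):
        """Return `length` elements of `width` bits starting at bit_addr."""
        self._check(bit_addr, width, length)
        mask = (1 << width) - 1
        return [(self.bits >> (bit_addr + i * width)) & mask for i in range(length)]

rf = BitRegisterFile(64)
rf.write(3, 5, [1, 31, 17])    # three 5-bit elements at an unaligned bit address
first_read = rf.read(3, 5, 3)
rf.write(8, 5, [0])            # overwrite only the second element
second_read = rf.read(3, 5, 3)
```

A 5-bit element width is used here precisely because it is not a standard power-of-two format.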

    METHODS, APPARATUS, AND ARTICLES OF MANUFACTURE TO INCREASE DATA REUSE FOR MULTIPLY AND ACCUMULATE (MAC) OPERATIONS

    Publication No.: WO2023048827A1

    Publication date: 2023-03-30

    Application No.: PCT/US2022/038910

    Filing date: 2022-07-29

    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed that increase data reuse for multiply and accumulate (MAC) operations. An example apparatus includes a MAC circuit to process a first context of a set of a first type of contexts stored in a first buffer and a first context of a set of a second type of contexts stored in a second buffer. The example apparatus also includes control logic circuitry to, in response to determining that there is an additional context of the second type to be processed in the set of the second type of contexts, maintain the first context of the first type in the first buffer. The control logic circuitry is also to, in response to determining that there is an additional context of the first type to be processed in the set of the first type of contexts, maintain the first context of the second type in the second buffer and iterate a pointer of the second buffer from a first position to a next position in the second buffer.
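
One possible reading of this reuse scheme is a loop in which a context of the first type stays in its buffer while a pointer steps through the second buffer. The sketch below follows that reading only; it treats each context as a list of numbers and each MAC operation as a dot product, and the function name is hypothetical.

```python
def mac_with_reuse(first_contexts, second_contexts):
    """Multiply-accumulate every first-type context with every second-type context."""
    outputs = []
    for held in first_contexts:      # stays in the first buffer for the inner loop
        pointer = 0                  # read pointer into the second buffer
        while pointer < len(second_contexts):
            other = second_contexts[pointer]
            outputs.append(sum(a * b for a, b in zip(held, other)))
            pointer += 1             # advance to the next position, no reload
    return outputs

products = mac_with_reuse([[1, 2], [3, 4]], [[1, 0], [0, 1]])
```

Each context is placed in its buffer once but takes part in several MAC operations, which is the reuse the abstract is after.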

    APPARATUS AND METHOD OF A SCENARIO-BASED PERMISSION MECHANISM FOR ACCESS TO A RESTRICTED RESOURCE

    Publication No.: WO2023048733A1

    Publication date: 2023-03-30

    Application No.: PCT/US2021/052251

    Filing date: 2021-09-27

    Abstract: According to one aspect of the present disclosure, a client device is provided. The client device may include a scenario monitor. The scenario monitor may be configured to maintain a scenario policy associated with access to a restricted resource. The client device may also include an application client. The application client may receive a request for access to the restricted resource. The application client may perform a self-permission check associated with the request. In response to the self-permission check indicating that access to the restricted resource is permitted, the scenario monitor may perform a first scenario policy check associated with the request based on the scenario policy. The client device may further include an activity manager service. In response to the first scenario policy check indicating that access to the restricted resource is permitted, the activity manager service may perform a component permission check associated with the restricted resource.
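
The ordering of the checks in this abstract can be sketched as a short-circuiting chain. The three callables stand in for the application client, the scenario monitor, and the activity manager service; the function name and the returned tuple are hypothetical, and the abstract does not say what happens after the component permission check.

```python
def request_access(request, self_check, scenario_check, component_check):
    """Run the three checks in order, stopping at the first refusal."""
    stages = [
        ("self-permission", self_check),            # application client
        ("scenario policy", scenario_check),        # scenario monitor
        ("component permission", component_check),  # activity manager service
    ]
    for name, check in stages:
        if not check(request):
            return False, name  # a later stage never runs after a refusal
    return True, None

calls = []

def stub(name, result):
    # Hypothetical stand-in for a real check; records that it was consulted.
    def check(request):
        calls.append(name)
        return result
    return check

outcome = request_access(
    "camera", stub("self", True), stub("scenario", False), stub("component", True)
)
```

Here the scenario policy refuses the request, so the component permission check is never consulted.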

    OFFSET PREFETCHING METHOD, APPARATUS FOR PERFORMING OFFSET PREFETCHING, COMPUTING DEVICE, AND MEDIUM

    Publication No.: WO2023035654A1

    Publication date: 2023-03-16

    Application No.: PCT/CN2022/093310

    Filing date: 2022-05-17

    Inventor: 胡世文

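    Abstract: The present disclosure provides an offset prefetching method, an apparatus for performing offset prefetching, a computing device, and a medium. The offset prefetching method includes: using an offset prefetcher to select, from a preset offset value table, K offset prefetch values for generating prefetch requests, wherein the preset offset value table includes N preset offset values, and the K offset prefetch values are the offset prefetch values most recently selected in time by the offset prefetcher from the preset offset value table, wherein N and K are positive integers and N is greater than K; recording the K offset prefetch values to form a recent offset value table that includes the K offset prefetch values; and using the offset prefetcher to select a first offset prefetch value from the recent offset value table, for performing data prefetching based on the first offset prefetch value.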
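
The two-table arrangement in this abstract, a preset table of N offsets and a smaller table of the K most recently selected ones, can be sketched in Python. The abstract does not say how an offset is scored or chosen, so the `score` callable is an assumption supplied by the caller, and the class and method names are hypothetical.

```python
from collections import deque

class OffsetPrefetcher:
    def __init__(self, preset_offsets, k):
        if not 0 < k < len(preset_offsets):
            raise ValueError("K must be positive and smaller than N")
        self.preset = list(preset_offsets)  # preset offset value table (N entries)
        self.recent = deque(maxlen=k)       # recent offset value table (K entries)

    def record_selection(self, offset):
        """Record an offset chosen from the preset table; the oldest entry drops out."""
        if offset not in self.preset:
            raise ValueError("offset is not in the preset table")
        self.recent.append(offset)

    def prefetch_address(self, demand_address, score):
        """Pick the best recent offset under `score` and apply it to the address."""
        offset = max(self.recent, key=score)
        return demand_address + offset

prefetcher = OffsetPrefetcher([1, 2, 4, 8, 16], k=3)
for chosen in (1, 2, 4, 8):
    prefetcher.record_selection(chosen)
target = prefetcher.prefetch_address(100, score=lambda o: -abs(o - 4))
```

Searching only the K recent entries instead of all N preset offsets is what keeps the selection step cheap.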
