TASK EXECUTION IN A SIMD PROCESSING UNIT WITH PARALLEL GROUPS OF PROCESSING LANES

    Publication No.: US20250061536A1

    Publication date: 2025-02-20

    Application No.: US18907801

    Filing date: 2024-10-07

    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
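
Illustrative sketch (not taken from the patent): the effect of assembling tasks by validity can be shown with a toy model. It assumes, purely for illustration, that each block of data items occupies one lane over consecutive cycles and that a cycle can be skipped when every lane holds an invalid work item in it. The names `LANES`, `BLOCK_SIZE`, `assemble_in_order` and `assemble_by_validity` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LANES = 4        # hypothetical number of processing lanes in the group
BLOCK_SIZE = 4   # hypothetical number of data items per block

Block = Tuple[bool, ...]   # one validity flag per data item in a block
Task = List[Block]         # up to LANES blocks, one block per lane


def cycles_needed(task: Task) -> int:
    """Count cycles in which at least one lane holds a valid work item."""
    return sum(any(block[c] for block in task) for c in range(BLOCK_SIZE))


def assemble_in_order(blocks: List[Block]) -> List[Task]:
    """Baseline: fill tasks with blocks in arrival order."""
    return [blocks[i:i + LANES] for i in range(0, len(blocks), LANES)]


def assemble_by_validity(blocks: List[Block]) -> List[Task]:
    """Group blocks sharing a validity pattern so invalid slots line up."""
    groups: Dict[Block, List[Block]] = defaultdict(list)
    for block in blocks:
        groups[block].append(block)
    tasks: List[Task] = []
    for same_mask in groups.values():
        tasks.extend(assemble_in_order(same_mask))
    return tasks


def total_cycles(tasks: List[Task]) -> int:
    """Sum the non-skippable cycles over all tasks."""
    return sum(cycles_needed(task) for task in tasks)
```

In this model, interleaving two validity patterns forces every task to spend a cycle on each pattern, whereas grouping by pattern lets each task skip the cycles in which all of its lanes are invalid.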

    METHODS AND ALLOCATORS FOR ALLOCATING PORTIONS OF A STORAGE UNIT USING VIRTUAL PARTITIONING

    Publication No.: US20250060890A1

    Publication date: 2025-02-20

    Application No.: US18936802

    Filing date: 2024-11-04

    Inventor: Ian King

    Abstract: Methods and storage unit allocators for allocating one or more portions of a storage unit to a plurality of tasks for storing at least two types of data. The method includes receiving a request for one or more portions of the storage unit to store a particular type of data of the at least two types of data for a task of the plurality of tasks; associating the request with one of a plurality of virtual partitionings of the storage unit based on one or more characteristics of the request, each virtual partitioning allotting none, one, or more than one portion of the storage unit to each of the at least two types of data; and allocating the requested one or more portions of the storage unit to the task from the none, one, or more than one portion of the storage unit allotted to the particular type of data in the virtual partitioning associated with the request.
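
Illustrative sketch (not taken from the patent): the three steps in the abstract (receive a request, associate it with a virtual partitioning, allocate from the portions allotted to the requested data type) might look roughly like the following. The class `Allocator`, the request characteristic `kind`, the data type names and the table `EXAMPLE_PARTITIONINGS` are all assumptions made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical partitionings of an 8-portion storage unit. Each partitioning
# allots none, one, or several portions to each of two data types.
EXAMPLE_PARTITIONINGS: Dict[str, Dict[str, List[int]]] = {
    "compute": {"attr": [0, 1, 2, 3, 4, 5], "temp": [6, 7]},
    "pixel": {"attr": [0, 1], "temp": [2, 3, 4, 5, 6, 7]},
    "attr_only": {"attr": list(range(8)), "temp": []},
}


class Allocator:
    def __init__(self, partitionings: Dict[str, Dict[str, List[int]]]):
        self.partitionings = partitionings
        self.owner: Dict[int, str] = {}  # portion index -> owning task

    def allocate(self, task: str, dtype: str, count: int,
                 kind: str) -> Optional[List[int]]:
        """Allocate `count` portions for `dtype`, or None if unavailable."""
        # Associate the request with a partitioning via a characteristic.
        allotted = self.partitionings[kind].get(dtype, [])
        # Only portions allotted to this data type and still free qualify.
        free = [p for p in allotted if p not in self.owner]
        if len(free) < count:
            return None
        chosen = free[:count]
        for portion in chosen:
            self.owner[portion] = task
        return chosen

    def release(self, task: str) -> None:
        """Free every portion owned by `task`."""
        self.owner = {p: t for p, t in self.owner.items() if t != task}
```

Because the partitionings are virtual, the same physical portion can be allotted to different data types in different partitionings; the shared `owner` map is what keeps two tasks from holding the same portion at once.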

    Multistage collector for outputs in multiprocessor systems

    Publication No.: US12217328B2

    Publication date: 2025-02-04

    Application No.: US18219873

    Filing date: 2023-07-10

    Abstract: Aspects include a multistage collector to receive outputs from plural processing elements. Processing elements may comprise (each or collectively) a plurality of clusters, with one or more ALUs that may perform SIMD operations on a data vector and produce outputs according to the instruction stream being used to configure the ALU(s). The multistage collector includes substituent components each with at least one input queue, a memory, a packing unit, and an output queue; these components can be sized to process groups of input elements of a given size, and can have multiple input queues and a single output queue. Some components couple to receive outputs from the ALUs and others receive outputs from other components. Ultimately, the multistage collector can output groupings of input elements. Each grouping of elements (e.g., at input queues, or stored in the memories of the components) can be formed based on matching of index elements.

    Image Data Decompression

    Publication No.: US20250037316A1

    Publication date: 2025-01-30

    Application No.: US18913828

    Filing date: 2024-10-11

    Inventor: Xile Yang

    Abstract: A method of performing lossy compression on a block of image data in accordance with a multi-level difference table determines an origin value for the block of image data, and determines a level within the multi-level difference table for the block of image data, by determining a maximum difference between the determined origin value and any one of image element values in the block of image data and selecting from the multi-level difference table the level whose largest entry most closely represents the determined maximum difference. For each image element value in the block, one of the entries at the determined level within the multi-level difference table is selected, and a compressed block of data for the block of image data is formed, the compressed block of data including (i) data representing the determined origin value, (ii) an indication of the determined level, and (iii) for each image element value in the block of image data, an indication of the selected entry for that image element value.
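
Illustrative sketch (not taken from the patent): the compression steps in the abstract can be tried on a small block. The table `DIFF_TABLE`, the choice of the block minimum as the origin value, and the function names are assumptions for the example; the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical multi-level difference table: each level lists the offsets
# from the origin value that can be represented at that level.
DIFF_TABLE: List[List[int]] = [
    [0, 2, 4, 8],
    [0, 8, 16, 32],
    [0, 32, 64, 128],
]


def nearest_index(candidates: List[int], target: int) -> int:
    """Index of the candidate closest to target (first one wins ties)."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - target))


def compress_block(block: List[int]) -> Tuple[int, int, List[int]]:
    """Return (origin value, level, per-element entry indices)."""
    origin = min(block)  # assumption: the origin is the block minimum
    max_diff = max(value - origin for value in block)
    # Pick the level whose largest entry best matches the maximum difference.
    level = nearest_index([max(entries) for entries in DIFF_TABLE], max_diff)
    # For each element, pick the entry at that level closest to its offset.
    indices = [nearest_index(DIFF_TABLE[level], value - origin) for value in block]
    return origin, level, indices


def decompress_block(origin: int, level: int, indices: List[int]) -> List[int]:
    """Rebuild approximate element values from the compressed block."""
    return [origin + DIFF_TABLE[level][i] for i in indices]
```

With this table, the block `[10, 12, 17, 40]` has origin 10 and maximum difference 30, so level 1 (largest entry 32) is selected, and decompression yields `[10, 10, 18, 42]`: each element costs a 2-bit index and the reconstruction is lossy.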

    On demand geometry and acceleration structure creation with tile object lists

    Publication No.: US12211136B2

    Publication date: 2025-01-28

    Application No.: US18102054

    Filing date: 2023-01-26

    Abstract: Systems and methods of geometry processing for rasterization and ray tracing processes provide for pre-processing of source geometry, such as by tessellation or other procedural modification of source geometry, to produce final geometry on which a rendering will be based. An acceleration structure (or portion thereof) for use during ray tracing is defined based on the final geometry. Only coarse-grained elements of the acceleration structure may be produced or retained, and a fine-grained structure within a particular coarse-grained element may be produced in response to a collection of rays being ready for traversal within the coarse-grained element. Final geometry can be recreated in response to demand from a rasterization engine, and from ray intersection units that require such geometry for intersection testing with primitives. Geometry at different resolutions can be generated to respond to demands from different rendering components.

    Processing work items in processing logic

    Publication No.: US12204448B2

    Publication date: 2025-01-21

    Application No.: US18083735

    Filing date: 2022-12-19

    Inventor: Tijmen Spreij

    Abstract: A plurality of work items are processed through a processing pipeline comprising a plurality of stages in processing logic. The processing of a work item includes: (i) reading data in accordance with a memory address associated with the work item, (ii) updating the read data, and (iii) writing the updated data in accordance with the memory address associated with the work item. The method includes processing a first work item and a second work item through the processing pipeline, wherein the processing of the first work item through the pipeline is initiated earlier than the processing of the second work item, and where it is determined that the first and second work items are associated with the same memory address, first updated data of the first work item is written to a register in the processing logic, and the processing of the second work item comprises reading the first updated data from the register instead of reading data from the memory.
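
Illustrative sketch (not taken from the patent): the forwarding idea can be modelled in software with a single register that remembers the most recent updated value and its address. The model assumes a pipeline in which an item's write reaches memory only after the next item has already performed its read; the function `process` and the additive update are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def process(memory: Dict[int, int], items: List[Tuple[int, int]]) -> int:
    """Run read-update-write items of the form (address, increment).

    Returns how many reads were served from the forwarding register
    instead of from memory.
    """
    fwd_addr: Optional[int] = None           # address held in the register
    fwd_data = 0                             # updated data held in the register
    pending: Optional[Tuple[int, int]] = None  # previous write, not yet in memory
    forwarded_reads = 0

    for addr, increment in items:
        # Read stage: memory may still be stale for the previous address.
        if addr == fwd_addr:
            data = fwd_data                  # take the value from the register
            forwarded_reads += 1
        else:
            data = memory.get(addr, 0)
        # The previous item's write lands only now, after this item's read.
        if pending is not None:
            memory[pending[0]] = pending[1]
        # Update stage, then publish the result to the register.
        updated = data + increment
        fwd_addr, fwd_data = addr, updated
        pending = (addr, updated)

    if pending is not None:                  # drain the final write
        memory[pending[0]] = pending[1]
    return forwarded_reads
```

Without the register, the second of two back-to-back items on the same address would read stale memory and its write would overwrite the first item's update.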

    COMPRESSING A NEURAL NETWORK

    Publication No.: US20250021805A1

    Publication date: 2025-01-16

    Application No.: US18680170

    Filing date: 2024-05-31

    Abstract: A computer implemented method of compressing a neural network, the method comprising: receiving a neural network comprising a plurality of layers; forming a graph that represents the flow of data through the plurality of layers of the neural network, the graph comprising: a plurality of vertices, each vertex of the plurality of vertices being representative of an output channel of a layer of the plurality of layers of the neural network; and one or more edges, each edge of the one or more edges representing the potential flow of non-zero data between respective output channels represented by a respective pair of vertices; identifying, by traversing the graph, one or more redundant channels comprised by the plurality of layers of the neural network; and outputting a compressed neural network in which the identified one or more redundant channels are not present in the compressed neural network.
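
Illustrative sketch (not taken from the patent): one plausible way to identify redundant channels by graph traversal is to treat a channel as redundant when non-zero data can never reach it from the network input, or when nothing it produces can reach the network output. This criterion, the assumption that there are no biases, and the function `redundant_channels` are choices made for the example; the abstract does not state the exact rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def reachable(starts: Iterable[str], adj: Dict[str, Set[str]]) -> Set[str]:
    """Vertices reachable from `starts` via depth-first traversal."""
    seen = set(starts)
    stack = list(seen)
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def redundant_channels(vertices: Iterable[str],
                       edges: Iterable[Tuple[str, str]],
                       sources: Iterable[str],
                       sinks: Iterable[str]) -> Set[str]:
    """Channels that cannot both receive input data and affect the output."""
    forward: Dict[str, Set[str]] = defaultdict(set)
    backward: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:   # an edge means non-zero data may flow from u to v
        forward[u].add(v)
        backward[v].add(u)
    live = reachable(sources, forward) & reachable(sinks, backward)
    return set(vertices) - live
```

In a small graph where channel `a1` feeds only zero weights and channel `a2` has no non-zero input, both are reported as redundant and could be dropped from the compressed network.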

    Anisotropic texture filtering using adaptive filter kernel

    Publication No.: US12198230B2

    Publication date: 2025-01-14

    Application No.: US17871082

    Filing date: 2022-07-22

    Inventor: Rostam King

    Abstract: A texture filtering unit applies anisotropic filtering using a filter kernel which can be adapted to apply different amounts of anisotropy up to a maximum amount of anisotropy. If it is determined that a received input amount of anisotropy is not above the maximum amount of anisotropy, the filter kernel applies the input amount of anisotropy, and texels of a texture are sampled using the filter kernel to determine a filtered texture value. If it is determined that the input amount of anisotropy is above the maximum amount of anisotropy, the filter kernel applies an amount of anisotropy that is not above the maximum amount of anisotropy, a plurality of sampling operations are performed to sample texels of the texture using the filter kernel to determine a respective plurality of intermediate filtered texture values, and the plurality of intermediate filtered texture values are combined to determine a filtered texture value which has been filtered in accordance with the input amount of anisotropy and the input direction of anisotropy.
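
Illustrative sketch (not taken from the patent): the two branches in the abstract can be mimicked in one dimension with a box kernel. The model assumes a 1D texture, an anisotropy measured in whole texels along the direction of anisotropy, and a length-weighted average as the combining step; `MAX_ANISO`, `box_sample` and `filter_anisotropic` are hypothetical names, and a real texture filtering unit would use a 2D kernel.

```python
from typing import List

MAX_ANISO = 4  # hypothetical maximum anisotropy of a single kernel, in texels


def box_sample(texture: List[float], start: int, length: int) -> float:
    """One sampling operation: average `length` texels from `start`."""
    window = texture[start:start + length]
    return sum(window) / len(window)


def filter_anisotropic(texture: List[float], start: int, aniso: int) -> float:
    """Filter a footprint of `aniso` texels beginning at `start`."""
    if aniso <= MAX_ANISO:
        # The kernel can apply the requested anisotropy directly.
        return box_sample(texture, start, aniso)
    # Otherwise perform several sampling operations, each within the
    # kernel's limit, and combine the intermediate values.
    total = 0.0
    offset = 0
    while offset < aniso:
        length = min(MAX_ANISO, aniso - offset)
        total += length * box_sample(texture, start + offset, length)
        offset += length
    return total / aniso
```

For a box kernel the length-weighted combination reproduces the result of a single kernel spanning the whole footprint, which is why the clamped kernel can still honour the requested amount of anisotropy.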
