SCHEDULING HETEROGENEOUS COMPUTATION ON MULTITHREADED PROCESSORS

    Publication No.: US20250004847A1

    Publication date: 2025-01-02

    Application No.: US18883768

    Filing date: 2024-09-12

    Abstract: Aspects include computation systems that can identify computation instances that are not capable of being reentrant, or are not reentrant capable on a target architecture, or are non-reentrant as a result of having a memory conflict in a particular execution situation. For example, a system can have a plurality of computation units, each with an independently schedulable SIMD vector. Computation instances can be defined by a program module and one or more data elements that may be stored in a local cache for a particular computation unit of the plurality. Each local cache does not maintain coherency controls for such data elements. During scheduling, a scheduler can maintain a list of running (or runnable) instances, and attempt to schedule new computation instances by determining whether any new computation instance conflicts with a running instance and, if so, deferring its scheduling. Such memory conflict checks can be conditioned on a flag or other indication of the potential for non-reentrancy.
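
    The deferral logic described in this abstract can be sketched roughly as follows. This is an illustration only, not the claimed method: the `Instance` fields, the `schedule` function, the `free_slots` parameter, and the rule that two instances conflict when they touch the same data element are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    module: str         # program module the instance runs (hypothetical field)
    data_element: int   # identifier of the data element it touches (hypothetical)
    may_conflict: bool  # flag: the module is potentially non-reentrant


def schedule(pending, running, free_slots):
    """Split pending instances into (started, deferred).

    Assumption: an instance conflicts with an active one when both touch
    the same data element. The check is skipped when the flag is clear.
    """
    started, deferred = [], []
    active = list(running)
    for inst in pending:
        conflict = inst.may_conflict and any(
            other.data_element == inst.data_element for other in active
        )
        if conflict or len(started) >= free_slots:
            deferred.append(inst)   # retry on a later scheduling pass
        else:
            started.append(inst)
            active.append(inst)     # later candidates are checked against it too
    return started, deferred
```

    Instances whose flag is clear bypass the conflict check entirely, which mirrors the abstract's point that the check can be conditioned on an indication of potential non-reentrancy.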

TRIANGLE PAIRING OPTIMIZER
    Patent application

    Publication No.: US20240428503A1

    Publication date: 2024-12-26

    Application No.: US18668667

    Filing date: 2024-05-20

    Abstract: A method for grouping primitives into pairs of adjoining triangles for use in a ray tracing process. An input list of edges of triangular primitives is obtained, and an edge bounding volume surface area (BVSA) and an additional edge qualifier are determined for each of the edges. The entries in the input list are sorted by edge BVSA and then by edge qualifier, giving a sorted list in which the entries have a sorted order. The list is traversed in the sorted order to seek groups of matched edges within a predetermined window of list entries, each edge in a matched group having a matching edge BVSA and edge qualifier with another edge in the matched group from a different triangular primitive. When a group of matched edges is found, the associated triangular primitives are designated as a cluster of adjoining primitives. The cluster of adjoining primitives is processed together as a group.
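
    A minimal sketch of the sort-and-window idea, under stated assumptions: the BVSA is taken to be the surface area of the axis-aligned box around an edge, the qualifier is a made-up order-independent coordinate sum, and the names `edge_key`, `pair_triangles` and `window` are hypothetical. Equal keys do not prove that two edges coincide, so a fuller version would presumably also compare the edge vertices.

```python
from itertools import combinations


def edge_key(p, q):
    """Return (BVSA, qualifier) for the edge p-q.

    Assumptions: BVSA is the surface area of the edge's axis-aligned
    bounding box; the qualifier is a simple order-independent sum.
    """
    dx, dy, dz = (abs(a - b) for a, b in zip(p, q))
    bvsa = 2 * (dx * dy + dy * dz + dz * dx)
    qualifier = sum(p) + sum(q)
    return (bvsa, qualifier)


def pair_triangles(triangles, window=4):
    """Pair triangles whose edges have matching keys within a sorted window."""
    entries = []
    for t, tri in enumerate(triangles):
        for p, q in combinations(tri, 2):
            entries.append((edge_key(p, q), t))
    entries.sort()  # by BVSA, then qualifier, then triangle index

    paired, pairs = set(), []
    for i, (key, t) in enumerate(entries):
        if t in paired:
            continue
        for key2, t2 in entries[i + 1 : i + 1 + window]:
            if key2 != key:
                break  # sorted order: no later entry in the window can match
            if t2 != t and t2 not in paired:
                pairs.append((t, t2))
                paired.update((t, t2))
                break
    return pairs
```

    Sorting places candidate matches next to each other, so each edge is compared only with the few entries that follow it rather than with every other edge.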

PROCESSOR WITH HARDWARE PIPELINE
    Patent application

    Publication No.: US20240427632A1

    Publication date: 2024-12-26

    Application No.: US18827456

    Filing date: 2024-09-06

    Abstract: A processor has a register bank to which software writes descriptors specifying tasks to be processed by a hardware pipeline. The register bank includes a plurality of register sets, each for holding the descriptor of a task. The processor includes a first selector operable to connect the execution logic to a selected one of the register sets and thereby enable the software to write successive ones of said descriptors to different ones of said register sets. The processor also includes a second selector operable to connect the hardware pipeline to a selected one of the register sets. The processor further comprises control circuitry configured to control the hardware pipeline to begin processing a current task based on the descriptor in a current one of the register sets while the software is writing the descriptor of another task to another of the register sets.

    Hierarchical mantissa bit length selection for hardware implementation of deep neural network

    Publication No.: US12175349B2

    Publication date: 2024-12-24

    Application No.: US16180250

    Filing date: 2018-11-05

    Abstract: Hierarchical methods for selecting fixed point number formats with reduced mantissa bit lengths for representing values input to, and/or output from, the layers of a DNN. The methods begin with one or more initial fixed point number formats for each layer. The layers are divided into subsets of layers and the mantissa bit lengths of the fixed point number formats are iteratively reduced from the initial fixed point number formats on a per subset basis. If a reduction causes the output error of the DNN to exceed an error threshold, then the reduction is discarded, and no more reductions are made to the layers of the subset. Otherwise, a further reduction is made to the fixed point number formats for the layers in that subset. Once no further reductions can be made to any of the subsets, the method is repeated for continually increasing numbers of subsets until a predetermined number of layers per subset is achieved.
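
    The hierarchical loop in this abstract might look roughly like the sketch below. Several details are assumptions, not taken from the patent: one mantissa bit length per layer, contiguous subsets, one-bit reduction steps, doubling the number of subsets on each pass, and a caller-supplied `error_fn` that stands in for evaluating the DNN's output error.

```python
def split(n_layers, n_subsets):
    """Split layer indices 0..n_layers-1 into contiguous subsets."""
    size = -(-n_layers // n_subsets)  # ceiling division
    return [
        list(range(start, min(start + size, n_layers)))
        for start in range(0, n_layers, size)
    ]


def select_bit_lengths(initial_bits, error_fn, threshold,
                       layers_per_subset=1, min_bits=1):
    """Reduce per-layer mantissa bit lengths, subset by subset.

    error_fn(bits) is a hypothetical callback returning the network's
    output error for a candidate list of per-layer bit lengths.
    """
    bits = list(initial_bits)
    n_layers = len(bits)
    n_subsets = 1
    while True:
        subsets = split(n_layers, n_subsets)
        for subset in subsets:
            while all(bits[i] > min_bits for i in subset):
                trial = list(bits)
                for i in subset:
                    trial[i] -= 1
                if error_fn(trial) > threshold:
                    break          # discard the reduction, stop this subset
                bits = trial       # keep the reduction and try another
        if max(len(s) for s in subsets) <= layers_per_subset:
            return bits
        n_subsets = min(n_layers, n_subsets * 2)  # assumed refinement schedule
```

    Coarse subsets remove the bits that every layer can spare with few error evaluations, and the finer passes then recover the remaining per-layer slack.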

    Methods and systems for implementing a convolution transpose layer of a neural network

    Publication No.: US12174910B2

    Publication date: 2024-12-24

    Application No.: US18425726

    Filing date: 2024-01-29

    Abstract: Methods and systems for performing a convolution transpose operation between an input tensor having a plurality of input elements and a filter comprising a plurality of filter weights. The method includes: dividing the filter into a plurality of sub-filters; performing, using hardware logic, a convolution operation between the input tensor and each of the plurality of sub-filters to generate a plurality of sub-output tensors, each sub-output tensor comprising a plurality of output elements; and interleaving, using hardware logic, the output elements of the plurality of sub-output tensors to form a final output tensor for the convolution transpose.
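
    The sub-filter decomposition can be checked in one dimension with a short sketch. It is a software illustration only, with assumptions that are not from the patent: 1D lists instead of tensors, no padding, a sub-filter formed from every `stride`-th weight, and hypothetical function names.

```python
def conv_full(x, w):
    """Plain full 1D convolution (stride 1)."""
    out = [0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i + j] += xi * wj
    return out


def conv_transpose_direct(x, w, stride):
    """Reference 1D convolution transpose: scatter each input times the filter."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            out[i * stride + j] += xi * wj
    return out


def conv_transpose_subfilters(x, w, stride):
    """Same result via one ordinary convolution per sub-filter, then interleaving."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for phase in range(stride):
        sub_filter = w[phase::stride]        # every stride-th weight
        sub_out = conv_full(x, sub_filter)   # one sub-output per sub-filter
        for m, value in enumerate(sub_out):
            out[m * stride + phase] = value  # interleave into the final output
    return out
```

    With input `[1, 2, 3]`, filter `[1, 10, 100]` and stride 2, both functions return `[1, 10, 102, 20, 203, 30, 300]`; the sub-filter version never multiplies by the zeros that an upsample-then-convolve approach would insert.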

    HISTOGRAM-BASED PER-LAYER DATA FORMAT SELECTION FOR HARDWARE IMPLEMENTATION OF DEEP NEURAL NETWORK

    Publication No.: US20240394525A1

    Publication date: 2024-11-28

    Application No.: US18794854

    Filing date: 2024-08-05

    Abstract: A histogram-based method of selecting a fixed point number format for representing a set of values input to, or output from, a layer of a Deep Neural Network (DNN). The method comprises obtaining a histogram that represents an expected distribution of the set of values of the layer, each bin of the histogram being associated with a frequency value and a representative value in a floating point number format; quantising the representative values according to each of a plurality of potential fixed point number formats; estimating, for each of the plurality of potential fixed point number formats, the total quantisation error based on the frequency values of the histogram and a distance value for each bin that is based on the quantisation of the representative value for that bin; and selecting the fixed point number format associated with the smallest estimated total quantisation error as the optimum fixed point number format for representing the set of values of the layer.
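
    As a rough sketch of the selection step, assume a format is a signed mantissa of fixed width scaled by a power of two, and assume the per-bin distance is the absolute difference between a representative value and its quantised form. Both choices, and the names `quantise` and `select_exponent`, are illustrative assumptions rather than details from the patent.

```python
def quantise(value, bits, exp):
    """Round value to the nearest representable mantissa * 2**exp, saturating."""
    step = 2.0 ** exp
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mantissa = max(lo, min(hi, round(value / step)))
    return mantissa * step


def select_exponent(histogram, bits, exponents):
    """Pick the exponent with the smallest frequency-weighted quantisation error.

    histogram is a list of (representative_value, frequency) pairs.
    """
    def total_error(exp):
        return sum(
            freq * abs(rep - quantise(rep, bits, exp))
            for rep, freq in histogram
        )
    return min(exponents, key=total_error)
```

    Because only one representative value per bin is quantised, the cost of trying a format scales with the number of bins rather than with the number of values in the layer.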

    Method and system for wirelessly transmitting data

    Publication No.: US12156135B2

    Publication date: 2024-11-26

    Application No.: US18241972

    Filing date: 2023-09-04

    Inventor: Ian R. Knowles

    Abstract: Methods and systems for wirelessly transmitting data between Wi-Fi stations without requiring the Wi-Fi stations to be fully connected to the Wi-Fi network. A first Wi-Fi station generates the data to be transmitted. The data comprises status data and/or wake-up data. The first Wi-Fi station then inserts the data in a vendor-specific information element of a probe request frame and wirelessly transmits the probe request frame. The probe request frame is then received by a second Wi-Fi station. If the probe request frame contains wake-up data and the second Wi-Fi station is operating in a low-power mode when it receives the probe request frame, the second Wi-Fi station will wake up from the low-power mode. If the probe request frame contains status data, then the second Wi-Fi station may process the probe request frame and/or forward at least a portion of the received probe request frame to another device.
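
    For illustration, the byte layout of a vendor-specific information element (element ID 221, a one-byte length, then a three-byte OUI followed by vendor data) can be built and parsed as below. The OUI value and the payload convention used in the example are hypothetical, and the sketch only handles the element bytes; actually sending a probe request depends on the driver or firmware.

```python
VENDOR_SPECIFIC_ID = 221  # element ID of the vendor-specific element


def build_vendor_ie(oui: bytes, payload: bytes) -> bytes:
    """Encode ID, length, OUI and vendor payload as one information element."""
    if len(oui) != 3:
        raise ValueError("OUI must be 3 bytes")
    body = oui + payload
    if len(body) > 255:
        raise ValueError("element body exceeds 255 bytes")
    return bytes([VENDOR_SPECIFIC_ID, len(body)]) + body


def parse_vendor_ies(elements: bytes, oui: bytes) -> list:
    """Return the payloads of all vendor-specific elements carrying this OUI."""
    found = []
    pos = 0
    while pos + 2 <= len(elements):
        element_id, length = elements[pos], elements[pos + 1]
        body = elements[pos + 2 : pos + 2 + length]
        if len(body) < length:
            break  # truncated element: stop parsing
        if element_id == VENDOR_SPECIFIC_ID and body[:3] == oui:
            found.append(body[3:])
        pos += 2 + length
    return found
```

    A receiver in low-power mode would only need to run something like `parse_vendor_ies` over the elements of each received probe request and check the payload for its wake-up pattern.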

    Decoding images compressed using MIP map compression

    Publication No.: US12155845B2

    Publication date: 2024-11-26

    Application No.: US18389002

    Filing date: 2023-11-13

    Inventor: Rostam King

    Abstract: Methods and apparatus for compressing image data are described along with corresponding methods and apparatus for decompressing the compressed image data. A decoder unit samples compressed image data including interleaved blocks of data encoding a first image and blocks of data encoding differences between the first image and a second image, the second image being twice the width and the height of the first image. A difference decoder decodes a fetched encoded sub-block of the differences between the first and second images and outputs a difference quad and a prediction value for a pixel, and a filter sub-unit generates a reconstruction of the image at a sample position using decoded blocks of the first image, the difference quad and the prediction value.

    LEARNED IMAGE TRANSFORMATION METHODS AND SYSTEMS IN GRAPHICS RENDERING

    Publication No.: US20240378791A1

    Publication date: 2024-11-14

    Application No.: US18631382

    Filing date: 2024-04-10

    Abstract: Transforming rendered frames in a graphics processing system to obtain enhanced frames with desired characteristics of a set of target images includes selecting a plurality of shaders, each defined by a parametrized mathematical function arranged to replicate a particular visual characteristic. For each shader, parameters of the parametrized mathematical function have been derived in dependence on a set of target images so that the shader is arranged to impose its respective particular visual characteristic in dependence on an extent to which the particular visual characteristic is exhibited in the target images. The method further includes combining the plurality of shaders to form a pipeline, obtaining one or more rendered frames, applying the pipeline to at least a portion of the one or more rendered frames to obtain enhanced frames, and outputting the enhanced frames for display, wherein the enhanced frames exhibit visual characteristics of the target images.
