Abstract:
A master device has a buffer for storing data transferred from, or to be transferred to, a memory system. Control circuitry issues, from time to time, a group of one or more transactions to request transfer of a block of data between the memory system and the buffer. A hardware or software mechanism can be provided to detect at least one memory load parameter indicating how heavily loaded the memory system is, and the group size (the size of the block of data transferred per group) can be varied based on the memory load parameter. Adapting the group size to the memory system load can achieve a better balance between energy efficiency and quality of service.
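As a minimal sketch of how a group size might be derived from a memory load parameter, the function below assumes the load is normalised to the range 0 to 1 and maps it linearly onto a group size between two hypothetical bounds. The abstract does not say which direction the mapping should go; shrinking groups under heavy load (so each group occupies the memory system for less time) is an assumption made here for illustration, as are the names `select_group_size`, `min_size` and `max_size`.

```python
def select_group_size(load: float, min_size: int = 256, max_size: int = 4096) -> int:
    """Return the block size in bytes to request per group of transactions.

    Hypothetical policy: `load` is a memory load parameter normalised to [0, 1]
    (0 = idle, 1 = saturated). Light load favours large groups (fewer, bigger
    transfers, longer idle periods for energy efficiency); heavy load favours
    small groups (shorter occupancy of the memory system for quality of service).
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    # Linear interpolation from max_size (idle) down to min_size (saturated)
    size = max_size - load * (max_size - min_size)
    # Round down to a multiple of min_size so groups stay burst-aligned
    return max(min_size, int(size) // min_size * min_size)
```

With the default bounds, an idle memory system yields 4096-byte groups, a saturated one yields 256-byte groups, and a load of 0.5 yields 2048-byte groups.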
Abstract:
An interface apparatus and a method of operating the same are provided. The interface apparatus receives an uncompressed image data read request using a first addressing scheme at a first bus interface and transmits a compressed image data read request using a second addressing scheme from a second bus interface. Address translation circuitry translates between the first addressing scheme and the second addressing scheme. Decoding circuitry decodes a set of compressed image data received via the second bus interface to generate the requested set of uncompressed image data, which is then transmitted via the first bus interface. The use of a second addressing scheme and of image data compression is thus transparent to the source of the uncompressed image data read request, so the interface apparatus can connect devices that use different addressing schemes and image data formats without either device needing to be modified.
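The sketch below models the translate-then-decode path in software, purely for illustration. It assumes hypothetical schemes not taken from the abstract: the first addressing scheme is a linear pixel address (one byte per pixel, row-major), and the compressed image is stored as per-row run-length encoded (count, value) pairs with a header table of row offsets, which serves as the second addressing scheme. `InterfaceSketch`, `rle_encode_row` and `rle_decode_row` are hypothetical names.

```python
from typing import List, Tuple


def rle_encode_row(row: bytes) -> bytes:
    """Hypothetical compression: encode a row as (count, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(row):
        value, run = row[i], 1
        while i + run < len(row) and row[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode_row(data: bytes) -> bytes:
    """Decoding step: expand (count, value) pairs back into pixels."""
    out = bytearray()
    for k in range(0, len(data), 2):
        count, value = data[k], data[k + 1]
        out += bytes([value]) * count
    return bytes(out)


class InterfaceSketch:
    def __init__(self, image_rows: List[bytes]):
        self.width = len(image_rows[0])
        self.compressed = bytearray()  # memory addressed with the second scheme
        self.row_offsets = []          # header table: compressed start of each row
        for row in image_rows:
            self.row_offsets.append(len(self.compressed))
            self.compressed += rle_encode_row(row)
        self.row_offsets.append(len(self.compressed))  # end sentinel

    def translate(self, addr: int) -> Tuple[int, int, int, int]:
        """First scheme (linear pixel address) -> compressed byte range."""
        row, col = divmod(addr, self.width)
        return row, col, self.row_offsets[row], self.row_offsets[row + 1]

    def read(self, addr: int, length: int) -> bytes:
        """Serve an uncompressed read request that lies within a single row."""
        row, col, start, end = self.translate(addr)
        if col + length > self.width:
            raise ValueError("this sketch only handles reads within one row")
        decoded = rle_decode_row(bytes(self.compressed[start:end]))
        return decoded[col:col + length]
```

The requester only ever sees linear addresses and raw pixels; the offset lookup and decoding stay hidden inside `read`, which mirrors the transparency the abstract describes.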
Abstract:
There is provided a data processing apparatus for performing machine learning. The data processing apparatus includes convolution circuitry for convolving a plurality of neighbouring regions of input data using a kernel to produce convolution outputs. Max-pooling circuitry determines and selects the largest of the convolution outputs as a pooled output. Prediction circuitry performs a size prediction of the convolution outputs based on the neighbouring regions before the max-pooling circuitry determines the largest of the convolution outputs, and adjusts a behaviour of the convolution circuitry based on the size prediction.
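One way to picture prediction ahead of max-pooling is the sketch below. It assumes a hypothetical predictor that is not given in the abstract: a partial dot product over only the `n_weights` largest-magnitude kernel weights. The adjusted behaviour is then to compute the full convolution only for the `top_m` regions predicted to be largest. This can return a smaller value than the exact maximum when the prediction is wrong; setting `top_m` to the number of regions recovers exact max-pooling.

```python
import numpy as np


def predicted_max_pool(regions, kernel, top_m=1, n_weights=2):
    """Max-pool convolution outputs, fully convolving only predicted winners.

    regions: array of shape (R, K), one flattened neighbouring window per row.
    kernel:  array of shape (K,).
    Returns (pooled_output, index_of_winning_region).
    """
    regions = np.asarray(regions, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Hypothetical size prediction: use only the largest-magnitude weights
    idx = np.argsort(-np.abs(kernel))[:n_weights]
    estimates = regions[:, idx] @ kernel[idx]
    # Adjusted convolution behaviour: full computation for top_m candidates only
    candidates = np.argsort(-estimates)[:top_m]
    full = regions[candidates] @ kernel
    best = int(np.argmax(full))
    return float(full[best]), int(candidates[best])
```

For example, with kernel `[4, 0.1, -3, 0.2]` and four regions, only one full dot product is computed instead of four.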
Abstract:
A system-on-chip comprises processing circuitry to process input data to generate output data, and power management circuitry to control power management policy for at least a portion of the system-on-chip. The power management circuitry controls the power management policy depending on metadata indicative of a property of the input data to be processed by the processing circuitry.
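A minimal sketch of metadata-driven power management follows. It assumes, hypothetically, that the metadata gives the resolution and frame rate of the input data and that the power management policy is a choice among dynamic voltage and frequency scaling (DVFS) operating points. The table values, `OperatingPoint`, `pixels_per_cycle` and `select_policy` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    freq_mhz: int
    pixels_per_cycle: float = 1.0  # hypothetical throughput of the processing circuitry

    def capacity(self) -> float:
        """Pixels per second this operating point can sustain."""
        return self.freq_mhz * 1e6 * self.pixels_per_cycle


# Hypothetical DVFS table, ordered from lowest to highest power
OPERATING_POINTS = [
    OperatingPoint("low", 100),
    OperatingPoint("mid", 300),
    OperatingPoint("high", 600),
]


def select_policy(metadata: dict) -> OperatingPoint:
    """Pick the lowest-power operating point that meets the demand implied
    by the metadata describing the input data."""
    width, height = metadata["resolution"]
    demand = width * height * metadata["frame_rate"]  # pixels per second
    for op in OPERATING_POINTS:
        if op.capacity() >= demand:
            return op
    return OPERATING_POINTS[-1]  # demand exceeds every point: run flat out
```

The key property is that the policy is decided from the metadata before the input data itself is processed, rather than by reacting to measured utilisation afterwards.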
Abstract:
A method for optimizing machine learning processing is provided. The method comprises retrieving neural network architecture information for a neural network, the neural network architecture information comprising layer information and kernel information for the neural network. The neural network architecture information is analyzed to identify convolutional layers in the neural network that have associated strided layers. A first kernel, for a convolutional layer identified as having an associated strided layer, and a second kernel, for the strided layer associated with that convolutional layer, are retrieved. A composite kernel that performs the functions of both the first and second kernels is then generated from them. Finally, the composite kernel is stored for further use by a neural network.
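For the special case where both layers are linear, the composite kernel can be worked out exactly. The sketch below assumes, which the abstract does not state, a 1-D stride-1 convolution with kernel `k1` followed directly by a strided convolution with kernel `k2` and stride `s`, with no padding, bias or nonlinearity in between. Expanding the two steps gives z[m] = sum_t x[s*m + t] * c[t] with c[t] = sum over j + l = t of k1[j] * k2[l], which is the full convolution of the two kernels, applied with the strided layer's stride. `correlate_strided` and `composite_kernel` are hypothetical helper names.

```python
import numpy as np


def correlate_strided(x, k, stride=1):
    """Valid (unpadded) 1-D cross-correlation, as computed by a conv layer."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    n_out = (len(x) - len(k)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(k)], k)
                     for i in range(n_out)])


def composite_kernel(k1, k2):
    """Fuse conv(k1, stride 1) followed by conv(k2, stride s) into one kernel.

    c[t] = sum_{j + l = t} k1[j] * k2[l], i.e. the full convolution of k1 and k2.
    The fused layer applies c with the strided layer's stride s.
    """
    return np.convolve(np.asarray(k1, dtype=float), np.asarray(k2, dtype=float))
```

The fused layer replaces two passes over the data with one, at the cost of a kernel of length len(k1) + len(k2) - 1.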
Abstract:
A data processing system includes storage and at least one processor. The at least one processor is operable to generate output data using at least a portion of a first neural network layer and to generate a key associated with at least that portion of the first neural network layer. The at least one processor is further operable to obtain the key from the storage and to obtain a version of the output data for input into a second neural network layer. Using the key, the at least one processor is further operable to determine whether the version of the output data differs from the output data.
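As an illustration, the sketch below assumes the key is a CRC-32 checksum of the output data's raw bytes. The abstract does not specify how the key is formed, so this choice is hypothetical, as are the function names and the dictionary standing in for the storage. A version of the output data that has changed, for example through corruption in memory, produces a different key and is flagged before it is fed to the second layer.

```python
import zlib

import numpy as np


def make_key(output: np.ndarray) -> int:
    """Hypothetical key: CRC-32 over the output's raw bytes (C order)."""
    return zlib.crc32(output.tobytes())


def store_key(storage: dict, layer_id: str, output: np.ndarray) -> None:
    """Record the key when the first layer produces its output."""
    storage[layer_id] = make_key(output)


def version_differs(storage: dict, layer_id: str, version: np.ndarray) -> bool:
    """Before the second layer consumes `version`, compare it with the stored key."""
    return make_key(version) != storage[layer_id]
```

CRC-32 detects any error burst up to 32 bits long, so a change confined to a single float32 element is always caught. It is not a cryptographic hash, and because the key covers bytes only, a reshaped array with identical bytes would not be flagged.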
Abstract:
An apparatus and a corresponding method for processing image data are provided. The apparatus has compositing circuitry to generate a composite layer for a frame for display from image data representing plural layers of content within the frame. Plural latency buffers are provided to store at least a portion of the image data representing the plural layers, and at least one of the plural latency buffers is larger than at least one other. The compositing circuitry is responsive to at least one characteristic of the plural layers of content to allocate the plural layers to respective latency buffers. Image data information for a layer allocated to the larger latency buffer is available for analysis earlier than that for layers allocated to the smaller latency buffers, which can yield processing efficiencies.
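A minimal sketch of the allocation step is shown below. It assumes, hypothetically, that the characteristic used is layer area in pixels and that larger layers are given larger latency buffers. The abstract leaves the characteristic open, so other choices (such as whether a layer is scaled or rotated) are equally plausible. `Layer` and `allocate_layers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    name: str
    width: int   # pixels
    height: int  # pixels


def allocate_layers(layers: List[Layer], buffer_sizes: List[int]) -> Dict[str, int]:
    """Map each layer name to the index of the latency buffer it is allocated to."""
    if len(layers) > len(buffer_sizes):
        raise ValueError("more layers than latency buffers")
    # Buffer indices from largest to smallest (stable order for equal sizes)
    order = sorted(range(len(buffer_sizes)), key=lambda i: buffer_sizes[i], reverse=True)
    # Assumed characteristic: layer area; the largest layer gets the largest buffer
    ranked = sorted(layers, key=lambda layer: layer.width * layer.height, reverse=True)
    return {layer.name: buf for layer, buf in zip(ranked, order)}
```

With buffers of sizes `[64, 256, 64]`, a full-screen video layer would land in buffer 1, the large one, while a small cursor layer would take one of the smaller buffers.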