Abstract:
Systems and methods relate to performing address translations in a multithreaded memory management unit (MMU). Two or more address translation requests can be received by the multithreaded MMU and processed in parallel to retrieve translations to addresses of a system memory. If the address translations are present in a translation cache of the multithreaded MMU, they can be retrieved from the translation cache and scheduled for access to the system memory using the translated addresses. If there is a miss in the translation cache, two or more address translation requests can be scheduled as two or more translation table walks performed in parallel.
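The parallel-walk idea can be pictured in software. The following is a minimal sketch, assuming a single-level page table modeled as a dictionary, a dictionary-based translation cache standing in for the TLB, and a thread pool standing in for parallel table-walker hardware; all names (MultithreadedMMU, _table_walk, PAGE_SIZE) are illustrative and not taken from the abstract.

```python
# Minimal sketch of a multithreaded MMU front end (illustrative only).
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096

class MultithreadedMMU:
    def __init__(self, page_table, walkers=2):
        self.page_table = page_table        # virtual page -> physical page
        self.translation_cache = {}         # stands in for the translation cache / TLB
        self.pool = ThreadPoolExecutor(max_workers=walkers)

    def _table_walk(self, vpage):
        # In hardware this would be a multi-level walk through system memory;
        # a single dictionary lookup stands in for that walk here.
        ppage = self.page_table[vpage]
        self.translation_cache[vpage] = ppage
        return ppage

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage in self.translation_cache:             # hit: use the cached translation
            return self.translation_cache[vpage] * PAGE_SIZE + offset
        # Miss: schedule a table walk; several futures can be outstanding
        # at once, which models scheduling two or more walks in parallel.
        future = self.pool.submit(self._table_walk, vpage)
        return future.result() * PAGE_SIZE + offset

mmu = MultithreadedMMU({0: 7, 1: 3})
print(hex(mmu.translate(0x1234)))   # miss, walk, then physical 0x3234
print(hex(mmu.translate(0x0042)))   # miss, walk, then physical 0x7042
```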
Abstract:
An exemplary method for intelligent compression defines a threshold value for a key performance indicator. Based on the key performance indicator value, data blocks generated by a producer component may be scaled down to reduce power and/or bandwidth consumption before being compressed by a lossless compression module. The compressed data blocks are then stored in a memory component along with metadata that signals the scaling factor used prior to compression. Consumer components that later retrieve the compressed data blocks from the memory component may decompress the data blocks and upscale them, if required, based on the scaling factor signaled by the metadata.
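As a rough illustration of the scale-then-compress flow, the sketch below assumes a one-dimensional block of byte values, zlib as the lossless compression module, naive decimation and sample repetition for down- and up-scaling, and a made-up KPI threshold; every name here is an assumption, not part of the disclosed method.

```python
# Illustrative scale-then-compress producer/consumer pair.
import json
import zlib

KPI_THRESHOLD = 0.8   # e.g., a normalized bandwidth-utilization threshold (assumed)

def produce(block, kpi_value):
    scale = 2 if kpi_value > KPI_THRESHOLD else 1    # downscale only under pressure
    scaled = block[::scale]                          # naive decimation stands in for scaling
    payload = zlib.compress(bytes(scaled))           # lossless compression
    metadata = json.dumps({"scale": scale}).encode() # metadata signals the scaling factor
    return metadata, payload

def consume(metadata, payload):
    scale = json.loads(metadata)["scale"]
    data = list(zlib.decompress(payload))
    if scale > 1:                                    # upscale, if required
        data = [value for value in data for _ in range(scale)]
    return data

meta, compressed = produce(list(range(16)), kpi_value=0.9)
print(consume(meta, compressed))
```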
Abstract:
Systems, methods, and computer program products are disclosed for reducing latency in a system that includes one or more processing devices, a system memory, and a cache memory. A pre-fetch command that identifies requested data is received from a requestor device. The requested data is pre-fetched from the system memory into the cache memory in response to the pre-fetch command. The data pre-fetch may be preceded by a pre-fetch of an address translation. A data access request corresponding to the pre-fetch command is then received, and in response to the data access request the data is provided from the cache memory to the requestor device.
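One way to picture the flow is the toy controller below: it assumes a flat dictionary as system memory, an identity address translation, and a single cache level, so every name (PrefetchingController, prefetch, access) is hypothetical rather than drawn from the disclosure.

```python
# Toy pre-fetch flow: translation pre-fetch, data pre-fetch, then a low-latency access.
class PrefetchingController:
    def __init__(self, system_memory):
        self.system_memory = system_memory   # address -> data
        self.cache = {}                      # cache memory
        self.tlb = {}                        # pre-fetched address translations

    def prefetch(self, vaddr, translate):
        paddr = translate(vaddr)
        self.tlb[vaddr] = paddr                         # translation pre-fetched first
        self.cache[paddr] = self.system_memory[paddr]   # then the data itself

    def access(self, vaddr, translate):
        paddr = self.tlb[vaddr] if vaddr in self.tlb else translate(vaddr)
        if paddr in self.cache:              # low-latency path after a pre-fetch
            return self.cache[paddr]
        return self.system_memory[paddr]     # fallback to system memory

identity = lambda vaddr: vaddr
controller = PrefetchingController({0x100: "payload"})
controller.prefetch(0x100, identity)         # pre-fetch command from the requestor
print(controller.access(0x100, identity))    # later data access served from the cache
```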
Abstract:
Certain aspects of the present disclosure provide techniques for concurrently performing inferences using a machine learning model and optimizing parameters used in executing the machine learning model. An example method generally includes receiving a request to perform inferences on a data set using the machine learning model, together with performance metric targets for the inferences. At least a first inference is performed on the data set using the machine learning model so as to meet a latency specified for generating the first inference from receipt of the request. While the at least the first inference is being performed, operational parameters that result in inference performance approaching the performance metric targets are identified based on the machine learning model and operational properties of the computing device. The identified operational parameters are then applied to subsequent inferences performed using the machine learning model.
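The concurrency between serving the first inference and searching for better operational parameters might be sketched as follows, with a fake model, a single tunable parameter (batch size), and a crude latency probe standing in for the performance metrics; none of these names or heuristics come from the disclosure.

```python
# Rough sketch: answer the first inference immediately while a background
# thread searches for a batch size that meets an assumed latency target.
import threading
import time

def run_inference(batch, params):
    time.sleep(0.001 * len(batch) / params["batch_size"])   # fake model latency
    return [x * 2 for x in batch]                           # fake model output

def tune(params, target_ms, done):
    best = params["batch_size"]
    for candidate in (1, 2, 4, 8):
        start = time.perf_counter()
        run_inference(list(range(candidate)), {"batch_size": candidate})
        if (time.perf_counter() - start) * 1000 <= target_ms:
            best = candidate                                 # candidate meets the target
    params["batch_size"] = best
    done.set()

params, done = {"batch_size": 1}, threading.Event()
threading.Thread(target=tune, args=(params, 2.0, done)).start()
first = run_inference([1, 2, 3], {"batch_size": 1})   # first inference, default parameters
done.wait()
later = run_inference([1, 2, 3], params)              # subsequent inference, tuned parameters
print(first, later, params)
```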
Abstract:
Various embodiments include methods and devices for implementing compression of high dynamic ratio fields. Various embodiments may include receiving a compression block having data units and receiving a mapping for the compression block, where the mapping is configured to map bits of each data unit to two or more data fields to generate a first set of data fields and a second set of data fields. Various embodiments may further include compressing the first set of data fields together to generate a compressed first set of data fields and compressing the second set of data fields together to generate a compressed second set of data fields.
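A minimal sketch of the field-splitting idea follows, assuming 16-bit data units mapped into a high-byte field and a low-byte field, with each set of fields compressed separately by zlib; the specific mapping and compressor are assumptions, not the disclosed ones.

```python
# Split each 16-bit unit into two fields and compress each field set on its own.
import zlib

def split_fields(units):
    high = bytes((u >> 8) & 0xFF for u in units)   # first set of data fields
    low = bytes(u & 0xFF for u in units)           # second set of data fields
    return high, low

def compress_block(units):
    high, low = split_fields(units)
    return zlib.compress(high), zlib.compress(low)

def decompress_block(compressed_high, compressed_low):
    high = zlib.decompress(compressed_high)
    low = zlib.decompress(compressed_low)
    return [(h << 8) | l for h, l in zip(high, low)]

units = [0x0102, 0x0103, 0x0101, 0x0104]   # similar high bytes compress well together
compressed_high, compressed_low = compress_block(units)
assert decompress_block(compressed_high, compressed_low) == units
```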
Abstract:
An exemplary method for intelligent compression uses a foveated-compression approach. First, the location of a fixation point within an image frame is determined. Next, the image frame is sectored into two or more sectors such that one of the two or more sectors is designated as a fixation sector and the remaining sectors are designated as foveation sectors. A sector may be defined by one or more tiles within the image frame. The fixation sector includes the particular tile that contains the fixation point and is compressed according to a lossless compression algorithm. The foveation sectors are compressed according to lossy compression algorithms. As a foveation sector's angular distance from the fixation sector increases, its compression factor may be increased.
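A small sketch of the sector-based idea follows, treating tiles as flat lists of byte values, using zlib for the lossless fixation sector and plain quantization as a stand-in lossy step whose strength grows with distance from the fixation point; the distance-to-factor mapping is invented for illustration.

```python
# Compress the fixation tile losslessly; quantize farther tiles more aggressively.
import math
import zlib

def compress_tile(tile, tile_center, fixation, lossless_radius=1.0):
    distance = math.dist(tile_center, fixation)
    if distance <= lossless_radius:                  # fixation sector: lossless
        return zlib.compress(bytes(tile))
    step = min(64, 2 ** int(distance))               # compression factor grows with distance
    quantized = bytes((value // step) * step for value in tile)
    return zlib.compress(quantized)                  # lossy: quantize, then pack losslessly

tiles = {(0, 0): [10, 20, 30], (3, 0): [10, 20, 30]}
fixation = (0, 0)
for center, tile in tiles.items():
    print(center, len(compress_tile(tile, center, fixation)))
```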
Abstract:
Techniques are described in which a device is configured to retrieve a metadata buffer for rendering a sub-frame of a set of sub-frames for a frame. A data block of a data buffer is configured to store image data for rendering the sub-frame. In response to determining, based on the metadata buffer for rendering the sub-frame, that the sub-frame includes a color pattern, a fixed color value, or a combination thereof, the device refrains from retrieving the image data from the data block of the data buffer and instead determines the image data for rendering the sub-frame based on the metadata buffer.
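The fetch-skipping step can be illustrated as below, assuming the metadata is a small dictionary with a fixed_color field and the data buffer is a flat list of pixels; those field names and the block layout are assumptions for the sketch.

```python
# Skip the data-buffer read when the metadata alone describes the sub-frame.
def fetch_sub_frame(metadata, data_buffer, block_index, block_size):
    if metadata.get("fixed_color") is not None:
        # No data-buffer access: synthesize the pixels from the metadata.
        return [metadata["fixed_color"]] * block_size
    start = block_index * block_size
    return data_buffer[start:start + block_size]

data_buffer = list(range(64))
print(fetch_sub_frame({"fixed_color": 0x00FF00}, data_buffer, 0, 4))   # metadata-only path
print(fetch_sub_frame({"fixed_color": None}, data_buffer, 1, 4))       # normal data-buffer read
```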
Abstract:
A method and system for dynamic control of shared memory resources within a portable computing device (“PCD”) are disclosed. A limit request of an unacceptable deadline miss (“UDM”) engine of the portable computing device may be determined with a limit request sensor within the UDM engine. Next, a memory management unit modifies a shared memory resource arbitration policy in view of the limit request. By modifying the shared memory resource arbitration policy, the memory management unit may intelligently allocate resources to service translation requests that are queued separately based on whether they emanated from a flooding engine or a non-flooding engine.
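One way to picture the modified arbitration policy is the toy arbiter below, which keeps separate queues for flooding and non-flooding translation requests and lets an asserted limit request shift grants toward the non-flooding queue; the weight scheme and fallback are illustrative assumptions.

```python
# Toy arbitration between flooding and non-flooding translation-request queues.
from collections import deque

class Arbiter:
    def __init__(self):
        self.queues = {"flooding": deque(), "non_flooding": deque()}
        self.weights = {"flooding": 1, "non_flooding": 1}   # default arbitration policy

    def apply_limit_request(self, asserted):
        # A UDM limit request throttles the flooding engine's share of grants.
        self.weights["flooding"] = 0 if asserted else 1

    def grant(self):
        for name in ("non_flooding", "flooding"):
            if self.weights[name] and self.queues[name]:
                return self.queues[name].popleft()
        for queue in self.queues.values():    # avoid starving a throttled queue forever
            if queue:
                return queue.popleft()
        return None

arbiter = Arbiter()
arbiter.queues["flooding"].extend(["f1", "f2"])
arbiter.queues["non_flooding"].append("n1")
arbiter.apply_limit_request(True)
print(arbiter.grant(), arbiter.grant(), arbiter.grant())   # n1, then f1, f2
```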