Abstract:
In some aspects, the present disclosure provides a method for managing power to a first subsystem of a system-on-a-chip (SoC). The method includes determining to power on the first subsystem, wherein the determination to power on the first subsystem is made by a second subsystem of the SoC based on detection of user identity data contained in a first image frame during an initial biometric detection process. In certain aspects, the second subsystem is configured to operate independently of the first subsystem and to control power to the first subsystem. In certain aspects, the second subsystem includes a second optical sensor, a set of ambient sensors, and a second processor configured to detect, via the set of ambient sensors, an event comprising one or more of an environmental event outside of the device or a motion event of the device.
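A minimal C sketch of the control flow this abstract describes may help: a low-power second subsystem polls ambient sensors, runs an initial biometric pass on a frame from its own optical sensor, and gates power to the first subsystem. All type and function names here (poll_ambient_sensors, detect_user_identity, and so on) are hypothetical stand-ins, not identifiers from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical frame and sensor-event types; not from the disclosure. */
    typedef struct { const uint8_t *pixels; uint32_t width, height; } image_frame_t;
    typedef struct { bool motion_event; bool environmental_event; } ambient_events_t;

    /* Assumed hooks into the low-power (second) subsystem's hardware. */
    extern ambient_events_t poll_ambient_sensors(void);
    extern image_frame_t capture_frame_low_power(void);       /* second optical sensor */
    extern bool detect_user_identity(const image_frame_t *f); /* initial biometric pass */
    extern void power_on_first_subsystem(void);

    /* Second subsystem main loop: runs independently of the first
     * subsystem and controls power to it. */
    void second_subsystem_loop(void) {
        for (;;) {
            ambient_events_t ev = poll_ambient_sensors();
            if (ev.motion_event || ev.environmental_event) {
                image_frame_t frame = capture_frame_low_power();
                if (detect_user_identity(&frame)) {
                    power_on_first_subsystem(); /* wake the main SoC subsystem */
                    return;
                }
            }
        }
    }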
Abstract:
Aspects include computing devices and methods implemented by the computing devices for automatic cache coherency for page table data on a computing device. Some aspects may include modifying, by a first processing device, page table data stored in a first cache associated with the first processing device; receiving, at a page table coherency unit, a page table cache invalidate signal from the first processing device; issuing, by the page table coherency unit, a cache maintenance operation command to the first processing device; and writing, by the first processing device, the modified page table data stored in the first cache to a shared memory accessible by the first processing device and a second processing device associated with a second cache storing the page table data.
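The abstract walks a four-step clean-on-invalidate flow. Below is a minimal C sketch of that flow under stated assumptions: the buffer sizes, function names, and single-process layout are illustrative only, standing in for the hardware signals and commands the disclosure describes.

    #include <stdint.h>
    #include <string.h>

    #define PT_SIZE 4096

    /* Hypothetical caches and shared memory; names are illustrative. */
    static uint8_t first_cache[PT_SIZE];   /* cache of the first processing device */
    static uint8_t shared_memory[PT_SIZE]; /* accessible by both processing devices */

    /* Step 4: the first device writes its modified page table data back
     * to shared memory, where the second device can re-fetch it. */
    static void first_device_write_back(void) {
        memcpy(shared_memory, first_cache, PT_SIZE);
    }

    /* Steps 2-3: the page table coherency unit receives the page table
     * cache invalidate signal and issues a cache maintenance command. */
    static void coherency_unit_on_invalidate(void) {
        first_device_write_back(); /* stand-in for the maintenance command */
    }

    /* Step 1: the first device modifies a page table entry in its own
     * cache, then signals the coherency unit. Offset is assumed to be
     * entry-aligned and in range. */
    void update_page_table_entry(uint32_t offset, uint64_t entry) {
        memcpy(&first_cache[offset], &entry, sizeof entry);
        coherency_unit_on_invalidate(); /* page table cache invalidate signal */
    }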
Abstract:
This disclosure relates to allocating memory resources of a computing device comprising non-volatile random access memory (NVRAM) and dynamic random access memory (DRAM). An exemplary method is performed for every independently executable component of an application and includes determining attributes of the component. The method also includes associating the component with a memory profile of a plurality of memory profiles based on the attributes, wherein each memory profile of the plurality of memory profiles specifies a number of banks of the NVRAM and a number of banks of the DRAM. The method also includes causing the computing device to generate an assignment of the component to banks of the NVRAM and DRAM based on the memory profile associated with the component so the computing device can execute the component using the banks of the NVRAM and DRAM based on the assignment.
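One way to picture the attribute-to-profile step is the C sketch below. The attribute set, thresholds, and three-entry profile table are assumptions made for illustration; the abstract only specifies that each profile names a number of NVRAM banks and a number of DRAM banks.

    #include <stdio.h>

    /* Hypothetical component attributes and memory profiles. */
    typedef struct { int write_intensity; int latency_sensitivity; } attrs_t;
    typedef struct { int nvram_banks; int dram_banks; } mem_profile_t;

    static const mem_profile_t PROFILES[] = {
        { .nvram_banks = 4, .dram_banks = 0 }, /* read-mostly, latency-tolerant */
        { .nvram_banks = 2, .dram_banks = 2 }, /* mixed */
        { .nvram_banks = 0, .dram_banks = 4 }, /* write-heavy or latency-sensitive */
    };

    /* Associate a component with a profile based on its attributes;
     * the thresholds here are invented for the example. */
    static const mem_profile_t *select_profile(attrs_t a) {
        if (a.write_intensity > 70 || a.latency_sensitivity > 70)
            return &PROFILES[2];
        if (a.write_intensity > 30)
            return &PROFILES[1];
        return &PROFILES[0];
    }

    int main(void) {
        attrs_t component = { .write_intensity = 80, .latency_sensitivity = 20 };
        const mem_profile_t *p = select_profile(component);
        /* The assignment step would map the component to concrete bank IDs. */
        printf("assign %d NVRAM bank(s), %d DRAM bank(s)\n",
               p->nvram_banks, p->dram_banks);
        return 0;
    }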
Abstract:
Various aspects include methods and computing devices implementing methods for reverse tiling of work items. Various aspects may include receiving information relating to a kernel execution, receiving information relating to a work item created for the kernel execution, and applying a reverse tiling function to produce a reverse tiling work item identifier (ID) for the work item to implement a pattern of access of the memory device resources. In various aspects, the reverse tiling function may be a static preprogrammed reverse tiling function, a dynamically generated reverse tiling function, or a reverse tiling function selected from a plurality of reverse tiling functions. In various aspects, applying the reverse tiling function to produce the reverse tiling work item identifier for the work item may occur in response to determining that the pattern of access of the memory device resources provides a benefit over a default pattern of access.
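A sketch of one possible static, preprogrammed reverse tiling function is shown below in C: it remaps a linear work item ID so that IDs are issued tile by tile rather than row by row, changing the memory access pattern. The 4x4 tile geometry and the remapping formula are illustrative assumptions, not the disclosure's function.

    #include <stdint.h>
    #include <stdio.h>

    /* Remap a row-major work item ID into a tile-major reverse tiling
     * work item ID. Assumes row_width is a multiple of the tile width. */
    static uint32_t reverse_tile_id(uint32_t id, uint32_t row_width) {
        uint32_t tile_x = (id % row_width) / 4;  /* 4-wide tiles */
        uint32_t tile_y = (id / row_width) / 4;  /* 4-tall tiles */
        uint32_t in_x   = id % 4;
        uint32_t in_y   = (id / row_width) % 4;
        uint32_t tiles_per_row = row_width / 4;
        /* Linearize tile-by-tile instead of row-by-row. */
        return ((tile_y * tiles_per_row + tile_x) * 16) + in_y * 4 + in_x;
    }

    int main(void) {
        /* Print the remapped IDs for the first row of an 8-wide grid:
         * IDs 0..3 stay in the first tile, IDs 4..7 jump to the second. */
        for (uint32_t id = 0; id < 8; id++)
            printf("%u -> %u\n", id, reverse_tile_id(id, 8));
        return 0;
    }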
Abstract:
Methods and systems for pre-fetching address translations in a memory management unit (MMU) of a device are disclosed. In an embodiment, the MMU receives a pre-fetch command from an upstream component of the device, the pre-fetch command including an address of an instruction, pre-fetches a translation of the instruction from a translation table in a memory of the device, and stores the translation of the instruction in a translation cache associated with the MMU.
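The C sketch below illustrates the handling of such a pre-fetch command under stated assumptions: a direct-mapped 64-entry translation cache, 4 KB pages, and a walk_translation_table hook standing in for the table walk. None of these details come from the disclosure.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical translation cache entry; fields are illustrative. */
    typedef struct { uint64_t va, pa; bool valid; } tlb_entry_t;

    #define TLB_ENTRIES 64
    static tlb_entry_t translation_cache[TLB_ENTRIES]; /* MMU's translation cache */

    /* Assumed walk over the in-memory translation table. */
    extern uint64_t walk_translation_table(uint64_t va);

    /* Handle a pre-fetch command from an upstream component: translate
     * the instruction address now and cache it before the instruction
     * is actually fetched. */
    void mmu_on_prefetch_command(uint64_t instr_va) {
        uint64_t pa = walk_translation_table(instr_va); /* pre-fetch translation */
        tlb_entry_t *slot = &translation_cache[(instr_va >> 12) % TLB_ENTRIES];
        slot->va = instr_va & ~0xFFFULL;  /* page-align the tag */
        slot->pa = pa & ~0xFFFULL;
        slot->valid = true;               /* the later fetch hits in the cache */
    }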
Abstract:
Providing memory management unit (MMU) partitioned translation caches, and related apparatuses, methods, and computer-readable media. In this regard, an apparatus comprising an MMU is provided. The MMU comprises a translation cache providing a plurality of translation cache entries defining address translation mappings. The MMU further comprises a partition descriptor table providing a plurality of partition descriptors defining a corresponding plurality of partitions each comprising one or more translation cache entries of the plurality of translation cache entries. The MMU also comprises a partition translation circuit configured to receive a memory access request from a requestor. The partition translation circuit is further configured to determine a translation cache partition identifier (TCPID) of the memory access request, identify one or more partitions of the plurality of partitions based on the TCPID, and perform the memory access request on a translation cache entry of the one or more partitions.
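A minimal C sketch of the lookup side of this scheme follows. The entry counts, the contiguous-range partition descriptors, and the TCPID-to-partition mapping are assumptions chosen for brevity; the disclosure permits a partition to comprise any one or more translation cache entries.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint64_t va, pa; bool valid; } tc_entry_t;
    typedef struct { uint32_t first_entry, num_entries; } partition_desc_t;

    #define TC_ENTRIES 128
    #define PARTITIONS 4

    static tc_entry_t translation_cache[TC_ENTRIES];
    static partition_desc_t partition_table[PARTITIONS] = {
        { 0, 32 }, { 32, 32 }, { 64, 32 }, { 96, 32 },
    };

    /* Perform a memory access request's translation lookup only within
     * the partition selected by its translation cache partition
     * identifier (TCPID). */
    tc_entry_t *partition_lookup(uint32_t tcpid, uint64_t va) {
        partition_desc_t *d = &partition_table[tcpid % PARTITIONS];
        for (uint32_t i = 0; i < d->num_entries; i++) {
            tc_entry_t *e = &translation_cache[d->first_entry + i];
            if (e->valid && e->va == (va & ~0xFFFULL))
                return e; /* hit inside the requestor's partition */
        }
        return NULL; /* miss: a walk would fill an entry in this partition */
    }

One design consequence the sketch makes visible: because each requestor's lookups and fills are confined to its own partition, one client cannot evict another client's translations.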
Abstract:
Various embodiments of methods and systems for cache-level memory management in a system on a chip (“SoC”) are disclosed. Memory utilization is optimized in certain embodiments through application of customized hashing algorithms at the lower-level caches of individual application clients. Advantageously, transactions from application clients that do not require or benefit from hashing of transaction traffic are not subjected to hashing. Each application client that does benefit from hashing transaction traffic to minimize page conflicts at a double data rate (“DDR”) memory device further benefits from a customized, and thus optimized, hashing algorithm. Because transaction streams arrive at the memory controller already hashed, or purposefully unhashed, the need to validate clients during a development phase is minimized.
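One simple family of per-client hashes works by folding higher address bits into the DDR bank-select bits, as in the C sketch below. The bit positions, the per-client configuration struct, and the XOR-fold scheme are assumptions for illustration; the disclosure does not specify the hash functions themselves.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-client configuration: whether to hash at this
     * client's lower-level cache, and which higher bits to fold in. */
    typedef struct { bool hash_enabled; uint32_t fold_shift; } client_cfg_t;

    /* XOR higher address bits into an assumed 3-bit bank field at bit 13,
     * spreading a client's stride pattern across banks to reduce DDR
     * page conflicts. */
    static uint64_t client_hash_address(const client_cfg_t *cfg, uint64_t addr) {
        if (!cfg->hash_enabled)
            return addr; /* client gains nothing from hashing: pass through */
        uint64_t bank_bits = (addr >> 13) & 0x7;
        uint64_t mixed = bank_bits ^ ((addr >> cfg->fold_shift) & 0x7);
        return (addr & ~(0x7ULL << 13)) | (mixed << 13);
    }

Customization per client then amounts to choosing fold_shift (or a different mixing function) to match that client's known stride, which is the per-client tuning the abstract describes.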
Abstract:
Systems, methods, and computer programs are disclosed for scheduling tasks in a heterogeneous processor cluster architecture in a portable computing device. One embodiment is a system comprising a first processor cluster and a second processor cluster. The first processor cluster comprises a first shared cache, and the second processor cluster comprises a second shared cache. The system further comprises a controller in communication with the first and second processor clusters for performing task migration between the first and second processor clusters. The controller initiates execution of a task on a first processor in the first processor cluster. The controller monitors a processor workload for the first processor and a cache demand associated with the first shared cache while the task is running on the first processor in the first processor cluster. The controller migrates the task to the second processor cluster based on the processor workload and the cache demand.
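The controller's migration decision can be pictured with the C sketch below. The metrics, thresholds, and function names are all hypothetical; the abstract says only that migration is based on the processor workload and the demand on the first shared cache.

    #include <stdint.h>

    /* Assumed telemetry for the running task; thresholds are illustrative. */
    typedef struct {
        uint32_t cpu_load_pct;    /* processor workload on the first processor */
        uint32_t cache_miss_rate; /* demand on the first shared cache */
    } task_metrics_t;

    #define LOAD_THRESHOLD 85
    #define MISS_THRESHOLD 40

    extern task_metrics_t sample_task_metrics(int task_id);
    extern void migrate_task(int task_id, int target_cluster);

    /* Controller policy: migrate when the first processor is saturated
     * and the task is also thrashing the first cluster's shared cache. */
    void controller_tick(int task_id) {
        task_metrics_t m = sample_task_metrics(task_id);
        if (m.cpu_load_pct > LOAD_THRESHOLD && m.cache_miss_rate > MISS_THRESHOLD)
            migrate_task(task_id, /* second cluster */ 1);
    }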
Abstract:
Various embodiments of methods and systems for thermally aware scheduling of workloads in a portable computing device that contains a heterogeneous, multi-processor system on a chip (“SoC”) are disclosed. Because individual processing components in a heterogeneous, multi-processor SoC may exhibit different processing efficiencies at a given temperature, and because more than one of the processing components may be capable of processing a given block of code, thermally aware workload scheduling techniques that compare performance curves of the individual processing components at their measured operating temperatures can be leveraged to optimize quality of service (“QoS”) by allocating workloads in real time, or near real time, to the processing components best positioned to efficiently process the block of code.
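A minimal C sketch of the curve comparison follows. The linear performance-versus-temperature model and its coefficients are assumptions standing in for the measured performance curves the abstract references; a real scheduler would consult empirically characterized curves per processing component.

    #include <stdint.h>

    #define NUM_CORES 4

    /* Illustrative model: each processing component's effective
     * performance degrades with temperature at its own rate. */
    typedef struct { float base_perf; float perf_loss_per_degc; } perf_curve_t;

    extern float read_temperature_c(int core);

    static const perf_curve_t CURVES[NUM_CORES] = {
        { 100.0f, 0.8f }, { 100.0f, 0.5f }, /* big cores: fast but heat-sensitive */
        {  60.0f, 0.2f }, {  60.0f, 0.1f }, /* little cores: slower but robust */
    };

    /* Pick the component best positioned to efficiently process the
     * next block of code given current measured temperatures. */
    int pick_core_for_workload(void) {
        int best = 0;
        float best_perf = -1.0f;
        for (int c = 0; c < NUM_CORES; c++) {
            float perf = CURVES[c].base_perf
                       - CURVES[c].perf_loss_per_degc * read_temperature_c(c);
            if (perf > best_perf) { best_perf = perf; best = c; }
        }
        return best;
    }

Note how the crossover the abstract implies falls out of the model: once a big core is hot enough, a cooler little core's curve can overtake it, and the workload is routed there.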
Abstract:
Methods, systems, and devices are disclosed that include a dynamic clock and voltage scaling (DCVS) solution configured to compute and enforce performance guarantees, ensuring that a processor does not remain in a busy state (e.g., due to transient workloads) for more than a predetermined amount of time beyond that required to complete its pre-computed steady-state workload. The DCVS solution may adjust the frequency and/or voltage of a processor based on a variable delay to ensure that the processing core falls behind its steady-state workload by, at most, a predefined maximum amount of work, irrespective of the operating frequency or voltage of the processor.
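The guarantee can be sketched as a backlog bound, as in the C example below: if completed work lags the steady-state target by more than a predefined maximum, the governor raises frequency. The work units, the fixed frequency step, and every function name are assumptions for illustration, not the disclosure's mechanism.

    #include <stdint.h>

    #define MAX_BACKLOG 1000 /* predefined maximum amount of work to fall behind */

    extern uint64_t work_completed(void);      /* work done so far */
    extern uint64_t steady_state_target(void); /* work expected by now */
    extern void set_cpu_frequency(uint32_t khz);

    static uint32_t cur_khz = 600000;

    /* Periodic DCVS tick: bound how far the core falls behind its
     * pre-computed steady-state workload, independent of the current
     * operating frequency or voltage. */
    void dcvs_tick(void) {
        uint64_t target = steady_state_target();
        uint64_t done = work_completed();
        uint64_t backlog = (target > done) ? target - done : 0;
        if (backlog > MAX_BACKLOG) {
            cur_khz += 100000;       /* raise frequency to cap the lateness */
            set_cpu_frequency(cur_khz);
        }
    }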