Silent active page migration faults
    652.
    Invention Grant

    Publication (Announcement) No.: US10365824B2

    Publication (Announcement) Date: 2019-07-30

    Application No.: US15495296

    Filing Date: 2017-04-24

    Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration-pending indication is stored in the first PTE. In one embodiment, the migration-pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the memory management unit (MMU) and the translation request corresponds to a read request, a read operation to the first page is allowed. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
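
    The mechanism described above can be sketched in software. The following C fragment is a minimal illustration rather than the patented implementation: the PTE layout, bit positions, and function names are assumptions, and the migration-pending state is simply modeled as both permission bits being cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout: the migration-pending state is encoded by
 * clearing both permission bits, as the abstract describes for one
 * embodiment. Bit positions are illustrative only. */
#define PTE_READ   (1u << 0)
#define PTE_WRITE  (1u << 1)

typedef enum { XLATE_OK, XLATE_SILENT_RETRY, XLATE_FAULT } xlate_result;

/* Mark a page as migration-pending by revoking read/write permission. */
static inline void pte_set_migration_pending(uint64_t *pte)
{
    *pte &= ~((uint64_t)(PTE_READ | PTE_WRITE));
}

static inline bool pte_migration_pending(uint64_t pte)
{
    return (pte & (PTE_READ | PTE_WRITE)) == 0;
}

/* Translation check: reads to a migrating page are allowed to proceed;
 * writes are blocked and the client is told to retry silently. */
xlate_result mmu_check_access(uint64_t pte, bool is_write)
{
    if (pte_migration_pending(pte))
        return is_write ? XLATE_SILENT_RETRY : XLATE_OK;

    if (is_write && !(pte & PTE_WRITE))
        return XLATE_FAULT;
    if (!is_write && !(pte & PTE_READ))
        return XLATE_FAULT;
    return XLATE_OK;
}
```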

    ADAPTIVE DCO VF CURVE SLOPE CONTROL
    653.
    Invention Application

    Publication (Announcement) No.: US20190229736A1

    Publication (Announcement) Date: 2019-07-25

    Application No.: US16370479

    Filing Date: 2019-03-29

    Abstract: An oscillator circuit is provided that adapts to supply voltage variations. The circuit includes first and second delay lines connected to the inputs of an edge detector; one delay line is supplied by a reference voltage and the other by a drooping supply voltage. The edge detector generates an output clock based on a relationship between its inputs. The output clock is applied to the signal inputs of the first and second delay lines. The output clock has a voltage-dependent frequency performance curve whose slope depends at least on the delay of the second delay line and the delay of the edge detector. At least one of the first delay line delay, the second delay line delay, and the edge detector delay is adjusted to change the slope of the performance curve.
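
    As a rough behavioral illustration only (the patent describes a hardware circuit, not software), the C model below assumes a linear delay-versus-voltage response for the droop-tracking delay line and uses made-up coefficients; it shows how the slope of the frequency-versus-voltage curve changes as the delay-line sensitivity is adjusted.

```c
#include <stdio.h>

/* Behavioral model: the output clock period is set by the droop-tracking
 * delay line plus the edge detector delay; its slope versus supply voltage
 * is shaped by how strongly that delay line responds to voltage.
 * All coefficients are illustration values, not circuit parameters. */
static double delay_line2_ns(double vdd, double sensitivity)
{
    /* Delay grows as the supply droops (simple linear model). */
    return 1.0 + sensitivity * (1.0 - vdd);
}

static double output_freq_mhz(double vdd, double sensitivity,
                              double edge_detector_delay_ns)
{
    double period_ns = delay_line2_ns(vdd, sensitivity) + edge_detector_delay_ns;
    return 1000.0 / period_ns;
}

int main(void)
{
    /* Adjusting the delay-line sensitivity (or the edge detector delay)
     * changes the slope of the frequency-versus-voltage curve. */
    for (double sens = 0.5; sens <= 2.0; sens += 0.5) {
        double f_nominal = output_freq_mhz(1.00, sens, 0.2);
        double f_droop   = output_freq_mhz(0.90, sens, 0.2);
        printf("sensitivity %.1f: slope ~ %.1f MHz per 100 mV droop\n",
               sens, f_nominal - f_droop);
    }
    return 0;
}
```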

    Managing variations among nodes in parallel system frameworks

    Publication (Announcement) No.: US10355966B2

    Publication (Announcement) Date: 2019-07-16

    Application No.: US15081558

    Filing Date: 2016-03-25

    Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
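
    A minimal sketch of the mapping idea in C is shown below; the variability metric, its weights, and the structure fields are illustrative assumptions, since the abstract does not specify how the sensor and performance data are combined.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative per-node record: the variability metric here is a simple
 * weighted mix of sensor spread and performance spread. */
struct node_info {
    int    id;
    double temp_variation;   /* e.g., spread of sensor readings        */
    double perf_variation;   /* e.g., spread of measured IPC/frequency */
    double variability;
};

static double variability_metric(const struct node_info *n)
{
    return 0.5 * n->temp_variation + 0.5 * n->perf_variation;
}

static int by_variability(const void *a, const void *b)
{
    const struct node_info *x = a, *y = b;
    return (x->variability > y->variability) - (x->variability < y->variability);
}

/* Assign the most critical tasks to the least variable nodes. */
void map_critical_tasks(struct node_info *nodes, int num_nodes,
                        const int *critical_tasks, int num_critical)
{
    for (int i = 0; i < num_nodes; i++)
        nodes[i].variability = variability_metric(&nodes[i]);

    qsort(nodes, num_nodes, sizeof(nodes[0]), by_variability);

    for (int t = 0; t < num_critical && t < num_nodes; t++)
        printf("critical task %d -> node %d (variability %.3f)\n",
               critical_tasks[t], nodes[t].id, nodes[t].variability);
}
```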

    Strided loading of non-sequential memory locations by skipping memory locations between consecutive loads

    Publication (Announcement) No.: US10353708B2

    Publication (Announcement) Date: 2019-07-16

    Application No.: US15273916

    Filing Date: 2016-09-23

    Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
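
    The load-then-shuffle idea can be illustrated with x86 AVX2 intrinsics (an assumption; the abstract does not name an instruction set). The sketch below packs the even-indexed elements of a 16-float array into one register using two contiguous loads, a permute, and a blend, then applies a single vector multiply. Compile with AVX2 enabled (e.g., -mavx2); the masked-store path mentioned in the abstract is not shown.

```c
#include <immintrin.h>
#include <stdio.h>

/* Illustrative sketch: gather the 8 even-indexed floats from src[0..15]
 * (a stride-2 access pattern) with two contiguous vector loads plus a
 * shuffle/blend, then run one vector operation on the packed result. */
void scale_even_elements(const float *src, float *dst, float factor)
{
    __m256 lo = _mm256_loadu_ps(src);      /* src[0..7]  */
    __m256 hi = _mm256_loadu_ps(src + 8);  /* src[8..15] */

    /* Move each register's even lanes into known positions... */
    __m256i pick_even = _mm256_setr_epi32(0, 2, 4, 6, 0, 2, 4, 6);
    __m256 lo_even = _mm256_permutevar8x32_ps(lo, pick_even);
    __m256 hi_even = _mm256_permutevar8x32_ps(hi, pick_even);

    /* ...then consolidate both halves into a single register:
     * lanes 0..3 from lo_even, lanes 4..7 from hi_even. */
    __m256 packed = _mm256_blend_ps(lo_even, hi_even, 0xF0);

    /* One vector operation on operands that were non-sequential in memory. */
    __m256 result = _mm256_mul_ps(packed, _mm256_set1_ps(factor));
    _mm256_storeu_ps(dst, result);
}

int main(void)
{
    float src[16], dst[8];
    for (int i = 0; i < 16; i++) src[i] = (float)i;
    scale_even_elements(src, dst, 10.0f);
    for (int i = 0; i < 8; i++) printf("%.0f ", dst[i]);  /* 0 20 40 ... 140 */
    printf("\n");
    return 0;
}
```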

    Fused shader programs
    657.
    Invention Grant

    Publication (Announcement) No.: US10353591B2

    Publication (Announcement) Date: 2019-07-16

    Application No.: US15442499

    Filing Date: 2017-02-24

    Abstract: Improvements in compute shader programs executed on parallel processing hardware are disclosed. An application or other entity defines a sequence of shader programs to execute. Each shader program defines inputs and outputs which would, if unmodified, execute as loads and stores to a general-purpose memory, incurring high latency. A compiler combines the shader programs into groups that can operate in a lower-latency, but lower-capacity, local data store memory. The boundaries of these combined shader programs are defined by several factors, including where memory barrier operations are to execute and whether a combination of shader programs can execute using only the local data store and not the global memory (except for initial reads and writes).
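
    The grouping decision can be sketched as a simple greedy pass. The C code below is illustrative only: the stage descriptor, the 64 KB local data store size, and the boundary rules are assumptions rather than the compiler's actual algorithm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative pass: greedily fuse consecutive shader stages into groups
 * whose intermediate data fits in the local data store (LDS), starting a
 * new group wherever the capacity limit or a memory barrier forces one. */
struct shader_stage {
    const char *name;
    size_t      intermediate_bytes;  /* data handed to the next stage      */
    bool        barrier_after;       /* memory barrier must execute here   */
};

#define LDS_CAPACITY (64 * 1024)     /* assumed local data store size */

void fuse_stages(const struct shader_stage *stages, int count)
{
    size_t group_bytes = 0;
    int group_id = 0;

    for (int i = 0; i < count; i++) {
        if (group_bytes + stages[i].intermediate_bytes > LDS_CAPACITY) {
            group_id++;              /* would spill to global memory: cut here */
            group_bytes = 0;
        }
        group_bytes += stages[i].intermediate_bytes;
        printf("stage %-12s -> fused group %d\n", stages[i].name, group_id);

        if (stages[i].barrier_after) {
            group_id++;              /* barrier defines a group boundary */
            group_bytes = 0;
        }
    }
}
```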

    SYSTEM-WIDE LOW POWER MANAGEMENT
    659.
    Invention Application

    Publication (Announcement) No.: US20190204899A1

    Publication (Announcement) Date: 2019-07-04

    Application No.: US15856546

    Filing Date: 2017-12-28

    CPC classification number: G06F1/3287 G06F1/3234 G06F1/3296 G06F9/5094

    Abstract: Systems, apparatuses, and methods for performing efficient power management for a multi-node computing system are disclosed. A computing system includes multiple nodes. When power-down negotiation is distributed, negotiation for system-wide power down occurs at a lower level of the node hierarchy before negotiation for power down occurs at a higher level of the hierarchy. When power-down negotiation is centralized, a given node combines the state of its clients with indications received on its downstream link and sends an indication on an upstream link based on the combined result. Only a root node sends power-down requests.
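
    A minimal C sketch of the centralized variant follows; the node structure, function names, and the recursive combining are assumptions used for illustration, not the disclosed hardware protocol.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative node: each node combines the idle state of its local
 * clients with the power-down indications received over its downstream
 * links and forwards a single indication on its upstream link. */
struct node {
    bool          clients_idle;   /* all local clients ready to power down */
    struct node **children;       /* nodes reachable over downstream links */
    size_t        num_children;
    struct node  *parent;         /* NULL at the root of the hierarchy */
};

/* The indication this node would send on its upstream link. */
bool power_down_indication(const struct node *n)
{
    if (!n->clients_idle)
        return false;
    for (size_t i = 0; i < n->num_children; i++)
        if (!power_down_indication(n->children[i]))
            return false;
    return true;
}

/* Only the root turns the combined indication into a power-down request. */
bool root_should_request_power_down(const struct node *root)
{
    return root->parent == NULL && power_down_indication(root);
}
```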
