Network of memory modules with logarithmic access

    Publication Number: US10394726B2

    Publication Date: 2019-08-27

    Application Number: US15229708

    Application Date: 2016-08-05

    Inventor: Gabriel Loh

    Abstract: A memory network includes a plurality of memory nodes, each identifiable by an ordinal number m, and a set of links divided into N subsets, where each subset is identifiable by an ordinal number n. For each of the N subsets, each link in the subset connects two memory nodes whose ordinal numbers m differ by b^(n-1), where b is a positive number. Each of the memory nodes is communicatively coupled to a processor via at least two non-overlapping pathways through the set of links.
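
    To make the topology above concrete, the following Python sketch (not taken from the patent) builds the N link subsets for a hypothetical 16-node network with b = 2, so that subset n connects nodes whose ordinal numbers differ by 2^(n-1); a breadth-first search then shows that any node is reachable in a logarithmic number of hops.

```python
# Illustrative sketch only: the 16-node size, b = 2, and four subsets are
# assumptions chosen to show the structure described in the abstract.
from collections import deque

def build_link_subsets(num_nodes=16, b=2, num_subsets=4):
    """Map subset index n (1-based) to links joining nodes whose numbers differ by b**(n-1)."""
    subsets = {}
    for n in range(1, num_subsets + 1):
        stride = b ** (n - 1)
        subsets[n] = {(m, m + stride) for m in range(num_nodes) if m + stride < num_nodes}
    return subsets

def hops(src, dst, subsets):
    """Breadth-first search over all links: minimum link traversals from src to dst."""
    adjacency = {}
    for links in subsets.values():
        for a, c in links:
            adjacency.setdefault(a, set()).add(c)
            adjacency.setdefault(c, set()).add(a)
    depth, frontier = {src: 0}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            return depth[node]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                frontier.append(neighbor)
    return None

subsets = build_link_subsets()
print(hops(0, 15, subsets))  # 4 hops: 0 -> 8 -> 12 -> 14 -> 15
```

    With these assumed parameters, no pair of the 16 nodes is more than four link traversals apart, which is the logarithmic-access property the title refers to; the two non-overlapping pathways per node are not modeled here.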

    ACCURATE ON-CHIP TEMPERATURE SENSING USING THERMAL OSCILLATOR

    Publication Number: US20190257696A1

    Publication Date: 2019-08-22

    Application Number: US15902101

    Application Date: 2018-02-22

    Abstract: A calibrated temperature sensor includes a power-on oscillator responsive to a calibration enable signal for providing a power-on clock signal, a temperature-dependent oscillator responsive to the calibration enable signal for providing a temperature-dependent clock signal, and a measurement logic circuit. The measurement logic circuit counts a first number of pulses of the temperature-dependent clock signal during a first calibration period using the power-on clock signal, a second number of pulses of the temperature-dependent clock signal during a second calibration period using a system clock signal, a third number of pulses of the power-on clock signal over a third calibration period using the system clock signal, and a fourth number of pulses of the temperature-dependent clock signal using the system clock signal during a normal operation mode. The first calibration period precedes both the second and third calibration periods.
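
    The abstract does not give the arithmetic that turns these four counts into a temperature, so the sketch below is only one plausible reading: the third count pins down the frequency of the power-on oscillator in system-clock terms, which lets the early power-on-referenced measurement be expressed in the same units as the later system-clock-referenced ones. The system clock rate and the calibration-window lengths are all assumptions.

```python
# Purely illustrative arithmetic for the four counts named in the abstract.
# F_SYS_HZ and the calibration-window lengths are assumed values.

F_SYS_HZ = 100e6  # assumed system clock frequency once it is available

def estimate_frequencies(n1, n2, n3, n4,
                         cal1_po_cycles=1024,    # period 1 length, in power-on clock cycles (assumed)
                         cal2_sys_cycles=4096,   # period 2 length, in system clock cycles (assumed)
                         cal3_sys_cycles=4096):  # period 3 length, in system clock cycles (assumed)
    # Period 3: n3 power-on pulses were counted over cal3_sys_cycles of the
    # system clock, which fixes the otherwise unknown power-on clock frequency.
    f_power_on = n3 * F_SYS_HZ / cal3_sys_cycles

    # Period 1 was timed with the power-on clock (before the system clock was
    # available), so its duration can now be converted to seconds.
    f_tdo_cal1 = n1 * f_power_on / cal1_po_cycles

    # Period 2 and the normal-operation count are referenced to the system clock.
    f_tdo_cal2 = n2 * F_SYS_HZ / cal2_sys_cycles
    f_tdo_now = n4 * F_SYS_HZ / cal2_sys_cycles  # assume the same window length

    # A lookup or linear fit would map f_tdo_now (against the calibration points)
    # to a temperature; that mapping is outside what the abstract states.
    return f_power_on, f_tdo_cal1, f_tdo_cal2, f_tdo_now
```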

    Silent active page migration faults
    Granted Patent

    Publication Number: US10365824B2

    Publication Date: 2019-07-30

    Application Number: US15495296

    Application Date: 2017-04-24

    Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the memory management unit (MMU) and the translation request corresponds to a read request, a read operation is allowed to the first page. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
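
    A minimal software sketch of the permission-based encoding described above follows; the PTE fields and the retry signalling are simplified assumptions rather than the patented hardware behavior.

```python
# Minimal sketch of the "migration pending" encoding sketched in the abstract.
from dataclasses import dataclass

@dataclass
class PageTableEntry:
    phys_addr: int
    read_ok: bool = True
    write_ok: bool = True
    migrating: bool = False   # tracked explicitly here for clarity; the abstract
                              # encodes it by clearing both permission bits

def begin_migration(pte: PageTableEntry) -> None:
    """Mark the page as migration-pending by revoking read/write permissions."""
    pte.read_ok = False
    pte.write_ok = False
    pte.migrating = True

def translate(pte: PageTableEntry, is_write: bool):
    """Return ('ok', addr), ('silent_retry', None), or ('fault', None)."""
    if pte.migrating:
        if is_write:
            # Writes are blocked; the requesting client is asked to retry silently.
            return ("silent_retry", None)
        # Reads of a migrating page are still serviced from the old location.
        return ("ok", pte.phys_addr)
    if (is_write and pte.write_ok) or (not is_write and pte.read_ok):
        return ("ok", pte.phys_addr)
    return ("fault", None)

pte = PageTableEntry(phys_addr=0x4000)
begin_migration(pte)
print(translate(pte, is_write=False))  # ('ok', 16384)
print(translate(pte, is_write=True))   # ('silent_retry', None)
```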

    ADAPTIVE DCO VF CURVE SLOPE CONTROL
    Patent Application

    Publication Number: US20190229736A1

    Publication Date: 2019-07-25

    Application Number: US16370479

    Application Date: 2019-03-29

    Abstract: An oscillator circuit is provided that adapts to voltage supply variations. The circuit includes first and second delay lines connected to the inputs of an edge detector, with one delay line supplied by a reference voltage and the other by a drooping supply voltage. The edge detector generates an output clock based on a relationship between its inputs, and the output clock is applied to the signal inputs of the first and second delay lines. The output clock has a voltage-dependent frequency performance curve whose slope depends at least on the delay of the second delay line and a delay of the edge detector. At least one of the first delay line delay, the second delay line delay, and the edge detector delay is adjusted to change the slope of the performance curve.
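
    As a rough behavioural illustration (not the patented circuit), the toy model below treats the output clock period as the sum of the two delay-line delays and the edge-detector delay, with only the droop-supplied line slowing as the supply falls; the constants and the delay-versus-voltage law are assumptions.

```python
# Toy behavioural model of the adaptive oscillator described above; the delay
# equations and constants are illustrative assumptions, not the patented design.

def output_frequency_mhz(vdd, ref_delay_ns, droop_delay_ns_nominal,
                         edge_detector_delay_ns, v_nominal=1.0):
    # Assume the droop-supplied delay line slows roughly in proportion to how
    # far the supply has dropped below nominal; the reference-supplied line and
    # the edge detector are treated as voltage-insensitive.
    droop_delay_ns = droop_delay_ns_nominal * (v_nominal / vdd)
    period_ns = ref_delay_ns + droop_delay_ns + edge_detector_delay_ns
    return 1e3 / period_ns

def vf_slope(ref_delay_ns, droop_delay_ns, ed_delay_ns, v_lo=0.85, v_hi=1.0):
    f_lo = output_frequency_mhz(v_lo, ref_delay_ns, droop_delay_ns, ed_delay_ns)
    f_hi = output_frequency_mhz(v_hi, ref_delay_ns, droop_delay_ns, ed_delay_ns)
    return (f_hi - f_lo) / (v_hi - v_lo)   # MHz per volt

print(vf_slope(ref_delay_ns=0.5, droop_delay_ns=0.5, ed_delay_ns=0.2))  # steeper curve
print(vf_slope(ref_delay_ns=1.0, droop_delay_ns=0.5, ed_delay_ns=0.4))  # flatter curve
```

    The two printed slopes show the intended knob: growing the voltage-insensitive delays flattens the voltage-to-frequency curve, while shrinking them steepens it.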

    Managing variations among nodes in parallel system frameworks

    Publication Number: US10355966B2

    Publication Date: 2019-07-16

    Application Number: US15081558

    Application Date: 2016-03-25

    Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
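
    The sketch below illustrates the mapping step, using the coefficient of variation of each node's monitored samples as a stand-in variability metric (the abstract does not fix a formula); the node names, sample values, and round-robin placement of non-critical tasks are likewise assumptions.

```python
# Sketch of the task-mapping idea above; the metric and placement policy are
# assumed stand-ins, not the patented method.
import statistics

def variability_metric(samples):
    """Coefficient of variation of a node's sensor/performance samples (assumed metric)."""
    mean = statistics.fmean(samples)
    return statistics.pstdev(samples) / mean if mean else float("inf")

def map_tasks(nodes, critical_tasks, other_tasks):
    """nodes: dict of node_id -> list of monitored samples."""
    ranked = sorted(nodes, key=lambda n: variability_metric(nodes[n]))
    mapping = {}
    # Critical tasks go to the most stable (lowest-variability) nodes first.
    for task, node in zip(critical_tasks, ranked):
        mapping[task] = node
    # Remaining tasks fill in the leftover nodes round-robin.
    leftover = ranked[len(critical_tasks):] or ranked
    for i, task in enumerate(other_tasks):
        mapping[task] = leftover[i % len(leftover)]
    return mapping

nodes = {"n0": [95, 97, 96], "n1": [80, 120, 100], "n2": [99, 101, 100]}
print(map_tasks(nodes, ["critical_a"], ["t1", "t2"]))
# {'critical_a': 'n2', 't1': 'n0', 't2': 'n1'}
```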

    Strided loading of non-sequential memory locations by skipping memory locations between consecutive loads

    Publication Number: US10353708B2

    Publication Date: 2019-07-16

    Application Number: US15273916

    Application Date: 2016-09-23

    Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
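
    As a software analogue of that flow (not vendor intrinsics), the NumPy sketch below stands in for the vector registers: a few full-width loads cover the strided region, a shuffle consolidates the wanted lanes into one register, and the vector operation then runs on that register. The register width, base address, and stride are assumptions.

```python
# Software analogue of the vectorization flow above, using NumPy arrays to
# stand in for vector registers. LANES, base, and stride are assumptions.
import numpy as np

LANES = 4  # assumed vector register width

def strided_gather(memory, base, stride, count):
    """Load `count` non-sequential operands spaced `stride` apart, emulating
    several wide vector loads followed by a shuffle into one register."""
    # Step 1: a few full-width "vector loads" covering the addressed region.
    loads = [memory[base + i * LANES : base + (i + 1) * LANES]
             for i in range((count * stride + LANES - 1) // LANES)]
    # Step 2: "shuffle" the wanted lanes from those registers into one register.
    flat = np.concatenate(loads)
    return flat[::stride][:count]

memory = np.arange(64)
vec = strided_gather(memory, base=8, stride=3, count=4)
print(vec)      # [ 8 11 14 17]
print(vec * 2)  # the vector operation then runs on the consolidated register
```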

    Fused shader programs
    Granted Patent

    Publication Number: US10353591B2

    Publication Date: 2019-07-16

    Application Number: US15442499

    Application Date: 2017-02-24

    Abstract: Improvements in compute shader programs executed on parallel processing hardware are disclosed. An application or other entity defines a sequence of shader programs to execute. Each shader program defines inputs and outputs which would, if unmodified, execute as loads and stores to a general-purpose memory, incurring high latency. A compiler combines the shader programs into groups that can operate in a lower-latency, but lower-capacity, local data store memory. The boundaries of these combined shader programs are determined by several factors, including where memory barrier operations are to execute, whether combinations of shader programs can execute using only the local data store and not the global memory (except for initial reads and writes), and other considerations.
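
    A greedy sketch of that grouping decision follows: consecutive shader programs are fused as long as their combined scratch footprint fits an assumed local-data-store budget and no memory barrier forces a split. The capacity figure and the pass records are illustrative assumptions, not the compiler's actual heuristics.

```python
# Greedy sketch of the fusing idea above; capacity and pass data are assumptions.
LDS_CAPACITY_BYTES = 64 * 1024  # assumed local data store size

def fuse_passes(passes):
    """passes: list of dicts like {"name": str, "lds_bytes": int, "barrier_before": bool}.
    Returns groups of pass names; each group can run out of the local data store."""
    groups, current, used = [], [], 0
    for p in passes:
        must_split = p["barrier_before"] or used + p["lds_bytes"] > LDS_CAPACITY_BYTES
        if current and must_split:
            groups.append(current)
            current, used = [], 0
        current.append(p["name"])
        used += p["lds_bytes"]
    if current:
        groups.append(current)
    return groups

pipeline = [
    {"name": "gen_rays",  "lds_bytes": 16 * 1024, "barrier_before": False},
    {"name": "intersect", "lds_bytes": 32 * 1024, "barrier_before": False},
    {"name": "shade",     "lds_bytes": 32 * 1024, "barrier_before": False},
    {"name": "resolve",   "lds_bytes": 8 * 1024,  "barrier_before": True},
]
print(fuse_passes(pipeline))  # [['gen_rays', 'intersect'], ['shade'], ['resolve']]
```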
