81. I/O WRITES WITH CACHE STEERING
    Invention Application

    Publication No.: WO2019108284A1

    Publication Date: 2019-06-06

    Application No.: PCT/US2018/048187

    Application Date: 2018-08-27

    Abstract: A method for steering data for an I/O write operation (144) includes, in response to receiving the I/O write operation, identifying, at an interconnect fabric (102), a cache (122, 123, 124, 126) as a target cache for steering the data based on at least one of: a software-provided steering indicator, a steering configuration (156) implemented at boot initialization, and coherency information for a cacheline associated with the data. The method further includes directing the identified target cache to cache the data from the I/O write operation. The data is temporarily buffered at the interconnect fabric, and if the target cache attempts to fetch the data via a fetch operation (152) while the data is still buffered at the interconnect fabric, the interconnect fabric provides a copy of the buffered data in response to the fetch operation instead of initiating a memory access operation to obtain the data from memory.
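
    A minimal C sketch of the steering flow described above, assuming an invented steer_info_t record and a simple direct-mapped fabric buffer; none of these names or the priority order among the three steering sources come from the patent itself:

```c
/* Hedged sketch of I/O write steering with a fabric-side buffer. */
#include <stdio.h>
#include <string.h>

enum { BUF_SLOTS = 8, LINE_SIZE = 64 };

typedef struct {
    int sw_hint;        /* software-provided steering indicator, -1 if none */
    int boot_default;   /* steering configuration fixed at boot init        */
    int coherent_owner; /* cache owning the line per coherency info, -1     */
} steer_info_t;

typedef struct {
    unsigned long addr;
    unsigned char data[LINE_SIZE];
    int valid;
} fabric_buf_t;

static fabric_buf_t fabric_buf[BUF_SLOTS];

/* Pick a target cache: software hint first, then coherency info,
 * then the boot-time default (an assumed priority order). */
static int pick_target_cache(const steer_info_t *si) {
    if (si->sw_hint >= 0)        return si->sw_hint;
    if (si->coherent_owner >= 0) return si->coherent_owner;
    return si->boot_default;
}

/* I/O write: buffer the line at the fabric and direct the target
 * cache to fetch it (the directive is just printed here). */
static void io_write(unsigned long addr, const unsigned char *line,
                     const steer_info_t *si) {
    int slot = (int)(addr / LINE_SIZE) % BUF_SLOTS;
    fabric_buf[slot].addr = addr;
    memcpy(fabric_buf[slot].data, line, LINE_SIZE);
    fabric_buf[slot].valid = 1;
    printf("steer line 0x%lx -> cache %d\n", addr, pick_target_cache(si));
}

/* Target-cache fetch: serve from the fabric buffer if the line is
 * still there, instead of initiating a memory access. */
static int fabric_fetch(unsigned long addr, unsigned char *out) {
    int slot = (int)(addr / LINE_SIZE) % BUF_SLOTS;
    if (fabric_buf[slot].valid && fabric_buf[slot].addr == addr) {
        memcpy(out, fabric_buf[slot].data, LINE_SIZE);
        return 1;   /* hit: copy of buffered data */
    }
    return 0;       /* miss: caller falls back to memory */
}

int main(void) {
    unsigned char line[LINE_SIZE] = {0xAB}, got[LINE_SIZE];
    steer_info_t si = { .sw_hint = -1, .boot_default = 2,
                        .coherent_owner = 1 };
    io_write(0x1000, line, &si);
    printf("fetch %s\n", fabric_fetch(0x1000, got) ? "served from fabric"
                                                   : "goes to memory");
    return 0;
}
```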

82. SYSTEM AND METHOD FOR STORE FUSION
    Invention Application

    Publication No.: WO2019103776A1

    Publication Date: 2019-05-31

    Application No.: PCT/US2018/048396

    Application Date: 2018-08-28

    Inventor: KING, John M.

    Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent micro-operations, i.e., micro-operations flowing through adjacent dispatch slots, are both store micro-operations. The consecutive store operations are then reviewed to determine whether their data sizes are the same and their store addresses are consecutive. If so, the two store operations are fused together to form one store operation with twice the data size and one store data HI operation.
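
    The fusion check lends itself to a short illustration. A hedged C sketch, with an invented uop_t record and the simplifying assumption that store addresses are already known when the adjacent pair is examined:

```c
/* Hedged sketch of the store-fusion eligibility check. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_store;
    uint64_t addr;   /* store address (assumed known at this point) */
    int size;        /* store data size in bytes */
} uop_t;

/* Fusable when both adjacent dispatch slots hold stores of the same
 * size to consecutive addresses. */
static bool can_fuse(const uop_t *a, const uop_t *b) {
    return a->is_store && b->is_store &&
           a->size == b->size &&
           b->addr == a->addr + (uint64_t)a->size;
}

/* Fuse into one store of twice the data size at the lower address;
 * the upper half's data would become the "store data HI" operation. */
static uop_t fuse(const uop_t *lo, const uop_t *hi) {
    (void)hi;
    uop_t f = *lo;
    f.size = lo->size * 2;
    return f;
}

int main(void) {
    uop_t a = { true, 0x100, 4 }, b = { true, 0x104, 4 };
    if (can_fuse(&a, &b)) {
        uop_t f = fuse(&a, &b);
        printf("fused: %d-byte store @ 0x%lx\n", f.size,
               (unsigned long)f.addr);
    }
    return 0;
}
```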

83. MEMORY BANDWIDTH REDUCTION TECHNIQUES FOR LOW POWER CONVOLUTIONAL NEURAL NETWORK INFERENCE APPLICATIONS
    Invention Application

    Publication No.: WO2019099104A1

    Publication Date: 2019-05-23

    Application No.: PCT/US2018/052358

    Application Date: 2018-09-24

    CPC classification number: G06N3/08 G06F1/3296 G06N3/0454 G06N3/063

    Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
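
    A toy C sketch of the blocking-and-accumulation idea: channels are partitioned into blocks that fit internal memory, per-feature partial sums are accumulated across a block's channels, and external memory is written once per feature. The 1x1 convolution and the fixed sizes are simplifications; the patent's actual partitioning heuristics are not reproduced here:

```c
/* Toy sketch of channel-blocked convolution with cross-channel
 * accumulation before external writeback. */
#include <stdio.h>

enum { CH = 8, BLK = 4, H = 2, W = 2, FEAT = 2 };

static float input[CH][H][W];     /* stand-in for external memory */
static float weights[FEAT][CH];   /* 1x1 kernels */
static float output[FEAT][H][W];  /* written once per feature */

int main(void) {
    for (int c = 0; c < CH; c++)
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                input[c][y][x] = 1.0f;
    for (int f = 0; f < FEAT; f++)
        for (int c = 0; c < CH; c++)
            weights[f][c] = 0.5f;

    /* One 3D block = BLK channels of the HxW plane. */
    for (int c0 = 0; c0 < CH; c0 += BLK) {
        float block[BLK][H][W];                 /* internal memory */
        for (int c = 0; c < BLK; c++)           /* load block once */
            for (int y = 0; y < H; y++)
                for (int x = 0; x < W; x++)
                    block[c][y][x] = input[c0 + c][y][x];

        /* Accumulate across the block's channels for each feature
         * before touching external memory again. */
        for (int f = 0; f < FEAT; f++)
            for (int y = 0; y < H; y++)
                for (int x = 0; x < W; x++) {
                    float acc = 0.0f;
                    for (int c = 0; c < BLK; c++)
                        acc += block[c][y][x] * weights[f][c0 + c];
                    output[f][y][x] += acc;     /* cross-channel sum */
                }
    }
    printf("output[0][0][0] = %.1f\n", output[0][0][0]); /* 8 * 0.5 = 4.0 */
    return 0;
}
```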

84. SWIZZLING IN 3D STACKED MEMORY
    Invention Application

    Publication No.: WO2019083642A1

    Publication Date: 2019-05-02

    Application No.: PCT/US2018/051592

    Application Date: 2018-09-18

    Abstract: A processing system [100] includes a compute die [102] and a stacked memory [104] stacked with the compute die. The stacked memory includes a first memory die [104B] and a second memory die [104A] stacked on top of the first memory die. A parallel access using a single memory address is directed towards different memory banks [206, 208] of the first memory die and the second memory die. The single memory address of the parallel access is swizzled to access the first memory die and the second memory die at different physical locations.
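
    A hedged C sketch of one possible swizzle, assuming an invented bank-bit layout: the same incoming address maps to a different bank field on each die, so the parallel access hits different physical locations:

```c
/* Hedged sketch of address swizzling across two stacked dies. */
#include <stdio.h>
#include <stdint.h>

enum { BANK_SHIFT = 13, BANK_BITS = 2 };  /* invented bit positions */

/* Die 0 uses the address as-is; die 1 XORs the bank-select field so
 * the single-address parallel access lands in different banks. */
static uint64_t swizzle(uint64_t addr, int die) {
    uint64_t mask = (((uint64_t)1 << BANK_BITS) - 1) << BANK_SHIFT;
    return die ? (addr ^ mask) : addr;
}

int main(void) {
    uint64_t a = 0x2000;
    for (int die = 0; die < 2; die++)
        printf("die %d: addr 0x%llx -> bank %llu\n", die,
               (unsigned long long)swizzle(a, die),
               (unsigned long long)((swizzle(a, die) >> BANK_SHIFT) &
                                    (((uint64_t)1 << BANK_BITS) - 1)));
    return 0;
}
```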

85. SHAREABLE FPGA COMPUTE ENGINE
    Invention Application

    Publication No.: WO2019027554A1

    Publication Date: 2019-02-07

    Application No.: PCT/US2018/035377

    Application Date: 2018-05-31

    Abstract: Systems, apparatuses, and methods for sharing a field-programmable gate array (FPGA) compute engine are disclosed. A system includes one or more processors and one or more FPGAs. The system receives a request, generated by a first user process, to allocate a portion of processing resources on a first FPGA. The system maps the portion of processing resources of the first FPGA into an address space of the first user process. The system prevents other user processes from accessing the portion of processing resources of the first FPGA. Later, the system detects a release of the portion of the processing resources on the first FPGA by the first user process. Then, the system receives a second request to allocate the first FPGA from a second user process. In response to the second request, the system maps the first FPGA into an address space of the second user process.
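
    A minimal C sketch of the allocate/map/release protocol, using a pid as a stand-in for a process address space; fpga_alloc and fpga_release are illustrative names, not a real driver API:

```c
/* Hedged sketch of exclusive FPGA-region allocation across processes. */
#include <stdio.h>

typedef struct {
    int owner_pid;   /* 0 means free */
} fpga_region_t;

static fpga_region_t fpga = { 0 };

/* Map the FPGA region into the requesting process's address space;
 * other user processes are refused while it is held. */
static int fpga_alloc(int pid) {
    if (fpga.owner_pid != 0 && fpga.owner_pid != pid)
        return -1;                 /* held by another user process */
    fpga.owner_pid = pid;
    printf("mapped FPGA region into address space of pid %d\n", pid);
    return 0;
}

static void fpga_release(int pid) {
    if (fpga.owner_pid == pid) {
        fpga.owner_pid = 0;
        printf("pid %d released FPGA region\n", pid);
    }
}

int main(void) {
    fpga_alloc(101);               /* first user process */
    if (fpga_alloc(202) < 0)       /* second is blocked */
        printf("pid 202 denied while pid 101 holds the region\n");
    fpga_release(101);
    fpga_alloc(202);               /* succeeds after the release */
    return 0;
}
```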

86. EARLY VIRTUALIZATION CONTEXT SWITCH FOR VIRTUALIZED ACCELERATED PROCESSING DEVICE
    Invention Application

    Publication No.: WO2019005485A1

    Publication Date: 2019-01-03

    Application No.: PCT/US2018/037341

    Application Date: 2018-06-13

    Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device ("APD") is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different "time-slices" in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine ("VM") and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed "early." This virtualization context switch is "early" in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
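
    A small C sketch of the scheduling decision, assuming a tick-based simulation and an invented vm_t record; the early-switch condition mirrors the abstract (no pending work and an idle APD), while the fixed time-slice bounds every VM's turn:

```c
/* Hedged sketch of early vs. fixed-length virtualization context switch. */
#include <stdio.h>
#include <stdbool.h>

enum { TIME_SLICE = 4, NUM_VMS = 2 };

typedef struct {
    int pending_work;  /* queued commands for the APD */
    bool apd_idle;
} vm_t;

/* Switch early when the current VM has no more work and the APD is
 * idle; otherwise run out the fixed time-slice. */
static bool should_switch(const vm_t *vm, int ticks_used) {
    if (vm->pending_work == 0 && vm->apd_idle)
        return true;               /* early virtualization context switch */
    return ticks_used >= TIME_SLICE;
}

int main(void) {
    vm_t vms[NUM_VMS] = { { 2, false }, { 9, false } };
    int cur = 0, ticks = 0;
    for (int t = 0; t < 12; t++) {
        vm_t *vm = &vms[cur];
        if (vm->pending_work > 0) vm->pending_work--;
        vm->apd_idle = (vm->pending_work == 0);
        ticks++;
        if (should_switch(vm, ticks)) {
            printf("t=%2d: switch away from VM%d (%s)\n", t, cur,
                   ticks < TIME_SLICE ? "early" : "slice expired");
            cur = (cur + 1) % NUM_VMS;
            ticks = 0;
        }
    }
    return 0;
}
```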

87. VERTICAL GATE ALL AROUND LIBRARY ARCHITECTURE
    Invention Application

    Publication No.: WO2018204173A1

    Publication Date: 2018-11-08

    Application No.: PCT/US2018/029716

    Application Date: 2018-04-27

    Abstract: A system and method for creating a layout for a vertical gate-all-around standard cell are described. A metal gate is placed all around two vertical nanowire sheets formed on a silicon substrate. A gate contact is formed on the metal gate between the two vertical nanowire sheets. Gate extension metal (GEM) is placed above the metal gate, at least on the gate contact. A via for the gate is formed at a location on the GEM where a local interconnect layer is available for routing the gate connection. Local metal layers are placed for connecting local routes and power connections.
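
    For illustration only, the placement steps above can be modeled as an ordered list of shapes; the coordinates and layer names below are invented and do not correspond to any real process design kit:

```c
/* Hedged sketch: the abstract's placement order as data. */
#include <stdio.h>

typedef struct { const char *layer; int x, y; } shape_t;

int main(void) {
    shape_t cell[] = {
        { "nanowire_sheet", 0, 0 },
        { "nanowire_sheet", 4, 0 },
        { "metal_gate",     0, 0 },   /* wraps around both sheets */
        { "gate_contact",   2, 0 },   /* between the two sheets */
        { "gem",            2, 1 },   /* gate extension metal on contact */
        { "gate_via",       2, 1 },   /* where a LI track is free */
        { "local_metal",    0, 2 },   /* local routes and power */
    };
    for (unsigned i = 0; i < sizeof cell / sizeof cell[0]; i++)
        printf("%-14s at (%d,%d)\n", cell[i].layer, cell[i].x, cell[i].y);
    return 0;
}
```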

88. SILENT ACTIVE PAGE MIGRATION FAULTS
    Invention Application

    Publication No.: WO2018200559A1

    Publication Date: 2018-11-01

    Application No.: PCT/US2018/029188

    Application Date: 2018-04-24

    Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the memory management unit (MMU) and the translation request corresponds to a read request, a read operation to the first page is allowed. Otherwise, if the translation request corresponds to a write request, the write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
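
    A hedged C sketch of the described embodiment, with an invented PTE bit layout: migration-pending is encoded by clearing the read/write permissions, reads still pass, and writes get a silent retry:

```c
/* Hedged sketch of the migration-pending PTE encoding. */
#include <stdio.h>
#include <stdint.h>

#define PTE_READ   (1u << 0)
#define PTE_WRITE  (1u << 1)
#define PTE_MIGR   (1u << 2)   /* migration pending */

/* Encode "migration pending" by disabling read and write permissions
 * (as in the described embodiment) and setting a pending flag. */
static void start_migration(uint32_t *pte) {
    *pte &= ~(PTE_READ | PTE_WRITE);
    *pte |= PTE_MIGR;
}

typedef enum { ALLOW, SILENT_RETRY, FAULT } mmu_result_t;

/* Translation request against a migrating page: reads proceed,
 * writes get a silent retry back to the requesting client. */
static mmu_result_t translate(uint32_t pte, int is_write) {
    if (pte & PTE_MIGR)
        return is_write ? SILENT_RETRY : ALLOW;
    if (is_write) return (pte & PTE_WRITE) ? ALLOW : FAULT;
    return (pte & PTE_READ) ? ALLOW : FAULT;
}

int main(void) {
    uint32_t pte = PTE_READ | PTE_WRITE;
    start_migration(&pte);
    printf("read : %s\n", translate(pte, 0) == ALLOW ? "allowed" : "other");
    printf("write: %s\n", translate(pte, 1) == SILENT_RETRY
                              ? "silent retry" : "other");
    return 0;
}
```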

89. MONITORING OF MEMORY PAGE TRANSITIONS BETWEEN A HYPERVISOR AND A VIRTUAL MACHINE
    Invention Application

    Publication No.: WO2018182772A1

    Publication Date: 2018-10-04

    Application No.: PCT/US2017/048471

    Application Date: 2017-08-24

    Abstract: A security module [130] in a memory access path of a processor [102] of a processing system [100] protects secure information by verifying the contents of memory pages as they transition between one or more virtual machines (VMs) [150, 151] executing at the processor and a hypervisor [152] that provides an interface between the VMs and the processing system's hardware. The security module monitors memory pages as they transition between the VMs and the hypervisor, so that memory pages that have been altered by the hypervisor or another VM cannot be returned to the VM from which they were transitioned.
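
    A toy C sketch of the verify-on-transition idea; the FNV-1a checksum below stands in for whatever integrity mechanism the security module actually uses, and the page size is shrunk for the example:

```c
/* Hedged sketch: record a page digest on VM->hypervisor transition,
 * verify it before the page returns to the VM. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

enum { PAGE_SIZE = 64 };  /* shrunk for the example */

static uint64_t page_digest(const uint8_t *page) {
    uint64_t h = 14695981039346656037ull;          /* FNV-1a basis */
    for (int i = 0; i < PAGE_SIZE; i++)
        h = (h ^ page[i]) * 1099511628211ull;      /* FNV-1a prime */
    return h;
}

int main(void) {
    uint8_t page[PAGE_SIZE];
    memset(page, 0x5A, sizeof page);

    /* VM -> hypervisor: record the page's digest. */
    uint64_t recorded = page_digest(page);

    page[0] ^= 0xFF;   /* hypervisor (or another VM) alters the page */

    /* Hypervisor -> VM: verify before returning the page. */
    if (page_digest(page) != recorded)
        printf("page altered while outside the VM: block return\n");
    else
        printf("page unchanged: return to VM\n");
    return 0;
}
```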

90. STEREO RENDERING
    Invention Application (Pending - Published)

    Publication No.: WO2018140223A1

    Publication Date: 2018-08-02

    Application No.: PCT/US2018/012851

    Application Date: 2018-01-08

    CPC classification number: H04N13/275 G06T1/20 G06T15/10

    Abstract: Techniques for generating a stereo image from a single set of input geometry in a three-dimensional rendering pipeline are disclosed. Vertices are processed through the end of the world-space pipeline. In the primitive assembler, at the end of the world-space pipeline and before perspective division, each clip-space vertex is duplicated. The primitive assembler generates this duplicated clip-space vertex using the y, z, and w coordinates of the original vertex and an x coordinate that is offset in clip space relative to the x coordinate of the original vertex. Both the original clip-space vertex and the modified clip-space vertex are then sent through the rest of the pipeline for processing, including perspective division, viewport transform, rasterization, pixel shading, and other operations. The result is that a single set of input vertices is rendered into a stereo image.
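
    A minimal C sketch of the duplication step, assuming a fixed per-eye offset (the offset value and vec4 type are illustrative); scaling the clip-space offset by w keeps the shift constant after perspective division:

```c
/* Hedged sketch of clip-space vertex duplication for stereo rendering. */
#include <stdio.h>

typedef struct { float x, y, z, w; } vec4;

/* Duplicate a clip-space vertex for the second eye: keep y, z, w and
 * offset x in clip space. */
static vec4 duplicate_for_right_eye(vec4 v, float eye_offset) {
    vec4 d = v;
    d.x = v.x + eye_offset * v.w;
    return d;
}

int main(void) {
    vec4 left  = { 0.2f, 0.1f, 0.5f, 2.0f };
    vec4 right = duplicate_for_right_eye(left, -0.06f);
    /* Both vertices now continue through perspective division,
     * viewport transform, rasterization, and pixel shading. */
    printf("left  NDC x = %.3f\n", left.x / left.w);
    printf("right NDC x = %.3f\n", right.x / right.w);
    return 0;
}
```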
