-
Publication No.: WO2019108284A1
Publication date: 2019-06-06
Application No.: PCT/US2018/048187
Application date: 2018-08-27
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: MORTON, Eric Christopher , COOPER, Elizabeth , WALKER, William L. , HUNT, Douglas Benson , BORN, Richard Martin , LEE, Richard H. , MIRANDA, Paul C. , NG, Philip
IPC: G06F12/0815 , G06F12/0862
Abstract: A method for steering data for an I/O write operation (144) includes, in response to receiving the I/O write operation, identifying, at an interconnect fabric (102), a cache (122, 123, 124, 126) as a target cache for steering the data based on at least one of: a software-provided steering indicator, a steering configuration (156) implemented at boot initialization, and coherency information for a cacheline associated with the data. The method further includes directing the identified target cache to cache the data from the I/O write operation. The data is temporarily buffered at the interconnect fabric, and if the target cache attempts to fetch the data via a fetch operation (152) while the data is still buffered at the interconnect fabric, the interconnect fabric provides a copy of the buffered data in response to the fetch operation instead of initiating a memory access operation to obtain the data from memory.
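The buffering behavior in the abstract above can be sketched in Python. This is a minimal, non-authoritative model under stated assumptions: the class `InterconnectFabric`, the boot-time steering configuration keyed by 4 KiB page number, and the fallback target name "L3" are all hypothetical, and the coherency-information steering path is omitted. The point it illustrates is that a fetch arriving while the write is still buffered is served from the fabric, not from memory.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InterconnectFabric:
    """Hypothetical model of a fabric that steers and buffers I/O writes."""
    memory: Dict[int, bytes]
    # Hypothetical boot-time steering configuration: page number -> target cache.
    steering_config: Dict[int, str]
    write_buffer: Dict[int, bytes] = field(default_factory=dict)
    memory_reads: int = 0

    def io_write(self, addr: int, data: bytes, steering_hint: Optional[str] = None) -> str:
        # A software-provided hint wins, then the boot-time configuration;
        # "L3" is an illustrative fallback (coherency-based steering omitted).
        target = steering_hint or self.steering_config.get(addr >> 12, "L3")
        self.write_buffer[addr] = data  # data is held temporarily at the fabric
        return target  # the target cache is then directed to cache this line

    def fetch(self, addr: int) -> bytes:
        # Serve a still-buffered write directly instead of accessing memory.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.memory_reads += 1
        return self.memory[addr]

    def drain(self, addr: int) -> None:
        # Retire the buffered write to memory.
        self.memory[addr] = self.write_buffer.pop(addr)
```

For example, after `io_write(0x2000, b"new")` a fetch of `0x2000` returns `b"new"` with `memory_reads` still at zero; only after `drain` does a fetch go to memory.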
-
Publication No.: WO2019103776A1
Publication date: 2019-05-31
Application No.: PCT/US2018/048396
Application date: 2018-08-28
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: KING, John M.
IPC: G06F9/30
Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent micro-operations are consecutive store micro-operations, where adjacent micro-operations are micro-operations flowing through adjacent dispatch slots and consecutive store micro-operations are adjacent micro-operations that are both store micro-operations. The consecutive store micro-operations are then reviewed to determine whether their data sizes are the same and whether their store addresses are consecutive. If both conditions hold, the two store operations are fused to form one store operation with twice the data size and one store data HI operation.
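The fusion check described in this abstract can be sketched as a scan over dispatch slots. This is an illustrative model only: the `Uop` record, its `kind` strings, and the convention that the fused store carries the first store's data while the "store data HI" op carries the second store's (upper-address) data are assumptions, not details from the filing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Uop:
    kind: str       # "store", "store_data_hi", "alu", ... (simplified, hypothetical)
    address: int = 0
    size: int = 0   # bytes
    data: int = 0


def fuse_stores(slots: List[Uop]) -> List[Uop]:
    """Fuse eligible pairs of stores that occupy adjacent dispatch slots."""
    out: List[Uop] = []
    i = 0
    while i < len(slots):
        a = slots[i]
        b: Optional[Uop] = slots[i + 1] if i + 1 < len(slots) else None
        if (
            b is not None
            and a.kind == "store" and b.kind == "store"   # consecutive stores
            and a.size == b.size                           # same data size
            and b.address == a.address + a.size            # consecutive addresses
        ):
            # One store of twice the size, carrying the first store's data...
            out.append(Uop("store", a.address, 2 * a.size, a.data))
            # ...plus a store-data HI op supplying the upper-address half.
            out.append(Uop("store_data_hi", data=b.data))
            i += 2
        else:
            out.append(a)
            i += 1
    return out
```

Stores at `0x100` and `0x104` (4 bytes each) fuse into one 8-byte store, while stores at `0x200` and `0x208` are left alone because their addresses are not consecutive.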
-
Publication No.: WO2019099104A1
Publication date: 2019-05-23
Application No.: PCT/US2018/052358
Application date: 2018-09-24
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: LAGUDU, Sateesh , ZHANG, Lei , RUSH, Allen
CPC classification number: G06N3/08 , G06F1/3296 , G06N3/0454 , G06N3/063
Abstract: Systems, apparatuses, and methods for implementing memory bandwidth reduction techniques for low power convolutional neural network inference applications are disclosed. A system includes at least a processing unit and an external memory coupled to the processing unit. The system detects a request to perform a convolution operation on input data from a plurality of channels. Responsive to detecting the request, the system partitions the input data from the plurality of channels into 3D blocks so as to minimize the external memory bandwidth utilization for the convolution operation being performed. Next, the system loads a selected 3D block from external memory into internal memory and then generates convolution output data for the selected 3D block for one or more features. Then, for each feature, the system adds convolution output data together across channels prior to writing the convolution output data to the external memory.
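The accumulate-across-channels idea in this abstract can be sketched in plain Python. This is a simplified illustration under assumptions: a "3D block" here is just a group of whole input channels (no spatial tiling), the block-size selection that minimizes bandwidth is not modeled, and `blocked_convolution` and its `blocks_loaded` counter are hypothetical names. Each block is "loaded" once, every feature's partial sums stay in an on-chip accumulator, and each feature map is produced once after all channels are summed.

```python
from typing import List, Optional, Tuple

Plane = List[List[float]]


def conv2d_valid(x: Plane, k: Plane) -> Plane:
    """Single-channel 'valid' 2D convolution (cross-correlation, as used in CNNs)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
         for j in range(ow)]
        for i in range(oh)
    ]


def add_planes(p: Plane, q: Plane) -> Plane:
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(p, q)]


def blocked_convolution(inputs: List[Plane], kernels: List[List[Plane]],
                        channels_per_block: int) -> Tuple[List[Optional[Plane]], int]:
    """inputs[c] is channel c; kernels[f][c] is feature f's kernel for channel c."""
    acc: List[Optional[Plane]] = [None] * len(kernels)  # on-chip partial sums per feature
    blocks_loaded = 0
    for start in range(0, len(inputs), channels_per_block):
        block = inputs[start:start + channels_per_block]  # one external-memory read
        blocks_loaded += 1
        for f, feature_kernels in enumerate(kernels):
            for offset, plane in enumerate(block):
                partial = conv2d_valid(plane, feature_kernels[start + offset])
                acc[f] = partial if acc[f] is None else add_planes(acc[f], partial)
    # Channel sums are complete, so each feature map is written out exactly once.
    return acc, blocks_loaded
```

Larger blocks mean fewer loads (one load instead of two for two channels with `channels_per_block=2`) at the cost of more internal memory, which is the trade-off the partitioning step is meant to balance.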
-
Publication No.: WO2019083642A1
Publication date: 2019-05-02
Application No.: PCT/US2018/051592
Application date: 2018-09-18
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: WUU, John , CIRAULA, Michael K. , SCHREIBER, Russell , NAFFZIGER, Samuel
Abstract: A processing system [100] includes a compute die [102] and a stacked memory [104] stacked with the compute die. The stacked memory includes a first memory die [104B] and a second memory die [104A] stacked on top of the first memory die. A parallel access using a single memory address is directed towards different memory banks [206, 208] of the first memory die and the second memory die. The single memory address of the parallel access is swizzled to access the first memory die and the second memory die at different physical locations.
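The abstract does not specify the swizzle function, so the following is only one possible sketch: it XORs the die index into the bank bits so that a single address issued to both stacked dies lands in different banks. The bank count, the 64-byte interleave, and the XOR scheme are all assumptions made for illustration.

```python
from typing import List, Tuple

NUM_BANKS = 8     # hypothetical banks per die (must be a power of two)
LINE_BYTES = 64   # hypothetical interleave granularity in bytes


def swizzle(address: int, die: int) -> Tuple[int, int]:
    """Map one address to (bank, row) on a given die by XORing the die index into the bank bits."""
    line = address // LINE_BYTES
    bank = line % NUM_BANKS
    row = line // NUM_BANKS
    return bank ^ (die % NUM_BANKS), row


def parallel_access(address: int, num_dies: int = 2) -> List[Tuple[int, int, int]]:
    """Issue the same address to every stacked die; returns (die, bank, row) per die."""
    return [(die, *swizzle(address, die)) for die in range(num_dies)]
```

With two dies, address 0 maps to bank 0 on the first die and bank 1 on the second, so the parallel access does not collide on the same bank position in both dies.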
-
Publication No.: WO2019027554A1
Publication date: 2019-02-07
Application No.: PCT/US2018/035377
Application date: 2018-05-31
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: KEGEL, Andrew G. , ROBERTS, David A.
IPC: G06F9/50
Abstract: Systems, apparatuses, and methods for sharing a field programmable gate array (FPGA) compute engine are disclosed. A system includes one or more processors and one or more FPGAs. The system receives a request, generated by a first user process, to allocate a portion of processing resources on a first FPGA. The system maps the portion of processing resources of the first FPGA into an address space of the first user process. The system prevents other user processes from accessing the portion of processing resources of the first FPGA. Later, the system detects a release of the portion of the processing resources on the first FPGA by the first user process. Then, the system receives a second request to allocate the first FPGA from a second user process. In response to the second request, the system maps the first FPGA into an address space of the second user process.
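The allocate, isolate, release, reallocate sequence in this abstract can be sketched as simple bookkeeping. The class `FpgaRegionManager`, its region names, and the use of a set to stand in for "mapped into the process's address space" are hypothetical; real mapping would go through the OS and device driver.

```python
from typing import Dict, Iterable, Optional, Set


class FpgaRegionManager:
    """Hypothetical bookkeeping for sharing FPGA compute regions between user processes."""

    def __init__(self, regions: Iterable[str]) -> None:
        self.owner: Dict[str, Optional[int]] = {r: None for r in regions}
        self.mapped: Dict[int, Set[str]] = {}  # pid -> regions mapped into its address space

    def allocate(self, pid: int, region: str) -> bool:
        if self.owner[region] is not None:
            return False  # currently held by another process
        self.owner[region] = pid
        self.mapped.setdefault(pid, set()).add(region)  # stands in for the address-space mapping
        return True

    def can_access(self, pid: int, region: str) -> bool:
        # Every process other than the owner is denied.
        return self.owner[region] == pid

    def release(self, pid: int, region: str) -> None:
        if self.owner[region] == pid:
            self.owner[region] = None
            self.mapped[pid].discard(region)  # unmap from the previous owner
```

A second process's allocation fails while the first process holds the region and succeeds once the first process releases it.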
-
Publication No.: WO2019005485A1
Publication date: 2019-01-03
Application No.: PCT/US2018/037341
Application date: 2018-06-13
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: CHENG, Gongxian Jeffrey , REGNIERE, Louis , ASARO, Anthony
IPC: G06F9/455
Abstract: A technique for efficient time-division of resources in a virtualized accelerated processing device ("APD") is provided. In a virtualization scheme implemented on the APD, different virtual machines are assigned different "time-slices" in which to use the APD. When a time-slice expires, the APD performs a virtualization context switch by stopping operations for a current virtual machine ("VM") and starting operations for another VM. Typically, each VM is assigned a fixed length of time, after which a virtualization context switch is performed. This fixed length of time can lead to inefficiencies. Therefore, in some situations, in response to a VM having no more work to perform on the APD and the APD being idle, a virtualization context switch is performed "early." This virtualization context switch is "early" in the sense that the virtualization context switch is performed before the fixed length of time for the time-slice expires.
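The early-switch idea in this abstract can be sketched with a round-robin simulation. This is a simplified model under assumptions: each VM's APD work is an integer number of time units (a hypothetical abstraction), and a VM with no remaining work is treated as leaving the APD idle, which triggers the early switch.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_time_slices(work: Dict[str, int], time_slice: int) -> List[Tuple[str, int, int, bool]]:
    """Round-robin VMs on the APD; returns (vm, start, end, switched_early) per slice."""
    remaining = dict(work)
    queue = deque(vm for vm, w in work.items() if w > 0)
    clock = 0
    timeline: List[Tuple[str, int, int, bool]] = []
    while queue:
        vm = queue.popleft()
        # Run a full slice, or switch as soon as the VM runs out of work and the APD idles.
        run = min(time_slice, remaining[vm])
        timeline.append((vm, clock, clock + run, run < time_slice))
        clock += run
        remaining[vm] -= run
        if remaining[vm] > 0:
            queue.append(vm)  # still has work: wait for its next slice
    return timeline
```

With `{"A": 3, "B": 7}` and a slice of 5, VM A yields at t=3 instead of t=5 and all work finishes at t=10; in the same model, fixed-length slices would not complete B's work until t=12.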
-
Publication No.: WO2018204173A1
Publication date: 2018-11-08
Application No.: PCT/US2018/029716
Application date: 2018-04-27
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: SCHULTZ, Richard T.
IPC: H01L27/02 , H01L29/775 , H01L29/06 , H01L21/768 , H01L21/8238 , H01L29/66 , H01L27/118
Abstract: A system and method for creating a layout for a vertical gate all around standard cell are described. Metal gate is placed all around two vertical nanowire sheets formed on a silicon substrate. A gate contact is formed on the metal gate between the two vertical nanowire sheets. Gate extension metal (GEM) is placed above the metal gate at least on the gate contact. A via for a gate is formed at a location on the GEM where a local interconnect layer is available to be used for routing a gate connection. Local metal layers are placed for connecting local routes and power connections.
-
Publication No.: WO2018200559A1
Publication date: 2018-11-01
Application No.: PCT/US2018/029188
Application date: 2018-04-24
Applicant: ADVANCED MICRO DEVICES, INC. , ATI TECHNOLOGIES ULC
Inventor: SMITH, Wade K. , ASARO, Anthony
IPC: G06F12/1009
Abstract: Systems, apparatuses, and methods for migrating memory pages are disclosed herein. In response to detecting that a migration of a first page between memory locations is being initiated, a first page table entry (PTE) corresponding to the first page is located and a migration pending indication is stored in the first PTE. In one embodiment, the migration pending indication is encoded in the first PTE by disabling read and write permissions. If a translation request targeting the first PTE is received by the memory management unit (MMU) and the translation request corresponds to a read request, a read operation to the first page is allowed. Otherwise, if the translation request corresponds to a write request, a write operation to the first page is blocked and a silent retry request is generated and conveyed to the requesting client.
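The translation behavior in this abstract can be sketched as follows, following its stated embodiment in which "migration pending" is encoded by clearing both read and write permissions. The `PTE` fields, the function names, and the `"retry"`/`"fault"` result labels are hypothetical, and in this sketch any PTE with both permissions cleared is treated as pending.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PTE:
    phys_page: int
    readable: bool = True
    writable: bool = True

    @property
    def migration_pending(self) -> bool:
        # Embodiment from the abstract: "pending" is encoded by clearing both permissions.
        return not self.readable and not self.writable


def begin_migration(pte: PTE) -> None:
    pte.readable = pte.writable = False


def translate(pte: PTE, is_write: bool) -> Tuple[str, Optional[int]]:
    if pte.migration_pending:
        # Reads still reach the old page; writes are blocked and the client silently retries.
        return ("retry", None) if is_write else ("ok", pte.phys_page)
    allowed = pte.writable if is_write else pte.readable
    return ("ok", pte.phys_page) if allowed else ("fault", None)


def finish_migration(pte: PTE, new_phys_page: int) -> None:
    # Point the PTE at the new location and restore normal permissions.
    pte.phys_page = new_phys_page
    pte.readable = pte.writable = True
```

During the migration a read still translates to the old page while a write returns a retry; once the migration finishes, the same write translates to the new page.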
-
Publication No.: WO2018182772A1
Publication date: 2018-10-04
Application No.: PCT/US2017/048471
Application date: 2017-08-24
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: KAPLAN, David , POWELL, Jeremy W. , RELPH, Richard
Abstract: A security module [130] in a memory access path of a processor [102] of a processing system [100] protects secure information by verifying the contents of memory pages as they transition between one or more virtual machines (VMs) [150, 151] executing at the processor and a hypervisor [152] that provides an interface between the VMs and the processing system's hardware. The security module of the processor is employed to monitor memory pages as they transition between one or more VMs and a hypervisor so that memory pages that have been altered by a hypervisor or other VM cannot be returned to the VM from which they were transitioned.
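The abstract says page contents are verified on transition but not how. The sketch below assumes, purely for illustration, that the check is a SHA-256 digest recorded when a page leaves a VM and compared when it returns; the class and method names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Tuple


class PageIntegrityMonitor:
    """Hypothetical check that a page handed from a VM to the hypervisor comes back unmodified."""

    def __init__(self) -> None:
        self._digests: Dict[Tuple[int, int], bytes] = {}  # (vm_id, page_no) -> digest

    def page_to_hypervisor(self, vm_id: int, page_no: int, contents: bytes) -> None:
        # Record a digest of the page as it leaves the VM (hashing is an assumption).
        self._digests[(vm_id, page_no)] = hashlib.sha256(contents).digest()

    def page_back_to_vm(self, vm_id: int, page_no: int, contents: bytes) -> bool:
        # Permit the return only if this VM handed the page off and it is unchanged.
        expected = self._digests.pop((vm_id, page_no), None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, hashlib.sha256(contents).digest())
```

A page returned with identical contents is accepted; one altered in the meantime, or one that the receiving VM never handed off, is rejected.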
-
公开(公告)号:WO2018140223A1
公开(公告)日:2018-08-02
申请号:PCT/US2018/012851
申请日:2018-01-08
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: NIJASURE, Mangesh P. , MANTOR, Michael , SMITH, Jeffrey M.
CPC classification number: H04N13/275 , G06T1/20 , G06T15/10
Abstract: Techniques for generating a stereo image from a single set of input geometry in a three-dimensional rendering pipeline are disclosed. Vertices are processed through the end of the world-space pipeline. In the primitive assembler, at the end of the world-space pipeline and before perspective division, each clip-space vertex is duplicated. The primitive assembler generates this duplicated clip-space vertex using the y, z, and w coordinates of the original vertex and an x coordinate that is offset in the x-direction in clip space relative to the x coordinate of the original vertex. Both the original clip-space vertex and the modified clip-space vertex are then sent through the rest of the pipeline for processing, including perspective division, viewport transform, rasterization, pixel shading, and other operations. The result is that a single set of input vertices is rendered into a stereo image.
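The duplication step described in this abstract can be sketched in a few lines. The function names and the single scalar `x_offset` are hypothetical; the sketch only shows that the copy keeps y, z, and w and shifts x in clip space before perspective division.

```python
from typing import List, Tuple

Vec4 = Tuple[float, float, float, float]


def duplicate_for_stereo(vertices: List[Vec4], x_offset: float) -> Tuple[List[Vec4], List[Vec4]]:
    """Copy each clip-space vertex with x shifted by x_offset; y, z, and w are unchanged."""
    original = list(vertices)
    shifted = [(x + x_offset, y, z, w) for (x, y, z, w) in vertices]
    return original, shifted


def perspective_divide(v: Vec4) -> Tuple[float, float, float]:
    x, y, z, w = v
    return (x / w, y / w, z / w)
```

Because the offset is applied before the divide, the normalized x coordinates of the two copies differ by `x_offset / w`, so vertices with smaller w (nearer the viewer under a standard perspective projection) are displaced more, which produces the parallax a stereo pair needs.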
-