-
631.
Publication No.: US10459850B2
Publication Date: 2019-10-29
Application No.: US15270231
Application Date: 2016-09-20
Applicant: Advanced Micro Devices, Inc.
Inventor: David A. Kaplan
IPC: G06F12/14 , G06F9/355 , G06F12/1009 , G06F9/455 , G06F21/62
Abstract: Systems, apparatuses, and methods for implementing virtualized process isolation are disclosed. A system includes a kernel and multiple guest virtual machines (VMs) executing on the system's processing hardware. Each guest VM includes a vShim layer for managing kernel accesses to user space and guest accesses to kernel space. The vShim layer also maintains a set of page tables separate from the kernel page tables. In one embodiment, data in the user space is encrypted and the kernel goes through the vShim layer to access user space data. When the kernel attempts to access a user space address, the kernel exits and the vShim layer is launched to process the request. If the kernel has permission to access the user space address, the vShim layer copies the data to a region in kernel space and then returns execution to the kernel; if the kernel lacks permission, the vShim layer blocks the access. In one embodiment, the kernel space is unencrypted and the user space is encrypted. A state of a guest VM and the vShim layer may be stored in virtual machine control blocks (VMCBs) when exiting the guest VM or vShim layer.
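A minimal sketch of the mediation flow the abstract describes, with hypothetical names throughout: the kernel never reads user-space data directly; an access traps into the shim, which checks permission and, if allowed, copies the data into a kernel-space region before returning execution to the kernel.

```python
class VShim:
    """Toy model of the vShim permission check and copy-to-kernel-space step."""

    def __init__(self, user_pages, permissions):
        self.user_pages = user_pages      # addr -> (encrypted) user data
        self.permissions = permissions    # addresses the kernel may read
        self.kernel_buffer = {}           # kernel-space copy region

    def kernel_read(self, addr):
        """Invoked on kernel exit when the kernel touches a user address."""
        if addr not in self.permissions:
            # The shim blocks the access outright.
            raise PermissionError(f"kernel denied access to {addr:#x}")
        # In the patented scheme the data would be decrypted here; this
        # sketch just copies the bytes into the kernel-space region.
        self.kernel_buffer[addr] = self.user_pages[addr]
        return self.kernel_buffer[addr]   # execution returns to the kernel

shim = VShim({0x1000: b"secret"}, permissions={0x1000})
print(shim.kernel_read(0x1000))   # b'secret'
```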
-
632.
Publication No.: US10459726B2
Publication Date: 2019-10-29
Application No.: US15822515
Application Date: 2017-11-27
Applicant: Advanced Micro Devices, Inc.
Inventor: John M. King
IPC: G06F9/312 , G06F9/46 , G06F12/00 , G06F7/57 , G06F9/30 , G06F9/38 , G06F9/48 , G06F12/0875 , G06F8/41
Abstract: Described herein is a system and method for store fusion that fuses small store operations into fewer, larger store operations. The system detects that a pair of adjacent micro-operations are consecutive store operations, where adjacent means the micro-operations flow through adjacent dispatch slots and consecutive means both of the adjacent micro-operations are store micro-operations. The consecutive store operations are then reviewed to determine whether the data sizes are the same and the store addresses are consecutive. If so, the two store operations are fused together to form one store operation with twice the data size and one store data HI operation.
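The pairing check above can be sketched as a simple predicate over two micro-ops in adjacent dispatch slots; the class and field names here are illustrative, not the patent's.

```python
from dataclasses import dataclass

@dataclass
class MicroOp:
    kind: str      # "store", "load", ...
    addr: int      # byte address
    size: int      # data size in bytes

def try_fuse(slot0, slot1):
    """Return one double-width store if the pair fuses, else None."""
    if slot0.kind != "store" or slot1.kind != "store":
        return None                           # both must be stores
    if slot0.size != slot1.size:
        return None                           # data sizes must match
    if slot1.addr != slot0.addr + slot0.size:
        return None                           # addresses must be consecutive
    # One store with twice the data size (the companion "store data HI"
    # micro-op for the upper half is not modeled in this sketch).
    return MicroOp("store", slot0.addr, slot0.size * 2)

fused = try_fuse(MicroOp("store", 0x100, 4), MicroOp("store", 0x104, 4))
print(fused)   # MicroOp(kind='store', addr=256, size=8)
```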
-
633.
Publication No.: US20190325305A1
Publication Date: 2019-10-24
Application No.: US16117302
Application Date: 2018-08-30
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Lei Zhang , Sateesh Lagudu , Allen Rush
Abstract: Systems, apparatuses, and methods for adaptively mapping a machine learning model to a multi-core inference accelerator engine are disclosed. A computing system includes a multi-core inference accelerator engine with multiple inference cores coupled to a memory subsystem. The system also includes a control unit which determines how to adaptively map a machine learning model to the multi-core inference accelerator engine. In one implementation, the control unit selects a mapping scheme which minimizes the memory bandwidth utilization of the multi-core inference accelerator engine. In one implementation, this mapping scheme involves having one inference core of the multi-core inference accelerator engine fetch first data and broadcast the first data to other inference cores of the inference accelerator engine. Each inference core also fetches second data unique to that core. The inference cores then perform computations on the first and second data in order to implement the machine learning model.
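Why the broadcast mapping reduces bandwidth can be seen with a bit of arithmetic: with N cores, shared (first) data of size S, and per-core (second) data of size A, one fetch-and-broadcast replaces N fetches of S. The numbers below are hypothetical, purely for illustration.

```python
def fetch_bytes(num_cores, shared, per_core, broadcast):
    """Total bytes pulled from the memory subsystem under each mapping."""
    # Without broadcast every core fetches the shared data itself.
    shared_traffic = shared if broadcast else shared * num_cores
    # Each core always fetches its own unique second data.
    return shared_traffic + per_core * num_cores

naive = fetch_bytes(num_cores=8, shared=64_000, per_core=1_000, broadcast=False)
bcast = fetch_bytes(num_cores=8, shared=64_000, per_core=1_000, broadcast=True)
print(naive, bcast)   # 520000 72000
```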
-
634.
Publication No.: US10452437B2
Publication Date: 2019-10-22
Application No.: US15192784
Application Date: 2016-06-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Abhinandan Majumdar , Brian J. Kocoloski , Leonardo Piga , Wei Huang , Yasuko Eckert
Abstract: Systems, apparatuses, and methods for performing temperature-aware task scheduling and proactive power management. A SoC includes a plurality of processing units and a task queue storing pending tasks. The SoC calculates a thermal metric for each pending task to predict an amount of heat the pending task will generate. The SoC also determines a thermal gradient for each processing unit to predict a rate at which the processing unit's temperature will change when executing a task. The SoC also monitors a thermal margin of how far each processing unit is from reaching its thermal limit. The SoC minimizes non-uniform heat generation on the SoC by scheduling pending tasks from the task queue to the processing units based on the thermal metrics for the pending tasks, the thermal gradients of each processing unit, and the thermal margin available on each processing unit.
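The scheduling policy above can be sketched with the three quantities the abstract names: a per-task thermal metric (predicted heat), a per-unit thermal gradient (temperature rise per unit of heat), and a per-unit thermal margin (degrees left before the limit). The data layout and greedy hottest-first order are assumptions for illustration, not the patent's algorithm.

```python
def schedule(tasks, units):
    """tasks: [(name, heat)]; units: {id: {"gradient": g, "margin": m}}.

    Greedily place each task on the unit with the most thermal margin
    left after the predicted temperature rise, to even out heating.
    """
    placement = {}
    for name, heat in sorted(tasks, key=lambda t: -t[1]):   # hottest first
        best = max(units, key=lambda u: units[u]["margin"]
                                        - units[u]["gradient"] * heat)
        placement[name] = best
        # Consume the margin the task is predicted to use.
        units[best]["margin"] -= units[best]["gradient"] * heat
    return placement

units = {"cpu0": {"gradient": 0.5, "margin": 20.0},
         "cpu1": {"gradient": 1.0, "margin": 30.0}}
print(schedule([("render", 10), ("decode", 4)], units))
```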
-
635.
Publication No.: US20190319891A1
Publication Date: 2019-10-17
Application No.: US15951844
Application Date: 2018-04-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Alan Dodson Smith , Vydhyanathan Kalyanasundharam , Bryan P. Broussard , Greggory D. Donley , Chintan S. Patel
IPC: H04L12/873 , H04L12/877 , H04L12/841 , H04L12/875
Abstract: A computing system uses a memory for storing data, one or more clients for generating network traffic, and a communication fabric with network switches. The network switches include centralized storage structures, rather than separate input and output storage structures. The network switches store particular metadata corresponding to received packets in a single, centralized collapsing queue where the age of the packets corresponds to a queue entry position. The payload data of the packets are stored in a separate memory, so the relatively large amount of data is not shifted during the lifetime of the packet in the network switch. The network switches select sparse queue entries in the collapsing queue, deallocate the selected queue entries, and shift remaining allocated queue entries toward a first end of the queue with a delay proportional to the radix of the network switches.
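The collapsing-queue idea can be sketched as follows: only small metadata lives in the queue (payloads stay in a separate buffer), entry position encodes age, and deallocating sparse entries shifts the survivors toward the head. The class is illustrative and ignores the hardware timing the abstract mentions.

```python
class CollapsingQueue:
    """Metadata-only queue; index 0 is the oldest entry."""

    def __init__(self):
        self.entries = []

    def allocate(self, meta):
        self.entries.append(meta)          # youngest packet at the tail

    def deallocate(self, indices):
        """Remove the selected (possibly sparse) entries; remaining
        entries collapse toward the head, preserving relative age."""
        keep = set(range(len(self.entries))) - set(indices)
        self.entries = [self.entries[i] for i in sorted(keep)]

q = CollapsingQueue()
for meta in ["pkt0", "pkt1", "pkt2", "pkt3"]:
    q.allocate(meta)
q.deallocate([0, 2])                       # sparse deallocation
print(q.entries)                           # ['pkt1', 'pkt3']
```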
-
636.
Publication No.: US20190318229A1
Publication Date: 2019-10-17
Application No.: US15952131
Application Date: 2018-04-12
Applicant: Advanced Micro Devices, Inc.
Inventor: Shuai Che
Abstract: Methods and systems for hardware mapping inference pipelines in deep neural network (DNN) systems. Each layer of the inference pipeline is mapped to a queue, which in turn is associated with one or more processing elements. Each queue has multiple elements, where an element represents the task to be completed for a given input. Each input is associated with a queue packet which identifies, for example, a type of DNN layer, which DNN layer to use, a next DNN layer to use and a data pointer. A queue packet is written into the element of a queue, and the processing elements read the element and process the input based on the information in the queue packet. The processing element then writes another queue packet to another queue based on the processed queue packet. Multiple inputs can be processed in parallel and on-the-fly using the queues independent of layer starting points.
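The queue-per-layer pipeline above can be sketched in a few lines: each layer owns a queue, a queue packet carries which layer produced it plus a reference to the data, and each processing element writes a new packet to the next layer's queue. Names and the packet layout are illustrative, not the patent's.

```python
from collections import deque

def run_pipeline(layers, inputs):
    """layers: [(name, fn)] in order; every input flows queue to queue."""
    queues = {name: deque() for name, _ in layers}
    for x in inputs:
        # Initial queue packet: (layer index to run, data reference).
        queues[layers[0][0]].append((0, x))
    results = []
    for idx, (name, fn) in enumerate(layers):
        while queues[name]:
            _, data = queues[name].popleft()     # read the queue element
            out = fn(data)                       # process this input
            if idx + 1 < len(layers):
                # Write a new packet into the next layer's queue.
                queues[layers[idx + 1][0]].append((idx + 1, out))
            else:
                results.append(out)
    return results

layers = [("conv", lambda x: x * 2), ("relu", lambda x: max(x, 0))]
print(run_pipeline(layers, [3, -1]))   # [6, 0]
```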
-
637.
Publication No.: US10437736B2
Publication Date: 2019-10-08
Application No.: US15852442
Application Date: 2017-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Arkaprava Basu , Eric Van Tassell , Mark Oskin , Guilherme Cox , Gabriel Loh
IPC: G06F12/00 , G06F12/1009 , G06F12/1027 , G06F9/38 , G06F13/40 , G06F9/48 , G06F13/42
Abstract: A data processing system includes a memory and an input/output memory management unit (IOMMU) connected to the memory. The IOMMU is adapted to receive batches of address translation requests. The IOMMU identifies, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedules access to a page table walker for each address translation request of a batch.
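One illustrative reading of the scheduling decision above is shortest-batch-first: a later batch with fewer translation requests gets the page table walker before a larger earlier batch, with ties broken by age. This policy is an assumption for the sketch, not necessarily the patented heuristic.

```python
def schedule_batches(batches):
    """batches: lists of translation requests, oldest batch first.

    Return the order in which requests reach the page table walker,
    letting a smaller later batch overtake a larger earlier one.
    """
    order = sorted(range(len(batches)),
                   key=lambda i: (len(batches[i]), i))   # size, then age
    walks = []
    for i in order:
        for req in batches[i]:
            walks.append(req)        # one walker slot per request
    return walks

early = ["va0", "va1", "va2"]        # earlier, larger batch
late = ["va3"]                       # later batch with fewer requests
print(schedule_batches([early, late]))   # ['va3', 'va0', 'va1', 'va2']
```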
-
638.
Publication No.: US10431562B1
Publication Date: 2019-10-01
Application No.: US16260794
Application Date: 2019-01-29
Applicant: Advanced Micro Devices, Inc.
Inventor: Thomas P. Dolbear , Daniel Cavasin , Sanjay Dandia
IPC: H01L21/78 , H01L23/00 , H01L23/367 , H01L25/00 , H01L21/48 , H01L25/065 , C23C16/34 , C23C16/06
Abstract: An integrated circuit device wafer includes a silicon wafer substrate and a back side metallization structure. The back side metallization structure includes a first adhesion layer on the back side of the substrate, a first metal layer over the first adhesion layer, a second metal layer over the first metal layer, and a second adhesion layer over the second metal layer. The first adhesion layer includes at least one of: silicon nitride and silicon dioxide. The first metal layer includes titanium. The second metal layer includes nickel. The second adhesion layer includes at least one of: silver, gold, and tin.
-
639.
Publication No.: US10430343B2
Publication Date: 2019-10-01
Application No.: US15437843
Application Date: 2017-02-21
Applicant: Advanced Micro Devices, Inc.
Inventor: Patrick N. Conway
IPC: G06F12/128 , G06F12/0811 , G06F12/0815 , G06F12/0888
Abstract: A communication bypass mechanism accelerates cache-to-cache data transfers for communication traffic between caching agents that have separate last-level caches. A method includes bypassing a last-level cache of a first caching agent in response to a cache line having a modified state being evicted from a penultimate-level cache of the first caching agent and a communication attribute of a shadow tag entry associated with the cache line being set. The communication attribute indicates prior communication of the cache line with a second caching agent having a second last-level cache.
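The bypass decision above can be sketched as a single routing function: a modified line evicted from the penultimate-level cache skips the agent's own last-level cache when the communication attribute in its shadow tag entry is set, because the line was previously shared with another caching agent. All field names here are assumptions.

```python
def evict(line, shadow_tags, llc, remote_llc):
    """Route a line evicted from the penultimate-level cache."""
    comm = shadow_tags.get(line["tag"], {}).get("communication", False)
    if line["state"] == "M" and comm:
        # Bypass the local LLC: send the dirty line cache-to-cache
        # toward the other caching agent's last-level cache.
        remote_llc[line["tag"]] = line["data"]
        return "bypassed"
    llc[line["tag"]] = line["data"]          # normal fill into own LLC
    return "filled"

llc, remote = {}, {}
tags = {0xAB: {"communication": True}}
print(evict({"tag": 0xAB, "state": "M", "data": 42}, tags, llc, remote))
# bypassed
```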
-
640.
Publication No.: US10425089B2
Publication Date: 2019-09-24
Application No.: US15850593
Application Date: 2017-12-21
Applicant: Advanced Micro Devices, Inc. , ATI Technologies ULC
Inventor: Stephen V. Kosonocky , Mikhail Rodionov , Joyce C. Wong
Abstract: A master/slave configuration of a frequency-locked loop (FLL) decouples the process, voltage, and temperature (PVT) tracking goal of locking the loop from adapting the clock frequency in response to voltage droops in the supply. A master oscillator circuit receives a regulated supply voltage and supplies a master oscillator signal. A control circuit supplies a master frequency control signal to control a frequency of the master oscillator signal to a target frequency. A slave oscillator circuit is coupled to a regulated supply voltage and a droopy supply voltage and supplies a slave oscillator signal having a frequency responsive to a slave frequency control signal that is based on the master frequency control signal. The frequency of the slave oscillator signal is further responsive to a voltage change of the droopy supply voltage.
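The master/slave split can be modeled very roughly: the master loop nudges a frequency-control word toward the target (PVT tracking on the clean supply), while the slave derives its frequency from that same control word and additionally backs off when the droopy supply sags. The update rule and constants below are invented purely to illustrate the decoupling.

```python
def master_step(ctrl, measured, target, gain=0.5):
    """Master FLL: nudge the control word toward the target frequency."""
    return ctrl + gain * (target - measured)

def slave_freq(ctrl, droopy_v, nominal_v, sensitivity=2.0):
    """Slave oscillator: master's control word, derated on supply droop."""
    droop = max(0.0, nominal_v - droopy_v)
    return ctrl - sensitivity * droop * ctrl / nominal_v

ctrl = master_step(1000.0, measured=990.0, target=1000.0)   # -> 1005.0
print(slave_freq(ctrl, droopy_v=0.95, nominal_v=1.0))
```

With no droop the slave simply runs at the master-derived frequency; a sag on the droopy rail immediately pulls the slave frequency down without disturbing the master's lock.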
-