-
Publication Number: US20190199617A1
Publication Date: 2019-06-27
Application Number: US15850616
Application Date: 2017-12-21
Applicant: Advanced Micro Devices, Inc.
IPC: H04L12/751 , G06F17/30 , G06F13/42 , G06F13/40 , G06F13/364 , H04L12/933 , H04L12/741
CPC classification number: H04L45/02 , G06F13/364 , G06F13/4022 , G06F13/4282 , G06F16/9024 , G06F16/9038 , G06F21/575 , G06F2213/0016 , H04L45/745 , H04L49/15 , H04L63/20
Abstract: A system for automatically discovering fabric topology includes one or more processing units, one or more memory devices, a security processor, and a communication fabric with an unknown topology coupled to the processing unit(s), memory device(s), and security processor. The security processor queries each component of the fabric to retrieve various attributes associated with the component. The security processor utilizes the retrieved attributes to create a network graph of the topology of the components within the fabric. The security processor generates routing tables from the network graph and programs the routing tables into the fabric components. Then, the fabric components utilize the routing tables to determine how to route incoming packets.
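As a rough illustration of the flow the abstract describes, the sketch below queries a set of fabric components for their link attributes, builds a network graph, and derives per-component routing tables from breadth-first shortest paths. All names (Component, discover_topology, build_routing_tables) and the three-node fabric are assumptions for illustration, not AMD's actual interfaces.

```python
# Hypothetical sketch: query components, build a topology graph, derive routing tables.
from collections import deque

class Component:
    def __init__(self, comp_id, comp_type, neighbors):
        self.comp_id = comp_id          # unique fabric identifier
        self.comp_type = comp_type      # e.g. "crossbar", "master port", "slave port"
        self.neighbors = neighbors      # comp_ids reachable over one fabric link
        self.routing_table = {}         # destination -> next hop, filled in later

def discover_topology(components):
    """Build an adjacency map (the 'network graph') from the queried attributes."""
    return {c.comp_id: list(c.neighbors) for c in components}

def build_routing_tables(graph, components):
    """For each component, BFS the graph to find the next hop toward every destination."""
    for comp in components:
        parent = {comp.comp_id: None}
        queue = deque([comp.comp_id])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in parent:
                    parent[nbr] = node
                    queue.append(nbr)
        for dest in parent:
            if dest == comp.comp_id:
                continue
            # Walk back toward the source to find the first hop out of 'comp'.
            hop = dest
            while parent[hop] != comp.comp_id:
                hop = parent[hop]
            comp.routing_table[dest] = hop

if __name__ == "__main__":
    # Tiny three-node fabric: cpu <-> xbar <-> mem
    comps = [Component("cpu", "master", ["xbar"]),
             Component("xbar", "crossbar", ["cpu", "mem"]),
             Component("mem", "slave", ["xbar"])]
    build_routing_tables(discover_topology(comps), comps)
    print(comps[0].routing_table)   # {'xbar': 'xbar', 'mem': 'xbar'}
```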
-
Publication Number: US20190197761A1
Publication Date: 2019-06-27
Application Number: US15853207
Application Date: 2017-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Skyler Jonathon Saleh , Maxim V. Kazakov , Vineet Goel
Abstract: A texture processor based ray tracing accelerator method and system are described. The system includes a shader, texture processor (TP) and cache, which are interconnected. The TP includes a texture address unit (TA), a texture cache processor (TCP), a filter pipeline unit and a ray intersection engine. The shader sends a texture instruction which contains ray data and a pointer to a bounded volume hierarchy (BVH) node to the TA. The TCP uses an address provided by the TA to fetch BVH node data from the cache. The ray intersection engine performs ray-BVH node type intersection testing using the ray data and the BVH node data. The intersection testing results and indications for BVH traversal are returned to the shader via a texture data return path. The shader reviews the intersection results and the indications to decide how to traverse to the next BVH node.
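The kind of test the abstract's ray intersection engine performs can be sketched as a ray against an axis-aligned bounding box (one BVH node) using the standard slab method. The fixed-function fetch and texture-data return paths are not modeled here, and all names are illustrative rather than the patent's interfaces.

```python
# Minimal ray vs. axis-aligned bounding box (BVH node) test using the slab method.

def ray_aabb_intersect(origin, inv_dir, box_min, box_max):
    """Return True if the ray hits the box; inv_dir is 1/direction per axis."""
    t_near, t_far = float("-inf"), float("inf")
    for axis in range(3):
        t0 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t1 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
    return t_near <= t_far and t_far >= 0.0

if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    direction = (1.0, 1.0, 1.0)
    inv_dir = tuple(1.0 / d for d in direction)
    print(ray_aabb_intersect(origin, inv_dir, (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)))  # True
```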
-
Publication Number: US20190196978A1
Publication Date: 2019-06-27
Application Number: US15852442
Application Date: 2017-12-22
Applicant: Advanced Micro Devices, Inc.
Inventor: Arkaprava Basu , Eric Van Tassell , Mark Oskin , Guilherme Cox , Gabriel Loh
IPC: G06F12/1009 , G06F12/1027 , G06F9/38 , G06F13/40 , G06F13/42 , G06F9/48
CPC classification number: G06F12/1009 , G06F9/3887 , G06F9/4843 , G06F12/1027 , G06F13/4022 , G06F13/4282 , G06F2212/65 , G06F2212/68 , G06F2213/0026
Abstract: A data processing system includes a memory and an input output memory management unit that is connected to the memory. The input output memory management unit is adapted to receive batches of address translation requests. The input output memory management unit has instructions that identify, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedule access to a page table walker for each address translation request of a batch.
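A hedged sketch of the scheduling idea, assuming invented Batch and scheduling helpers: among queued batches of address translation requests, a later batch with fewer requests is granted page-table-walker slots ahead of an earlier, larger batch.

```python
# Illustrative scheduler: smaller translation batches get the page table walker first.
import heapq

class Batch:
    def __init__(self, arrival_order, requests):
        self.arrival_order = arrival_order
        self.requests = requests            # list of virtual addresses to translate

def schedule_walks(batches):
    """Yield (arrival order, virtual address) pairs, smaller batches first."""
    # Priority: fewer outstanding requests wins; arrival order breaks ties.
    heap = [(len(b.requests), b.arrival_order, b) for b in batches]
    heapq.heapify(heap)
    while heap:
        _, _, batch = heapq.heappop(heap)
        for va in batch.requests:
            yield batch.arrival_order, va   # hand each request to a page table walker

if __name__ == "__main__":
    early = Batch(0, [0x1000, 0x2000, 0x3000, 0x4000])
    late = Batch(1, [0x9000])
    for order, va in schedule_walks([early, late]):
        print(f"batch {order}: walk {hex(va)}")
    # The single-request batch 1 is walked before the four requests of batch 0.
```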
-
Publication Number: US20190196974A1
Publication Date: 2019-06-27
Application Number: US15855838
Application Date: 2017-12-27
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam , Kevin M. Lepak , Ganesh Balakrishnan , Ravindra N. Bhargava
IPC: G06F12/0897 , G06F12/121
Abstract: Systems, apparatuses, and methods for implementing a tag accelerator cache are disclosed. A system includes at least a data cache and a control unit coupled to the data cache via a memory controller. The control unit includes a tag accelerator cache (TAC) for caching tag blocks fetched from the data cache. The data cache is organized such that multiple tags are retrieved in a single access. This allows hiding the tag latency penalty for future accesses to neighboring tags and improves cache bandwidth. When a tag block is fetched from the data cache, the tag block is cached in the TAC. Memory requests received by the control unit first lookup the TAC before being forwarded to the data cache. Due to the presence of spatial locality in applications, the TAC can filter out a large percentage of tag accesses to the data cache, resulting in latency and bandwidth savings.
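A minimal sketch of the lookup order described above, assuming an invented TagAcceleratorCache with an LRU of tag blocks: a request first checks the TAC, and only on a miss is the full tag block fetched from the data cache and installed. The block size, capacity, and fetch callback are assumptions.

```python
# Illustrative TAC: cache whole tag blocks so neighboring tag lookups hit locally.
from collections import OrderedDict

TAG_BLOCK_SIZE = 8          # tags fetched together in one data-cache access (assumed)
CACHE_LINE_BYTES = 64

class TagAcceleratorCache:
    def __init__(self, capacity=4):
        self.blocks = OrderedDict()     # block base -> list of tags, in LRU order
        self.capacity = capacity

    def lookup(self, line_addr, fetch_tag_block):
        block_base = line_addr // (TAG_BLOCK_SIZE * CACHE_LINE_BYTES)
        if block_base in self.blocks:
            self.blocks.move_to_end(block_base)      # TAC hit: tag latency hidden
            return self.blocks[block_base], True
        tags = fetch_tag_block(block_base)           # TAC miss: one wide data-cache fetch
        self.blocks[block_base] = tags
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)          # evict the least recently used block
        return tags, False

if __name__ == "__main__":
    tac = TagAcceleratorCache()
    fetch = lambda base: [f"tag{base}_{i}" for i in range(TAG_BLOCK_SIZE)]
    _, hit1 = tac.lookup(0x0000, fetch)   # miss: pulls the whole tag block
    _, hit2 = tac.lookup(0x0040, fetch)   # neighboring line, same block: hit
    print(hit1, hit2)                     # False True
```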
-
Publication Number: US20190196721A1
Publication Date: 2019-06-27
Application Number: US15851479
Application Date: 2017-12-21
Applicant: Advanced Micro Devices, Inc.
Inventor: James Raymond Magro
CPC classification number: G06F3/0611 , G06F3/0658 , G06F3/0659 , G06F3/0673 , G06F9/5011 , G06F12/10 , G06F13/1626 , G06F13/1631 , G06F13/1684 , G06F2212/1024
Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. A computing system includes one or more clients for processing applications. A memory controller transfers traffic between the memory controller and two channels, each connected to a memory device. A client sends a 64-byte memory request with an indication specifying that there are two 32-byte requests targeting non-contiguous data within a same page. The memory controller generates two addresses, and sends a single command and the two addresses to two channels to simultaneously access non-contiguous data in a same page.
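The request split can be sketched as follows, with the page size, offset encoding, and issue_to_channels helper all assumed for illustration: one 64-byte request carrying two 32-byte offsets within a page becomes a single command with two addresses issued to the two channels.

```python
# Illustrative split of one 64-byte request into two 32-byte same-page accesses.
PAGE_SIZE = 4096
ACCESS_BYTES = 32

def split_request(page_base, offset_a, offset_b):
    """Return the two 32-byte-aligned addresses targeted inside the same page."""
    assert page_base % PAGE_SIZE == 0
    assert offset_a % ACCESS_BYTES == 0 and offset_b % ACCESS_BYTES == 0
    assert 0 <= offset_a < PAGE_SIZE and 0 <= offset_b < PAGE_SIZE
    return page_base + offset_a, page_base + offset_b

def issue_to_channels(command, addr0, addr1):
    # One command, two addresses, two channels accessed simultaneously.
    print(f"channel 0: {command} @ {hex(addr0)}")
    print(f"channel 1: {command} @ {hex(addr1)}")

if __name__ == "__main__":
    addr0, addr1 = split_request(page_base=0x10000, offset_a=0x040, offset_b=0x780)
    issue_to_channels("READ32", addr0, addr1)
```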
-
Publication Number: US10331196B2
Publication Date: 2019-06-25
Application Number: US15626847
Application Date: 2017-06-19
Applicant: Advanced Micro Devices, Inc.
Inventor: Russell Schreiber
IPC: G06F1/324 , H03K5/159 , G06F1/3296 , G06F1/3287
Abstract: A system and method for providing efficient clock gating capability for functional units are described. A functional unit uses a clock gating circuit for power management. A setup time of a single device propagation delay is provided for a received enable signal. When each of a clock signal, the enable signal and a delayed clock signal is asserted, an evaluate node of the clock gating circuit is discharged. When each of the clock signal and a second clock signal is asserted and the enable signal is negated, the evaluate node is left floating for a duration equal to the hold time. Afterward, the devices in a delayed onset keeper are turned on and the evaluate node has a path to the power supply. When the clock signal is negated, the evaluate node is precharged.
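A very rough behavioral model of the evaluate-node states the abstract walks through is sketched below; transistor-level timing (the single-device setup time and the hold-time float window) is not modeled, and the state names are assumptions for illustration only.

```python
# Behavioral view of the evaluate node across clock/enable combinations (assumed model).
def evaluate_node_state(clk, clk_delayed, enable):
    if not clk:
        return "precharged"             # precharge device pulls the node high
    if clk_delayed and enable:
        return "discharged"             # evaluate path conducts, gated clock fires
    if clk_delayed and not enable:
        return "floating-then-kept"     # floats for the hold time, then the keeper holds it
    return "precharged"                 # delayed clock not yet asserted

if __name__ == "__main__":
    for clk, dclk, en in [(0, 0, 1), (1, 0, 1), (1, 1, 1), (1, 1, 0)]:
        print(clk, dclk, en, "->", evaluate_node_state(clk, dclk, en))
```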
-
Publication Number: US20190190805A1
Publication Date: 2019-06-20
Application Number: US15849266
Application Date: 2017-12-20
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Douglas Benson HUNT , Jay FLEISCHMAN
Abstract: A system includes a multi-core processor that includes a scheduler. The multi-core processor communicates with a system memory and an operating system. The multi-core processor executes a first process and a second process. The system uses the scheduler to control the second process's use of memory bandwidth until, within a control cycle, the first process's current use either meets a first setpoint of use (when the first setpoint is at or below a latency sensitive (LS) floor) or exceeds the LS floor (when the first setpoint exceeds the LS floor).
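The throttling decision can be sketched as a single predicate evaluated each control cycle; the function name and numeric examples are assumptions, not the patent's implementation.

```python
# Illustrative per-cycle decision: may the second (non-LS) process use bandwidth freely?
def allow_second_process(first_use, first_setpoint, ls_floor):
    """Return True when the second process's bandwidth use no longer needs to be limited."""
    if first_setpoint <= ls_floor:
        return first_use >= first_setpoint   # first process has met its setpoint
    return first_use > ls_floor              # setpoint above the floor: clearing the floor suffices

if __name__ == "__main__":
    # Setpoint below the LS floor: the second process waits until the setpoint is met.
    print(allow_second_process(first_use=3.0, first_setpoint=4.0, ls_floor=5.0))  # False
    print(allow_second_process(first_use=4.5, first_setpoint=4.0, ls_floor=5.0))  # True
    # Setpoint above the LS floor: exceeding the floor is sufficient.
    print(allow_second_process(first_use=5.5, first_setpoint=8.0, ls_floor=5.0))  # True
```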
-
Publication Number: US20190188557A1
Publication Date: 2019-06-20
Application Number: US15849617
Application Date: 2017-12-20
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel I. Lowell , Sergey Voronov , Mayank Daga
Abstract: Methods, devices, systems, and instructions for adaptive quantization in an artificial neural network (ANN) calculate a distribution of ANN information; select a quantization function from a set of quantization functions based on the distribution; apply the quantization function to the ANN information to generate quantized ANN information; load the quantized ANN information into the ANN; and generate an output based on the quantized ANN information. Some examples recalculate the distribution of ANN information and reselect the quantization function from the set of quantization functions based on the resampled distribution if the output does not sufficiently correlate with a known correct output. In some examples, the ANN information includes a set of training data. In some examples, the ANN information includes a plurality of link weights.
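A hedged sketch of the adaptive-quantization loop, assuming an invented statistic (weight spread), two invented quantization functions, and an arbitrary correlation threshold: measure the distribution of the link weights, pick a quantizer, and reselect if the output correlates poorly with a known-good output.

```python
# Illustrative adaptive quantization: select a quantizer from the weight distribution,
# then fall back to a finer quantizer if the result correlates poorly with a reference.
import statistics

def uniform_quantize(values, levels=16):
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1) or 1.0
    return [lo + round((v - lo) / step) * step for v in values]

def coarse_quantize(values):
    return uniform_quantize(values, levels=4)

def select_quantizer(weights):
    # Assumed heuristic: tightly clustered weights tolerate a coarser quantizer.
    return coarse_quantize if statistics.pstdev(weights) < 0.1 else uniform_quantize

def adaptive_quantize(weights, run_ann, reference_output, min_corr=0.95):
    quantizer = select_quantizer(weights)
    output = run_ann(quantizer(weights))
    if statistics.correlation(output, reference_output) < min_corr:   # Python 3.10+
        # Output does not correlate well enough: reselect and requantize.
        output = run_ann(uniform_quantize(weights))
    return output

if __name__ == "__main__":
    weights = [0.12, -0.4, 0.33, 0.9, -0.75, 0.05]
    run_ann = lambda w: w        # stand-in for loading the weights and running inference
    print(adaptive_quantize(weights, run_ann, reference_output=weights))
```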
-
Publication Number: US20190188137A1
Publication Date: 2019-06-20
Application Number: US15846008
Application Date: 2017-12-18
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam , Kevin M. Lepak , Amit P. Apte , Ganesh Balakrishnan , Eric Christopher Morton , Elizabeth M. Cooper , Ravindra N. Bhargava
IPC: G06F12/0817 , G06F12/128 , G06F12/0811 , G06F12/0831 , G06F12/0871
CPC classification number: G06F12/0817 , G06F12/0811 , G06F12/0831 , G06F12/0871 , G06F12/128 , G06F2212/283 , G06F2212/604 , G06F2212/621
Abstract: Systems, apparatuses, and methods for maintaining a region-based cache directory are disclosed. A system includes multiple processing nodes, with each processing node including a cache subsystem. The system also includes a cache directory to help manage cache coherency among the different cache subsystems of the system. In order to reduce the number of entries in the cache directory, the cache directory tracks coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Accordingly, the system includes a region-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the system. The cache directory includes a reference count in each entry to track the aggregate number of cache lines that are cached per region. If a reference count of a given entry goes to zero, the cache directory reclaims the given entry.
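The directory bookkeeping can be sketched as follows; the region size (32 lines of 64 bytes) and class names are assumptions chosen only to illustrate the reference-count-and-reclaim behavior.

```python
# Illustrative region-based directory: one entry per region, reclaimed at count zero.
LINES_PER_REGION = 32
LINE_BYTES = 64

class RegionDirectory:
    def __init__(self):
        self.entries = {}            # region base -> {"ref_count": int, "sharers": set}

    @staticmethod
    def region_of(line_addr):
        return line_addr // (LINES_PER_REGION * LINE_BYTES)

    def line_cached(self, line_addr, node_id):
        region = self.region_of(line_addr)
        entry = self.entries.setdefault(region, {"ref_count": 0, "sharers": set()})
        entry["ref_count"] += 1
        entry["sharers"].add(node_id)

    def line_evicted(self, line_addr, node_id):
        region = self.region_of(line_addr)
        entry = self.entries[region]
        entry["ref_count"] -= 1
        if entry["ref_count"] == 0:
            del self.entries[region]     # reclaim the entry once no lines remain cached

if __name__ == "__main__":
    directory = RegionDirectory()
    directory.line_cached(0x0000, node_id=0)
    directory.line_cached(0x0040, node_id=1)   # same 2 KiB region: one entry, count 2
    directory.line_evicted(0x0000, node_id=0)
    directory.line_evicted(0x0040, node_id=1)
    print(directory.entries)                   # {} -- entry reclaimed at count zero
```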
-
Publication Number: US20190188001A1
Publication Date: 2019-06-20
Application Number: US15846781
Application Date: 2017-12-19
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Amitabh MEHRA , Krishna SAI BERNUCHO
IPC: G06F9/44
CPC classification number: G06F9/4408 , G06F9/4403
Abstract: A computing device includes a processor having a plurality of cores, a core translation component, and a core assignment component. The core translation component provides a set of registers, one register for each core of the multiple processor cores. The core assignment component includes components to provide a core index to each of the registers of the core translation component according to a core assignment scheme during processor initialization. Process instructions from an operating system are transferred to a respective core based on the core indices.
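A minimal sketch of that indirection, assuming an invented "reverse order" assignment scheme: initialization writes one index per core into a bank of translation registers, and dispatched work is steered through those registers to a physical core.

```python
# Illustrative core translation registers programmed by an assignment scheme at init.
def program_core_translation(num_cores, assignment_scheme):
    """Return the translation registers: logical index -> physical core id."""
    return {logical: physical
            for logical, physical in enumerate(assignment_scheme(num_cores))}

def dispatch(process_instr, logical_core, translation_registers):
    physical = translation_registers[logical_core]
    print(f"instr {process_instr!r} -> physical core {physical}")

if __name__ == "__main__":
    # Hypothetical scheme: map logical cores to physical cores in reverse order.
    reverse_scheme = lambda n: list(range(n - 1, -1, -1))
    regs = program_core_translation(num_cores=4, assignment_scheme=reverse_scheme)
    dispatch("thread_start", logical_core=0, translation_registers=regs)   # core 3
```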
-