-
Publication No.: US10713054B2
Publication Date: 2020-07-14
Application No.: US16030031
Filing Date: 2018-07-09
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Thomas Cloqueur, Anthony Jarvis
Abstract: A processor includes two or more branch target buffer (BTB) tables for branch prediction, each BTB table storing entries of a different target size or width or storing entries of a different branch type. Each BTB entry includes at least a tag and a target address. For certain branch types that only require a few target address bits, the respective BTB tables are narrower, thereby allowing for more BTB entries in the processor, separated into respective BTB tables by branch instruction type. An increased number of available BTB entries is stored in the same or less space in the processor, thereby increasing the speed of instruction processing. BTB tables can be defined that do not store any target address and rely on a decode unit to provide it. High-value BTB entries have dedicated storage and are therefore less likely to be evicted than low-value BTB entries.
-
Publication No.: US10705959B2
Publication Date: 2020-07-07
Application No.: US16119438
Filing Date: 2018-08-31
Applicant: Advanced Micro Devices, Inc.
Inventor: Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Amit P. Apte, Ganesh Balakrishnan
IPC: G06F12/0817
Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.
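The per-region reference count described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `NodeRegionDirectory`, the `lines_per_region` parameter, and the dictionary-based storage are all assumptions made for illustration, and the memory-based directory is not modeled.

```python
class NodeRegionDirectory:
    """Hypothetical node-based directory that tracks regions, not lines."""

    def __init__(self, lines_per_region=32):
        # Assumed region size; the abstract only says a region
        # "includes multiple cache lines".
        self.lines_per_region = lines_per_region
        # Maps region index -> number of cached lines in that region.
        self.ref_counts = {}

    def region_of(self, line_index):
        # A region is a fixed-size group of consecutive cache lines.
        return line_index // self.lines_per_region

    def line_cached(self, line_index):
        # A line was installed somewhere in the node: bump its region's count.
        region = self.region_of(line_index)
        self.ref_counts[region] = self.ref_counts.get(region, 0) + 1
        return region

    def line_evicted(self, line_index):
        # A line left the node: decrement, and drop the entry at zero.
        region = self.region_of(line_index)
        count = self.ref_counts.get(region, 0)
        if count == 0:
            raise KeyError(f"region {region} is not tracked")
        if count == 1:
            del self.ref_counts[region]
        else:
            self.ref_counts[region] = count - 1
        return region
```

With one entry per region rather than per line, the directory needs an entry only while at least one line of the region is cached, which is the entry-count reduction the abstract refers to.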
-
Publication No.: US10698691B2
Publication Date: 2020-06-30
Application No.: US15252168
Filing Date: 2016-08-30
Applicant: Advanced Micro Devices, Inc.
Inventor: Steven R. Havlir
Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using a shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.
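The shift-then-XOR update in this abstract can be sketched in a few lines of Python. The abstract does not give bit widths or say which history bits are XORed, so the function name `update_global_history`, the parameters `history_bits`, `shift_bits`, and `xor_bits`, and the choice to XOR into the bits just above the newly shifted-in ones are assumptions for illustration only.

```python
def update_global_history(history, signature, *,
                          history_bits=16, shift_bits=2, xor_bits=4):
    """Return an updated global branch history (illustrative sketch)."""
    mask = (1 << history_bits) - 1

    # "First bits" of the signature: assumed to be its low shift_bits bits.
    first = signature & ((1 << shift_bits) - 1)
    # "Second bits": assumed to be the next xor_bits bits of the signature.
    second = (signature >> shift_bits) & ((1 << xor_bits) - 1)

    # Shift the first bits into the history, discarding the oldest bits.
    shifted = ((history << shift_bits) | first) & mask

    # XOR the second bits with already-shifted history bits
    # (assumed position: directly above the newly inserted bits).
    return (shifted ^ (second << shift_bits)) & mask
```

For example, `update_global_history(0b1011, 0b110110)` shifts in `0b10` to give `0b101110`, then XORs `0b1101` into bits 2 to 5, yielding `0b011010`.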
-
Publication No.: US20200183833A1
Publication Date: 2020-06-11
Application No.: US16215298
Filing Date: 2018-12-10
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Swapnil SAKHARSHETE, Samuel Lawrence WASMUNDT
IPC: G06F12/06, G06F12/109, G06F17/16, G06T1/20
Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.
-
Publication No.: US20200183734A1
Publication Date: 2020-06-11
Application No.: US16211954
Filing Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Milind N. NEMLEKAR
Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of CUs of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled by the scheduler such that one work group is assigned per compute unit, thus assigning different ones of the recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides corresponding multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.
-
Publication No.: US20200183597A1
Publication Date: 2020-06-11
Application No.: US16212388
Filing Date: 2018-12-06
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Shomit N. DAS, Kishore PUNNIYAMURTHY
IPC: G06F3/06
Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access requests for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
-
Publication No.: US10679316B2
Publication Date: 2020-06-09
Application No.: US16007893
Filing Date: 2018-06-13
Applicant: Advanced Micro Devices, Inc.
Inventor: Sean M. O'Connell
Abstract: Systems, apparatuses, and methods for implementing a single pass stipple pattern generation process are disclosed. A processor initiates parallel execution of a first and second plurality of wavefronts. A first wavefront of the first plurality of wavefronts converts a first local coordinate into a first global coordinate, wherein the first local coordinate corresponds to a first portion of a primitive. Also, a first wavefront of the second plurality of wavefronts applies a first attribute to the first global coordinate prior to a second wavefront, of the first plurality of wavefronts, converting a second local coordinate of a second portion of the primitive into a second global coordinate. The second plurality of wavefronts generate image data based on applying the first attribute to global coordinates generated by the first plurality of wavefronts, and the image data is conveyed for display on a display device.
-
Publication No.: US20200175329A1
Publication Date: 2020-06-04
Application No.: US16208384
Filing Date: 2018-12-03
Applicant: Advanced Micro Devices, Inc.
Inventor: Nicholas Malaya
Abstract: A generator for generating artificial data, and training for the same. Data corresponding to a first label is altered within a reference labeled data set. A discriminator is trained based on the reference labeled data set to create a selectively poisoned discriminator. A generator is trained based on the selectively poisoned discriminator to create a selectively poisoned generator. The selectively poisoned generator is tested for the first label and tested for the second label to determine whether the generator is sufficiently poisoned for the first label and sufficiently accurate for the second label. If it is not, the generator is retrained based on the data set including the further altered data. The generator includes a first ANN to input first information and output a set of artificial data that is classifiable using a first label and not classifiable using a second label of the set of labeled data.
-
Publication No.: US20200174962A1
Publication Date: 2020-06-04
Application No.: US16204751
Filing Date: 2018-11-29
Applicant: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventor: Michael J. Tresidder, Yanfeng Wang, Shiqi Sun
Abstract: A method and apparatus for physical layer bypass data transmission between physical coding sub-layers (PCS) includes encoding the data for transmission over a serial low-speed link. The data is transmitted from a first PCS via a serial connection over a serializer/deserializer (SERDES) transmission bypass path. The data is received by a second PCS via a SERDES receive bypass path.
-
Publication No.: US10671554B1
Publication Date: 2020-06-02
Application No.: US16271371
Filing Date: 2019-02-08
Applicant: Advanced Micro Devices, Inc.
Inventor: Srikant Bharadwaj
Abstract: Flow control credit management is provided when converting traffic from a first parallel link width on a first link to a second parallel link width on a second link. A current value is calculated for a variable flow control credit exchange rate (R) associated with the first and second links. A first flow control credit indicator is received on the second link, and a credit amount is calculated based on the first flow control credit indicator and R. A second flow control credit indicator for the credit amount is then transmitted on the first link.
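One possible reading of the credit conversion in this abstract is sketched below in Python. The abstract does not say how R is calculated or how fractional credits are handled, so everything here is an assumption: the class name `CreditConverter`, taking R as the ratio of the second link width to the first, and carrying any fractional remainder forward until it adds up to a whole credit.

```python
from fractions import Fraction


class CreditConverter:
    """Hypothetical credit translator between two link widths."""

    def __init__(self, first_width_bits, second_width_bits):
        # Assumed exchange rate R: one credit on the second link is worth
        # second_width / first_width credits on the first link.
        self.rate = Fraction(second_width_bits, first_width_bits)
        # Fractional credit not yet forwarded (assumed carry-over policy).
        self.pending = Fraction(0)

    def receive(self, credits_on_second_link):
        """Return the whole credits to transmit on the first link."""
        self.pending += credits_on_second_link * self.rate
        whole = self.pending // 1  # floor to whole credits
        self.pending -= whole
        return whole
```

Using exact fractions rather than floats keeps the sketch from silently losing or inventing credits when the two widths are not integer multiples of each other.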
-