Multiple-table branch target buffer
    581.
    发明授权

    公开(公告)号:US10713054B2

    公开(公告)日:2020-07-14

    申请号:US16030031

    申请日:2018-07-09

    Abstract: A processor includes two or more branch target buffer (BTB) tables for branch prediction, each BTB table storing entries of a different target size or width or storing entries of a different branch type. Each BTB entry includes at least a tag and a target address. For certain branch types that only require a few target address bits, the respective BTB tables are narrower thereby allowing for more BTB entries in the processor separated into respective BTB tables by branch instruction type. An increased number of available BTB entries are stored in a same or a less space in the processor thereby increasing a speed of instruction processing. BTB tables can be defined that do not store any target address and rely on a decode unit to provide it. High value BTB entries have dedicated storage and are therefore less likely to be evicted than low value BTB entries.

    Region based split-directory scheme to adapt to large cache sizes

    公开(公告)号:US10705959B2

    公开(公告)日:2020-07-07

    申请号:US16119438

    申请日:2018-08-31

    Abstract: Systems, apparatuses, and methods for maintaining region-based cache directories split between node and memory are disclosed. The system with multiple processing nodes includes cache directories split between the nodes and memory to help manage cache coherency among the nodes' cache subsystems. In order to reduce the number of entries in the cache directories, the cache directories track coherency on a region basis rather than on a cache line basis, wherein a region includes multiple cache lines. Each processing node includes a node-based cache directory to track regions which have at least one cache line cached in any cache subsystem in the node. The node-based cache directory includes a reference count field in each entry to track the aggregate number of cache lines that are cached per region. The memory-based cache directory includes entries for regions which have an entry stored in any node-based cache directory of the system.

    VIRTUAL SPACE MEMORY BANDWIDTH REDUCTION
    584.
    发明申请

    公开(公告)号:US20200183833A1

    公开(公告)日:2020-06-11

    申请号:US16215298

    申请日:2018-12-10

    Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual-matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.

    PIPELINED MATRIX MULTIPLICATION AT A GRAPHICS PROCESSING UNIT

    公开(公告)号:US20200183734A1

    公开(公告)日:2020-06-11

    申请号:US16211954

    申请日:2018-12-06

    Abstract: A graphics processing unit (GPU) schedules recurrent matrix multiplication operations at different subsets of CUs of the GPU. The GPU includes a scheduler that receives sets of recurrent matrix multiplication operations, such as multiplication operations associated with a recurrent neural network (RNN). The multiple operations associated with, for example, an RNN layer are fused into a single kernel, which is scheduled by the scheduler such that one work group is assigned per compute unit, thus assigning different ones of the recurrent matrix multiplication operations to different subsets of the CUs of the GPU. In addition, via software synchronization of the different workgroups, the GPU pipelines the assigned matrix multiplication operations so that each subset of CUs provides corresponding multiplication results to a different subset, and so that each subset of CUs executes at least a portion of the multiplication operations concurrently.

    DYNAMIC VOLTAGE AND FREQUENCY SCALING BASED ON MEMORY CHANNEL SLACK

    公开(公告)号:US20200183597A1

    公开(公告)日:2020-06-11

    申请号:US16212388

    申请日:2018-12-06

    Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access request for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.

    Single pass prefix sum in a vertex shader

    公开(公告)号:US10679316B2

    公开(公告)日:2020-06-09

    申请号:US16007893

    申请日:2018-06-13

    Abstract: Systems, apparatuses, and methods for implementing a single pass stipple pattern generation process are disclosed. A processor initiates parallel execution of a first and second plurality of wavefronts. A first wavefront of the first plurality of wavefronts converts a first local coordinate into a first global coordinate, wherein the first local coordinate corresponds to a first portion of a primitive. Also, a first wavefront of the second plurality of wavefronts applies a first attribute to the first global coordinate prior to a second wavefront, of the first plurality of wavefronts, converting a second local coordinate of a second portion of the primitive into a second global coordinate. The second plurality of wavefronts generate image data based on applying the first attribute to global coordinates generated by the first plurality of wavefronts, and the image data is conveyed for display on a display device.

    DELIBERATE CONDITIONAL POISON TRAINING FOR GENERATIVE MODELS

    公开(公告)号:US20200175329A1

    公开(公告)日:2020-06-04

    申请号:US16208384

    申请日:2018-12-03

    Inventor: Nicholas Malaya

    Abstract: A generator for generating artificial data, and training for the same. Data corresponding to a first label is altered within a reference labeled data set. A discriminator is trained based on the reference labeled data set to create a selectively poisoned discriminator. A generator is trained based on the selectively poisoned discriminator to create a selectively poisoned generator. The selectively poisoned generator is tested for the first label and tested for the second label to determine whether the generator is sufficiently poisoned for the first label and sufficiently accurate for the second label. If it is not, the generator is retrained based on the data set including the further altered data. The generator includes a first ANN to input first information and output a set of artificial data that is classifiable using a first label and not classifiable using a second label of the set of labeled data.

    Credit based flow control mechanism for use in multiple link width interconnect systems

    公开(公告)号:US10671554B1

    公开(公告)日:2020-06-02

    申请号:US16271371

    申请日:2019-02-08

    Abstract: Flow control credit management is provided when converting traffic from a first parallel link width on a first link to a second parallel link width on a second link A current value is calculated for a variable flow control credit exchange rate (R) associated with the first and second links. A first flow control credit indicator is received on the second link, and a credit amount calculated based on the first flow control credit indicator and R. A second flow control credit indicator for the credit amount is then transmitted on the first link.

Patent Agency Ranking