EXTREME-BANDWIDTH SCALABLE PERFORMANCE-PER-WATT GPU ARCHITECTURE

    Publication Number: US20190196742A1

    Publication Date: 2019-06-27

    Application Number: US15851476

    Filing Date: 2017-12-21

    Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
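The routing decision described in this abstract can be sketched in code. This is an illustrative model only, under the assumption that "local" means the address falls in a range served by the execution unit's own stacked dies; all names (`MemoryRouter`, `local_ranges`, `access`) are hypothetical and not from the patent.

```python
class MemoryRouter:
    """Toy model of the access path choice: local requests go 'directly'
    via TSVs activated by the control die; non-local requests go through
    the distributed cache fabric and interconnect bus."""

    def __init__(self, local_ranges):
        # local_ranges: (start, end) address ranges local to the
        # requesting execution unit (illustrative assumption).
        self.local_ranges = local_ranges

    def is_local(self, addr):
        return any(lo <= addr < hi for lo, hi in self.local_ranges)

    def access(self, addr):
        if self.is_local(addr):
            # Control die activates the TSVs for the target memory
            # cells; the interconnect bus is not used.
            return f"direct TSV access @ {addr:#x}"
        # Non-local: routed via the cache fabric and interconnect bus.
        return f"fabric/bus access @ {addr:#x}"

router = MemoryRouter(local_ranges=[(0x0000, 0x1000)])
print(router.access(0x0800))  # direct TSV access @ 0x800
print(router.access(0x2000))  # fabric/bus access @ 0x2000
```

The point of the split is that local accesses avoid bus traversal entirely, which is where the bandwidth and power savings in the title come from.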

    METHOD AND APPARATUS OF PERFORMING A MEMORY OPERATION IN A HIERARCHICAL MEMORY ASSEMBLY

    Publication Number: US20190065100A1

    Publication Date: 2019-02-28

    Application Number: US15686121

    Filing Date: 2017-08-24

    Inventor: Dmitri Yudanov

    Abstract: A method and apparatus of performing a memory operation includes receiving a memory operation request at a first memory controller that is in communication with a second memory controller. The first memory controller forwards the memory operation request to the second memory controller. Upon receipt of the memory operation request, the second memory controller provides first information or second information depending on a condition of a pseudo-bank of the second memory controller and a type of the memory operation request.
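The forwarding flow in this abstract can be sketched as below. This is a minimal model, assuming the "pseudo-bank" behaves like a small cache at the second controller and that "first information" corresponds to a hit and "second information" to a miss; all class and method names are illustrative, not the patent's.

```python
class SecondController:
    """Second-level controller holding a pseudo-bank (modeled as a dict)."""

    def __init__(self):
        self.pseudo_bank = {}  # address -> value (illustrative)

    def handle(self, op, addr, value=None):
        hit = addr in self.pseudo_bank  # condition of the pseudo-bank
        if op == "read":
            # Provide first information on a hit, second on a miss
            # (an assumed mapping, for illustration only).
            return ("first", self.pseudo_bank[addr]) if hit else ("second", None)
        if op == "write":
            self.pseudo_bank[addr] = value
            return ("first" if hit else "second", None)
        raise ValueError(f"unknown op: {op}")

class FirstController:
    """First-level controller that forwards requests downstream."""

    def __init__(self, second):
        self.second = second

    def request(self, op, addr, value=None):
        # The first controller forwards the memory operation request.
        return self.second.handle(op, addr, value)

mc = FirstController(SecondController())
mc.request("write", 0x10, 42)
print(mc.request("read", 0x10))  # ('first', 42)
print(mc.request("read", 0x20))  # ('second', None)
```

The response thus depends on both the pseudo-bank's condition (hit or miss) and the request type (read or write), matching the two-factor selection the abstract describes.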

    HETEROGENEOUS FUNCTION UNIT DISPATCH IN A GRAPHICS PROCESSING UNIT
    Status: Pending (Published)

    Publication Number: US20160085551A1

    Publication Date: 2016-03-24

    Application Number: US14490213

    Filing Date: 2014-09-18

    CPC classification number: G06F9/3887 G06F9/3851

    Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
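The dispatch policy in this abstract can be sketched as follows. This is a simplified model assuming the fetch-and-decode logic prefers the narrowest SIMD unit that can execute a group of threads in a single pass; the function name and the fallback rule are assumptions for illustration, not the patent's claimed method.

```python
def dispatch(thread_groups, simd_alu_counts):
    """Assign each thread group to a SIMD unit based on ALU count.

    thread_groups: list of thread counts to dispatch.
    simd_alu_counts: ALU count of each available SIMD unit.
    Returns (thread_count, chosen_alu_count) pairs.
    """
    assignments = []
    for threads in thread_groups:
        # SIMD units wide enough for one pass, narrowest first.
        fitting = sorted(w for w in simd_alu_counts if w >= threads)
        if fitting:
            assignments.append((threads, fitting[0]))
        else:
            # No unit is wide enough: use the widest unit, which
            # executes the threads over multiple passes (assumption).
            assignments.append((threads, max(simd_alu_counts)))
    return assignments

# e.g. three SIMD units with 4, 8, and 16 ALUs respectively
print(dispatch([3, 8, 20], [4, 8, 16]))
# [(3, 4), (8, 8), (20, 16)]
```

Matching group size to ALU count this way keeps wide units free for wide groups and avoids idling ALUs on small groups, which is the motivation for heterogeneous SIMD widths in the abstract.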

