Serving Large Language Models with 3D-DRAM Chiplets

    Publication No.: US20240403254A1

    Publication Date: 2024-12-05

    Application No.: US18210846

    Filing Date: 2023-06-16

    Applicant: Google LLC

    Abstract: Disclosed systems and methods herein provide for high bandwidth processing using a plurality of compute-memory chiplets. A computing package may be configured with a plurality of the compute-memory chiplets in order to perform processing operations in connection with a large language model. The compute-memory chiplets may be configured to operate using small, low-power computing dies that can efficiently operate for workloads with low arithmetic intensity.
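
The abstract does not define "low arithmetic intensity". As a rough illustration only, the sketch below uses the common roofline-style definition (floating-point operations per byte moved) and compares a matrix-vector product with a square matrix-matrix product. The function name `matmul_intensity`, the 2-byte element size, and the example shapes are assumptions made for this sketch; none of them come from the patent.

```python
def matmul_intensity(m, k, n, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) product.

    Assumes each operand is read from memory once and the result is
    written once (an idealised model, not a measurement).
    """
    flops = 2 * m * k * n  # one multiply and one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical shapes: a single input vector versus a large batch,
    # both against the same 4096 x 4096 weight matrix.
    print("matrix-vector:", matmul_intensity(1, 4096, 4096))
    print("matrix-matrix:", matmul_intensity(4096, 4096, 4096))
```

Under these assumptions the matrix-vector case performs about one operation per byte, so it is limited by memory bandwidth rather than compute, which is the regime the abstract says small, low-power compute dies handle efficiently.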

    CONVOLUTIONAL NEURAL NETWORK ON PROGRAMMABLE TWO DIMENSIONAL IMAGE PROCESSOR

    Publication No.: US20210004633A1

    Publication Date: 2021-01-07

    Application No.: US17028097

    Filing Date: 2020-09-22

    Applicant: Google LLC

    Abstract: A method is described that includes executing a convolutional neural network layer on an image processor having an array of execution lanes and a two-dimensional shift register. The two-dimensional shift register provides local respective register space for the execution lanes. The executing of the convolutional neural network includes loading a plane of image data of a three-dimensional block of image data into the two-dimensional shift register. The executing of the convolutional neural network also includes performing a two-dimensional convolution of the plane of image data with an array of coefficient values by sequentially: concurrently multiplying within the execution lanes respective pixel and coefficient values to produce an array of partial products; concurrently summing within the execution lanes the partial products with respective accumulations of partial products being kept within the two-dimensional register for different stencils within the image data; and, effecting alignment of values for the two-dimensional convolution within the execution lanes by shifting content within the two-dimensional shift register array.
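
The multiply, accumulate, and shift sequence in this abstract can be modelled in software. The following is a toy sketch, not the patented implementation: it treats one plane as a list of lists, models the "concurrent" lane operations with ordinary loops, and assumes wrap-around shifting at the edges, so only lanes whose stencil stays inside the plane hold valid results. The names `shift` and `conv2d_shift` and the left-then-up shift order are assumptions made for illustration.

```python
def shift(plane, dr, dc):
    """Shift so lane (r, c) sees the value previously at (r + dr, c + dc).

    Edges wrap around, which is an assumption of this sketch.
    """
    rows, cols = len(plane), len(plane[0])
    return [[plane[(r + dr) % rows][(c + dc) % cols] for c in range(cols)]
            for r in range(rows)]


def conv2d_shift(plane, coeffs):
    """Accumulate coeffs[i][j] * plane[r + i][c + j] in every lane (r, c)."""
    rows, cols = len(plane), len(plane[0])
    acc = [[0] * cols for _ in range(rows)]  # one accumulator per lane
    reg = [row[:] for row in plane]          # load the plane into the register
    for coeff_row in coeffs:
        row_start = reg
        for w in coeff_row:
            # Every lane multiplies its current register value by the same
            # coefficient and adds the partial product to its accumulator.
            for r in range(rows):
                for c in range(cols):
                    acc[r][c] += w * reg[r][c]
            reg = shift(reg, 0, 1)           # align the next pixel to the right
        reg = shift(row_start, 1, 0)         # move on to the next stencil row
    return acc


if __name__ == "__main__":
    plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    coeffs = [[1, 0], [0, 1]]
    out = conv2d_shift(plane, coeffs)
    print(out[0][0])  # lane (0, 0): 1*1 + 1*6 = 7
```

For a three-dimensional block, the same routine could be run once per plane with the per-lane accumulators summed across planes; that extension is omitted here to keep the sketch short.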
