-
Publication number: US20220100813A1
Publication date: 2022-03-31
Application number: US17032314
Application date: 2020-09-25
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. RUSH , Michael MANTOR , Arun Vaidyanathan ANANTHANARAYAN , Prasad NAGABHUSHANAMGARI
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
-
Publication number: US20220100528A1
Publication date: 2022-03-31
Application number: US17032307
Application date: 2020-09-25
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. RUSH , Michael MANTOR , Arun Vaidyanathan ANANTHANARAYAN , Prasad NAGABHUSHANAMGARI , Maxim V. KAZAKOV
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
-
Publication number: US20230289191A1
Publication date: 2023-09-14
Application number: US18128642
Application date: 2023-03-30
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. Rush , Michael Mantor , Arun Vaidyanathan Ananthanarayan , Prasad Nagabhushanamgari , Maxim V. Kazakov
CPC classification number: G06F9/3887 , G06F13/28 , G06F13/4027
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that broadcast sets of the parameter values to mutually exclusive subsets of the rows and columns of the processor element arrays. In some cases, the array processor includes single-instruction-multiple-data (SIMD) units including subsets of the processor element arrays in corresponding rows, workgroup processors (WGPs) including subsets of the SIMD units, and a memory fabric configured to interconnect with an external memory that stores the parameter values. The memory interfaces broadcast the parameter values to the SIMD units that include the processor element arrays in rows associated with the memory interfaces and columns of processor element arrays that are implemented across the SIMD units in the WGPs. The memory interfaces access the parameter values from the external memory via the memory fabric.
-
Publication number: US20180167622A1
Publication date: 2018-06-14
Application number: US15414466
Application date: 2017-01-24
Applicant: Advanced Micro Devices, Inc.
Inventor: Mahalakshmi THIKKIREDDY , Sateesh LAGUDU
IPC: H04N19/176 , H04N19/182 , H04N19/625 , H04N19/80 , H04N19/423 , H04N19/136
CPC classification number: H04N19/176 , H04N19/136 , H04N19/182 , H04N19/423 , H04N19/625 , H04N19/80
Abstract: A first memory stores values of blocks of pixels representative of a digital image, a second memory stores partial values of destination pixels in a thumbnail image, and a third memory stores compressed images and thumbnail images. A processor retrieves values of a block of pixels from the first memory. The processor also concurrently compresses the values to generate a compressed image and modifies a partial value of a destination pixel based on values of pixels in portions of the block that overlap a scaling window for the destination pixel. The processor stores the modified partial value in the second memory and stores the compressed image and the thumbnail image in the third memory.
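The partial-value accumulation described in this abstract can be illustrated with a minimal Python sketch. It assumes a simple box-filter thumbnail with an integer scale factor, which the abstract does not specify; the function name `thumbnail_by_blocks` and its parameters are hypothetical, and the concurrent compression step is only marked by a comment.

```python
def thumbnail_by_blocks(image, block, scale):
    """Box-filter thumbnail built one pixel block at a time (illustrative only).

    image: 2D list of integer pixel values (stands in for the first memory).
    block: side length of the square blocks that are fetched in turn.
    scale: side length of the scaling window per destination pixel (assumed).
    """
    h, w = len(image), len(image[0])
    th, tw = h // scale, w // scale
    # Partial sums of destination pixels (stands in for the second memory).
    partial = [[0] * tw for _ in range(th)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            # A real encoder would compress this block here, in the same pass.
            for y in range(by, min(by + block, h)):
                for x in range(bx, min(bx + block, w)):
                    ty, tx = y // scale, x // scale
                    if ty < th and tx < tw:
                        # Only the part of the block that overlaps the scaling
                        # window of (ty, tx) contributes to its partial value.
                        partial[ty][tx] += image[y][x]
    # Finalize each destination pixel once all overlapping blocks were seen.
    return [[s // (scale * scale) for s in row] for row in partial]
```

When a scaling window spans several blocks, its destination pixel is updated several times before it is final, which is why the partial values need their own storage.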
-
Publication number: US20220197973A1
Publication date: 2022-06-23
Application number: US17125457
Application date: 2020-12-17
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Allen H. RUSH , Michael MANTOR
Abstract: A processing system includes a first set and a second set of general-purpose registers (GPRs) and memory access circuitry that fetches nonzero values of a sparse matrix into consecutive slots in the first set. The memory access circuitry also fetches values of an expanded matrix into consecutive slots in the second set of GPRs. The expanded matrix is formed based on values of a vector and locations of the nonzero values in the sparse matrix. The processing system also includes a set of multipliers that concurrently perform multiplication of the nonzero values in slots of the first set of GPRs with the values of the vector in corresponding slots of the second set. Reduced sum circuitry accumulates results from the set of multipliers for rows of the sparse matrix.
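The data flow in this abstract (packed nonzero values, an expanded matrix gathered from the vector, slot-wise multiplication, per-row reduction) can be sketched in software as follows. This is a minimal Python illustration, not the claimed hardware: the helper names are hypothetical, plain lists stand in for the two GPR sets, and the row-pointer bookkeeping is an assumption borrowed from the common compressed-sparse-row layout.

```python
def pack_sparse(matrix):
    """Pack nonzero values into consecutive slots, remembering their columns."""
    values, cols, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                cols.append(j)
        row_ptr.append(len(values))  # marks where the next row starts
    return values, cols, row_ptr


def expand_vector(vector, cols):
    """Gather vector entries so slot i lines up with nonzero value i."""
    return [vector[j] for j in cols]


def spmv(matrix, vector):
    """Sparse matrix-vector product via slot-wise multiply and row-wise sum."""
    values, cols, row_ptr = pack_sparse(matrix)      # stands in for first GPR set
    expanded = expand_vector(vector, cols)           # stands in for second GPR set
    products = [a * b for a, b in zip(values, expanded)]
    return [sum(products[row_ptr[i]:row_ptr[i + 1]]) for i in range(len(matrix))]
```

Because zero entries never occupy a slot, every multiplication in `products` does useful work; a row with no nonzero values reduces to an empty sum of 0.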
-
Publication number: US20220197655A1
Publication date: 2022-06-23
Application number: US17548105
Application date: 2021-12-10
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sateesh LAGUDU , Arun Vaidyanathan ANANTHANARAYAN , Michael Mantor , Allen H. Rush
Abstract: An array processor includes processor element arrays (PEAs) distributed in rows and columns. The PEAs are configured to perform operations on parameter values. A first sequencer receives a first direct memory access (DMA) instruction that includes a request to read data from at least one address in memory. A texture address (TA) engine requests the data from the memory based on the at least one address and a texture data (TD) engine provides the data to the PEAs. The PEAs provide first synchronization signals to the TD engine to indicate availability of registers for receiving the data. The TD engine provides second synchronization signals to the first sequencer in response to receiving acknowledgments that the PEAs have consumed the data.
-
Publication number: US20210081172A1
Publication date: 2021-03-18
Application number: US16571728
Application date: 2019-09-16
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Prasad NAGABHUSHANAMGARI , Sateesh LAGUDU
Abstract: A multipartite lookup table (LUT) is used to implement transcendental functions such as a binary logarithm, a binary anti-logarithm, or both. The multipartite LUT includes a plurality of LUTs that map partitions of bits representative of an input number to values of a transcendental function of the bits representative of the input number. The input number is in a first floating-point format. The implementation of the multipartite LUT includes output circuitry to combine the values of the transcendental function to produce an output number in a second floating-point format. The output number is equal to the transcendental function of the input number. Addresses of the plurality of LUTs are indicated by the partitions of the bits representative of the input number.
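To illustrate how partitions of the input bits can address several small tables whose outputs are combined, here is a minimal Python sketch of the classic two-table (bipartite) special case for a binary logarithm. It is not the decomposition claimed in the application: the partition widths, table names (`TIV`, `TO`), and function names are assumptions, and Python floats stand in for the hardware floating-point formats.

```python
import math

N0, N1, N2 = 4, 4, 4              # assumed partition widths (12 fraction bits)
FRAC_BITS = N0 + N1 + N2


def f(m):
    return math.log2(1.0 + m)


def df(m):
    return 1.0 / ((1.0 + m) * math.log(2.0))


D1 = (2**N1 - 1) / 2 / 2**(N0 + N1)   # midpoint of the middle partition's range
D2 = (2**N2 - 1) / 2 / 2**FRAC_BITS   # midpoint of the low partition's range

# Table of initial values, addressed by the high and middle partitions.
TIV = [[f(i0 / 2**N0 + i1 / 2**(N0 + N1) + D2) for i1 in range(2**N1)]
       for i0 in range(2**N0)]
# Table of offsets, addressed by the high and low partitions.
TO = [[df(i0 / 2**N0 + D1 + D2) * (i2 / 2**FRAC_BITS - D2) for i2 in range(2**N2)]
      for i0 in range(2**N0)]


def log2_fraction(k):
    """Approximate log2(1 + k / 2**FRAC_BITS) from two table reads and one add."""
    i0 = k >> (N1 + N2)
    i1 = (k >> N2) & (2**N1 - 1)
    i2 = k & (2**N2 - 1)
    return TIV[i0][i1] + TO[i0][i2]


def approx_log2(x):
    """Approximate log2(x) for x > 0 by splitting off the exponent."""
    mant, exp = math.frexp(x)                     # x = mant * 2**exp, 0.5 <= mant < 1
    k = int((2.0 * mant - 1.0) * 2**FRAC_BITS)    # truncate fraction to 12 bits
    return (exp - 1) + log2_fraction(k)
```

With these widths the two tables hold 256 entries each instead of the 4096 a single table indexed by all 12 fraction bits would need, at the cost of a small approximation error.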