Transposing at-speed in a vector-matrix accelerator

    Publication No.: US12164917B1

    Publication Date: 2024-12-10

    Application No.: US18198387

    Filing Date: 2023-05-17

    Applicant: Google LLC

    Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
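The abstract does not say how the input switching circuits reorder elements. The following is a minimal sketch of one generic way to get a transpose out of "reorder, pass through lanes, undo the reorder", not the patented circuit. It makes these assumptions: the matrix is square (n x n), a lane is one column position, and the hypothetical input switch rotates row i right by i positions (a diagonal skew). The lanes only move elements between positions inside the same lane, and the output switch applies the inverse rotation. All function names are illustrative, and the abstract's packing of several elements into non-overlapping portions of one lane is not modeled.

```python
from typing import List

Matrix = List[List[int]]


def input_switch(a: Matrix) -> Matrix:
    """Hypothetical input reordering: rotate row i right by i (diagonal skew).

    Element a[i][j] lands in lane (i + j) mod n of row i.
    """
    n = len(a)
    s = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s[i][(i + j) % n] = a[i][j]
    return s


def lane_datapath(s: Matrix) -> Matrix:
    """Move elements only within a lane: lane k sends row i to row (k - i) mod n."""
    n = len(s)
    r = [[0] * n for _ in range(n)]
    for k in range(n):        # lane index
        for i in range(n):    # position inside the lane
            r[(k - i) % n][k] = s[i][k]
    return r


def output_switch(r: Matrix) -> Matrix:
    """Undo the skew: rotate row j left by j."""
    n = len(r)
    t = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            t[j][(k - j) % n] = r[j][k]
    return t


def transpose(a: Matrix) -> Matrix:
    """Transpose a square matrix as: reorder, lane datapath, undo reorder."""
    return output_switch(lane_datapath(input_switch(a)))


a = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
print(transpose(a))  # [[0, 10, 20], [1, 11, 21], [2, 12, 22]]
```

Element a[i][j] sits in lane (i + j) mod n after the input switch. The lane stage moves it to row j without leaving that lane, and rotating row j left by j puts it in column i. In this sketch elements cross lanes only in the two switching stages, and the output switch is the exact inverse of the input rotation.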

    Direct memory access architecture with multi-level multi-striding

    Publication No.: US11314674B2

    Publication Date: 2022-04-26

    Application No.: US16838796

    Filing Date: 2020-04-02

    Applicant: Google LLC

    Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
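To make the step-tracker description concrete, here is a minimal Python sketch under stated assumptions, not the patented design. A tensor is described by hypothetical lists of per-dimension step counts and byte strides, innermost dimension first. Each of the m address units owns a tracker holding one step index per dimension. Unit u starts at element u and advances by m elements per cycle, so the m units together cover m consecutive elements per cycle. Each address is the base plus the sum of the per-dimension stride offsets (step index times stride). The class and function names and the interleaving scheme are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepTracker:
    """Hypothetical step tracker: one step index per tensor dimension.

    Dimension 0 is the innermost (fastest-varying) dimension.
    """
    steps: List[int]    # number of steps (elements) in each dimension
    strides: List[int]  # stride of each dimension, in bytes
    index: List[int]    # current step index for each dimension

    def offset(self) -> int:
        # Sum of the per-dimension stride offsets (step index * stride).
        return sum(i * s for i, s in zip(self.index, self.strides))

    def advance(self, amount: int) -> bool:
        """Advance by `amount` elements, carrying into outer dimensions.

        Returns False once the tracker has moved past the last element.
        """
        carry = amount
        for d, size in enumerate(self.steps):
            total = self.index[d] + carry
            self.index[d] = total % size
            carry = total // size
            if carry == 0:
                return True
        return False


def generate_addresses(base: int, steps: List[int], strides: List[int],
                       m: int) -> List[List[int]]:
    """Addresses issued in each cycle by m parallel address units.

    Unit u starts at element u and advances by m elements per cycle,
    so one cycle covers up to m consecutive elements of the tensor.
    """
    num_elements = 1
    for size in steps:
        num_elements *= size
    units = []
    for u in range(min(m, num_elements)):
        tracker = StepTracker(steps, strides, [0] * len(steps))
        tracker.advance(u)  # stagger the starting element of each unit
        units.append(tracker)
    active = [True] * len(units)
    cycles = []
    while any(active):
        cycle = []
        for u, tracker in enumerate(units):
            if active[u]:
                cycle.append(base + tracker.offset())
                active[u] = tracker.advance(m)
        cycles.append(cycle)
    return cycles


# Example: a 3 x 4 tile of 4-byte elements with a 64-byte row pitch, m = 4.
print(generate_addresses(1000, steps=[4, 3], strides=[4, 64], m=4))
# [[1000, 1004, 1008, 1012], [1064, 1068, 1072, 1076], [1128, 1132, 1136, 1140]]
```

Keeping a separate tracker per unit, rather than one shared counter, lets each unit compute its address independently in the same cycle. The cost is a carry chain that must handle advancing by m elements at once.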

    Transposing At-Speed in a Vector-Matrix Accelerator

    Publication No.: US20240385837A1

    Publication Date: 2024-11-21

    Application No.: US18198387

    Filing Date: 2023-05-17

    Applicant: Google LLC

    Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.

    Direct Memory Access Architecture with Multi-Level Multi-Striding

    Publication No.: US20240070098A1

    Publication Date: 2024-02-29

    Application No.: US18229616

    Filing Date: 2023-08-02

    Applicant: Google LLC

    CPC classification numbers: G06F13/28, G06F1/04

    Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.

    Direct memory access architecture with multi-level multi-striding

    Publication No.: US11762793B2

    Publication Date: 2023-09-19

    Application No.: US17728478

    Filing Date: 2022-04-25

    Applicant: Google LLC

    CPC classification numbers: G06F13/28, G06F1/04

    Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.

    Direct Memory Access Architecture with Multi-Level Multi-Striding

    Publication No.: US20220327075A1

    Publication Date: 2022-10-13

    Application No.: US17728478

    Filing Date: 2022-04-25

    Applicant: Google LLC

    Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.

    Direct Memory Access Architecture with Multi-Level Multi-Striding

    Publication No.: US20210255976A1

    Publication Date: 2021-08-19

    Application No.: US16838796

    Filing Date: 2020-04-02

    Applicant: Google LLC

    Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
