-
公开(公告)号:US12164917B1
公开(公告)日:2024-12-10
申请号:US18198387
申请日:2023-05-17
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale , Matthew Leever Hedlund , Matthew William Ashcraft , Indranil Chakraborty
Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
-
公开(公告)号:US11314674B2
公开(公告)日:2022-04-26
申请号:US16838796
申请日:2020-04-02
Applicant: Google LLC
Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
-
公开(公告)号:US20240385837A1
公开(公告)日:2024-11-21
申请号:US18198387
申请日:2023-05-17
Applicant: Google LLC
Inventor: Vinayak Anand Gokhale , Matthew Leever Hedlund , Matthew William Ashcraft , Indranil Chakraborty
Abstract: A system including one or more processors configured to receive a transpose instruction indicating to transpose a source matrix to a result matrix, provide data elements of the source matrix to input switching circuits, reorder the data elements using the input switching circuits, provide the data elements from the input switching circuits to one or more lanes of a datapath, provide the data elements from the datapath to output switching circuits, undo the reordering of the data elements using the output switching circuits, and provide the data elements from the output switching circuits to a result matrix. Each respective lane of the datapath receiving data elements receives multiple data elements directed to different respective non-overlapping portions of the lane.
-
公开(公告)号:US20240070098A1
公开(公告)日:2024-02-29
申请号:US18229616
申请日:2023-08-02
Applicant: Google LLC
IPC: G06F13/28
Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
-
公开(公告)号:US11762793B2
公开(公告)日:2023-09-19
申请号:US17728478
申请日:2022-04-25
Applicant: Google LLC
Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
-
公开(公告)号:US20220327075A1
公开(公告)日:2022-10-13
申请号:US17728478
申请日:2022-04-25
Applicant: Google LLC
IPC: G06F13/28
Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
-
公开(公告)号:US20210255976A1
公开(公告)日:2021-08-19
申请号:US16838796
申请日:2020-04-02
Applicant: Google LLC
IPC: G06F13/28
Abstract: DMA architectures capable of performing multi-level multi-striding and determining multiple memory addresses in parallel are described. In one aspect, a DMA system includes one or more hardware DMA threads. Each DMA thread includes a request generator configured to generate, during each parallel memory address computation cycle, m memory addresses for a multi-dimensional tensor in parallel and, for each memory address, a respective request for a memory system to perform a memory operation. The request generator includes m memory address units that each include a step tracker configured to generate, for each dimension of the tensor, a respective step index value for the dimension and, based on the respective step index value, a respective stride offset value for the dimension. Each memory address unit includes a memory address computation element configured to generate a memory address for a tensor element and transmit the request to perform the memory operation.
-
-
-
-
-
-