-
1.
Publication No.: US20200012618A1
Publication Date: 2020-01-09
Application No.: US16028072
Filing Date: 2018-07-05
Applicant: QUALCOMM Incorporated
Inventor: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
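The grouping step described in this abstract can be illustrated with a minimal software sketch. It assumes the number of PEs required by the loop body is already known and passed in; the abstract does not say how the decode/control circuit derives it. The function name `fuse_pes` and the policy of leaving leftover PEs idle are hypothetical, not taken from the patent.

```python
def fuse_pes(total_pes: int, pes_per_loop_body: int) -> list[list[int]]:
    """Partition PE indices 0..total_pes-1 into equally sized fused PEs.

    Assumption (not from the abstract): PEs that do not fill a complete
    group are simply left unused.
    """
    if pes_per_loop_body < 1 or pes_per_loop_body > total_pes:
        raise ValueError("loop body must need between 1 and total_pes PEs")
    group_count = total_pes // pes_per_loop_body
    return [
        list(range(g * pes_per_loop_body, (g + 1) * pes_per_loop_body))
        for g in range(group_count)
    ]


# Eight PEs, loop body needs three: two fused PEs, two PEs left idle.
fused = fuse_pes(8, 3)
print(fused)  # [[0, 1, 2], [3, 4, 5]]
```

Under this sketch, each fused PE would execute one loop iteration at a time, so `len(fused)` iterations could proceed in parallel.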
-
2.
Publication No.: US20190384606A1
Publication Date: 2019-12-19
Application No.: US16012347
Filing Date: 2018-06-19
Applicant: QUALCOMM Incorporated
Inventor: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0≤X
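The abstract text above is cut off before it states how each output value is computed. As an illustration only, the sketch below assumes the conventional affine form, in which the value at index X is B + X·S; that formula is an assumption, not a quotation from the patent. The function name `affine_stream` is hypothetical.

```python
def affine_stream(base: int, stride: int, count: int) -> list[int]:
    """Return `count` values starting at `base` and spaced by `stride`.

    Assumption: output[X] = base + X * stride for 0 <= X < count.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [base + x * stride for x in range(count)]


# Base 100, stride 4, count 5, e.g. addresses of five 4-byte elements.
print(affine_stream(100, 4, 5))  # [100, 104, 108, 112, 116]
```

Because every value depends only on the three parameters and the index, all `count` values can in principle be produced independently, which is what would make the resulting memory accesses parallelizable.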
-
3.
Publication No.: US11614941B2
Publication Date: 2023-03-28
Application No.: US15942344
Filing Date: 2018-03-30
Applicant: QUALCOMM Incorporated
Inventor: Amrit Panda, Francisco Perez, Karamvir Chatha
Abstract: An apparatus for hardware acceleration for use in operating a computational network is configured for determining that a loop structure including one or more loops is to be executed by a first processor. Each of the one or more loops includes a set of operations. The loop structure may be configured as a nested loop, a cascaded loop, or a combination of the two. A second processor may be configured to decouple overhead operations of the loop structure from compute operations of the loop structure. The apparatus accelerates processing of the loop structure by simultaneously processing the overhead operations using the second processor separately from processing the compute operations based on the configuration to operate the computational network.
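The separation of overhead operations from compute operations can be pictured with a rough software analogy. In the sketch below, a generator handles only loop counters and address arithmetic for a nested loop, while a separate function performs only arithmetic on the addressed data. The names `loop_overhead` and `compute`, and the choice of a sum of squares as the workload, are hypothetical. The Python code runs the two sides interleaved on one thread, whereas the abstract describes them running simultaneously on separate processors.

```python
from typing import Iterator


def loop_overhead(rows: int, cols: int) -> Iterator[int]:
    """Overhead side: nested-loop counters and flat-address calculation."""
    for r in range(rows):
        for c in range(cols):
            yield r * cols + c


def compute(data: list[float], addresses: Iterator[int]) -> float:
    """Compute side: consumes ready-made addresses, does arithmetic only."""
    total = 0.0
    for addr in addresses:
        total += data[addr] * data[addr]
    return total


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # a 2 x 3 matrix stored row-major
result = compute(data, loop_overhead(2, 3))
print(result)  # 91.0
```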
-
4.
Publication No.: US11048509B2
Publication Date: 2021-06-29
Application No.: US16000580
Filing Date: 2018-06-05
Applicant: QUALCOMM Incorporated
Inventor: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file. As a result, multiple elements of multiple vectors may be accessed with a single vector register file access operation.
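One arrangement that satisfies the stated property (elements of a vector stored consecutively, corresponding elements of consecutive vectors in different banks) is a per-vector rotation of the starting bank. The abstract does not specify the actual arrangement used by the DMA controller, so the rotation scheme, the function name `memv_layout`, and the one-row-per-vector assumption below are all hypothetical.

```python
def memv_layout(
    num_vectors: int, elems_per_vector: int, num_banks: int
) -> dict[tuple[int, int], tuple[int, int]]:
    """Map (vector, element) to (bank, row) using a rotated layout.

    Assumptions: each vector fits in one row (elems_per_vector <= num_banks)
    and vector v starts at bank v mod num_banks, wrapping around.
    """
    if elems_per_vector > num_banks:
        raise ValueError("sketch assumes one vector fits in one row")
    return {
        (v, e): ((v + e) % num_banks, v)
        for v in range(num_vectors)
        for e in range(elems_per_vector)
    }


layout = memv_layout(num_vectors=4, elems_per_vector=4, num_banks=4)
# Element 0 of vectors 0..3 lands in four different banks.
print([layout[(v, 0)][0] for v in range(4)])  # [0, 1, 2, 3]
```

With this rotation, reading the same element index from every vector in a group of up to `num_banks` vectors touches each bank at most once, so the reads would not conflict and could be served by a single access.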