-
公开(公告)号:US20210224065A1
公开(公告)日:2021-07-22
申请号:US17213509
申请日:2021-03-26
Applicant: TEXAS INSTRUMENTS INCORPORATED
IPC: G06F9/30 , G06F9/38 , G06F11/10 , G06F9/32 , G06F12/0875 , G06F12/0897 , G06F11/00 , G06F9/345
Abstract: A steaming engine in a system receives a first set of stream parameters into a queue to define a first stream along with an indication of either a queue mode of operation or a speculative mode of operation for the first stream. Acquisition of the first stream then begins. At some point, a second set of stream parameters is received into the queue to define a second stream. When the queue mode of operation was specified for the first stream, the second set of parameters is queued and acquisition of the second stream is delayed until completion of acquisition of the first stream. When the speculative mode of operation was specified for the first stream, acquisition of the first stream is canceled upon receipt of the second set of stream parameters and acquisition of the second stream begins immediately.
-
公开(公告)号:US11061679B2
公开(公告)日:2021-07-13
申请号:US16389682
申请日:2019-04-19
Applicant: Graphcore Limited
Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
-
公开(公告)号:US11048513B2
公开(公告)日:2021-06-29
申请号:US16384537
申请日:2019-04-15
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Timothy D. Anderson , Joseph Zbiciak , Duc Bui , Mel Alan Phipps , Todd T. Hahn
IPC: G06F9/38 , G06F11/00 , G06F12/0875 , G06F9/30 , G06F11/10 , G06F9/32 , G06F12/0897 , G06F9/345
Abstract: Techniques related to executing a plurality of instructions by a processor comprising receiving a first instruction for execution on an instruction execution pipeline, wherein the instruction execution pipeline is in a first execution mode, beginning execution of the first instruction on the instruction execution pipeline, receiving an execution mode instruction to switch the instruction execution pipeline to a second execution mode, switching the instruction execution pipeline to the second execution mode based on the received execution mode instruction, annulling the first instruction based on the execution mode instruction, receiving a second instruction for execution on the instruction execution pipeline, the second instruction, and executing the second instruction.
-
公开(公告)号:US20210182059A1
公开(公告)日:2021-06-17
申请号:US17131424
申请日:2020-12-22
Applicant: INTEL CORPORATION
Inventor: Berkin AKIN
Abstract: An apparatus and method for a tensor permutation engine. The TPE may include a read address generation unit (AGU) to generate a plurality of read addresses for the plurality of tensor data elements in a first storage and a write AGU to generate a plurality of write addresses for the plurality of tensor data elements in the first storage. The TPE may include a shuffle register bank comprising a register to read tensor data elements from the plurality of read addresses generated by the read AGU, a first register bank to receive the tensor data elements, and a shift register to receive a lowest tensor data element from each bank in the first register bank, each tensor data element in the shift register to be written to a write address from the plurality of write addresses generated by the write AGU.
-
公开(公告)号:US11036648B2
公开(公告)日:2021-06-15
申请号:US16227238
申请日:2018-12-20
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Timothy D. Anderson , Joseph Zbiciak , Duc Quang Bui , Abhijeet Chachad , Kai Chirca , Naveen Bhoria , Matthew D. Pierson , Daniel Wu , Ramakrishnan Venkatasubramanian
IPC: G06F9/34 , G06F11/00 , G06F12/0875 , G06F12/1045 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/10 , G06F9/32 , G06F12/0897 , G06F12/0862 , G06F12/1009
Abstract: Disclosed embodiments include a data processing apparatus having a processing core, a memory, and a streaming engine. The streaming engine is configured to receive a plurality of data elements stored in the memory and to provide the plurality of data elements as a data stream to the processing core, and includes an address generator to generate addresses corresponding to locations in the memory, a buffer to store the data elements received from the locations in the memory corresponding to the generated addresses, and an output to supply the data elements received from the memory to the processing core as the data stream.
-
公开(公告)号:US11023239B2
公开(公告)日:2021-06-01
申请号:US16389682
申请日:2019-04-19
Applicant: Graphcore Limited
Abstract: A processor comprising an execution unit, memory and one or more register files. The execution unit is configured to execute instances of machine code instructions from an instruction set. The types of instruction defined in the instruction set include a double-load instruction for loading from the memory to at least one of the one or more register files. The execution unit is configured so as, when the load instruction is executed, to perform a first load operation strided by a fixed stride, and a second load operation strided by a variable stride, the variable stride being specified in a variable stride register in one of the one or more register files.
-
公开(公告)号:US11003450B2
公开(公告)日:2021-05-11
申请号:US15759900
申请日:2016-09-14
Applicant: ARM LIMITED
Inventor: Nigel John Stephens
Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specifies as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.
-
公开(公告)号:US10963251B2
公开(公告)日:2021-03-30
申请号:US16314882
申请日:2017-06-15
Applicant: ARM LIMITED
Inventor: Thomas Christopher Grocutt
Abstract: There is provided an apparatus that includes a set of vector registers, each of the vector registers being arranged to store a vector comprising a plurality of portions. The set of vector registers is logically divided into a plurality of columns, each of the columns being arranged to store a same portion of each vector. The apparatus also includes register access circuitry that comprises a plurality of access blocks. Each access block is arranged to access a portion in a different column when accessing one of the vector registers than when accessing at least one other of the vector registers. The register access circuitry is arranged to simultaneously access portions in any one of: the vector registers and the columns.
-
公开(公告)号:US10901913B2
公开(公告)日:2021-01-26
申请号:US16251795
申请日:2019-01-18
Applicant: TEXAS INSTRUMENTS INCORPORATED
Inventor: Joseph Zbiciak , Son H. Tran
IPC: G06F12/10 , G06F12/1045 , G06F9/30 , G06F9/345 , G06F9/38 , G06F11/00 , G06F11/10 , G06F9/32 , G06F12/0875 , G06F12/0897 , G06F12/0862 , G06F12/1009
Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream. An address generator produces virtual addresses of data elements. An address translation unit converts these virtual addresses to physical addresses by comparing the most significant bits of a next address N with the virtual address bits of each entry in an address translation table. Upon a match, the translated address is the physical address bits of the matching entry and the least significant bits of address N. The address translation unit can generate two translated addresses. If the most significant bits of address N+1 match those of address N, the same physical address bits are used for translation of address N+1. The sequential nature of the data stream increases the probability that consecutive addresses match the same address translation entry and can use this technique.
-
公开(公告)号:US20200334042A1
公开(公告)日:2020-10-22
申请号:US16795758
申请日:2020-02-20
Applicant: Venu KANDADAI
Inventor: Venu KANDADAI
Abstract: This invention constitutes a method and apparatus for enabling parallel computations of intermediate operations which are generic in many algorithms in given applications and also contain most of the computationally intensive operations. The method includes designing a set of intermediate level functions suitable for predefined application, obtaining instructions corresponding to intermediate level operations from a processor, computing the addresses of the operands and the results, performing computations involved in multiple intermediate level operations. In an exemplary embodiment the apparatus consists of a local data address generator that computes the addresses of a plurality of operands and results, a programmable computational unit that performs parallels computations of the intermediate level operations and a local memory interface that is interfaced to local memory organized in multiple blocks. The local data address generator and programmable computational unit are configurable to cover any field requiring large computations.
-
-
-
-
-
-
-
-
-