-
11.
公开(公告)号:US20240069511A1
公开(公告)日:2024-02-29
申请号:US17823902
申请日:2022-08-31
Applicant: Xilinx, Inc.
Inventor: Jorn Tuyls , Xiao Teng , Sanket Pandit , Rajeev Patwari , Qian Zhou , Ehsan Ghasemi , Ephrem C. Wu , Elliott Delaye , Aaron Ng
IPC: G05B19/042
CPC classification number: G05B19/042 , G05B2219/25255 , G05B2219/25257
Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the set of second statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.
-
公开(公告)号:US20240028556A1
公开(公告)日:2024-01-25
申请号:US17814817
申请日:2022-07-25
Applicant: Xilinx, Inc.
Inventor: Sanket Pandit , Jorn Tuyls , Xiao Teng , Rajeev Patwari , Ehsan Ghasemi , Elliott Delaye , Aaron Ng
CPC classification number: G06F15/8053 , G06F9/45533
Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.
-
公开(公告)号:US11620490B2
公开(公告)日:2023-04-04
申请号:US15785800
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Aaron Ng , Elliott Delaye , Ehsan Ghasemi , Xiao Teng , Jindrich Zejda , Yongjun Wu , Sean Settle , Ashish Sirasao
Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
-
公开(公告)号:US10824434B1
公开(公告)日:2020-11-03
申请号:US16204991
申请日:2018-11-29
Applicant: Xilinx, Inc.
Inventor: Sean Settle , Ehsan Ghasemi , Ashish Sirasao , Ralph D. Wittig
Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
-
15.
公开(公告)号:US10678509B1
公开(公告)日:2020-06-09
申请号:US16106743
申请日:2018-08-21
Applicant: Xilinx, Inc.
Inventor: Sean Settle , Elliott Delaye , Aaron Ng , Ehsan Ghasemi , Ashish Sirasao , Xiao Teng , Jindrich Zejda
Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
-
公开(公告)号:US10572225B1
公开(公告)日:2020-02-25
申请号:US16142406
申请日:2018-09-26
Applicant: Xilinx, Inc.
Inventor: Ehsan Ghasemi , Elliott Delaye , Ashish Sirasao , Sean Settle
Abstract: A and a request generator circuit is configured to read data elements of a three-dimensional (3-D) input feature map (IFM) from a memory and store a subset of the data elements in one of a plurality of N line buffers. Each line buffer is configured for storage of M data elements. A pixel iterator circuit is coupled to the line buffers and is configured to generate a sequence of addresses for reading the stored data elements from the line buffers based on a sequence of IFM height values and a sequence of IFM width values.
-
17.
公开(公告)号:US10460416B1
公开(公告)日:2019-10-29
申请号:US15786244
申请日:2017-10-17
Applicant: Xilinx, Inc.
Inventor: Ashish Sirasao , Elliott Delaye , Aaron Ng , Ehsan Ghasemi
IPC: G06T1/20 , G06T1/60 , H04N21/2381 , G06F3/03 , H03K19/177 , H04N5/30 , G06F12/00
Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.
-
公开(公告)号:US10411709B1
公开(公告)日:2019-09-10
申请号:US16045657
申请日:2018-07-25
Applicant: Xilinx, Inc.
Inventor: Ehsan Ghasemi , Elliott Delaye , Ashish Sirasao
IPC: H03K19/173 , G06F17/50 , H03K19/177 , G06F12/0802
Abstract: Disclosed circuits and methods include N line buffers. Each line buffer is configured for storage of M data elements of a three-dimensional (3-D) input feature map (IFM). A request generator circuit is coupled to the N line buffers and to a memory configured for storage of the 3-D IFM. The request generator circuit is divides the 3-D IFM into a plurality of IFM sub-volumes based on values of N, M, and dimensions of the 3-D IFM. The request generator circuit reads from the memory, data elements at addresses of an unprocessed one of the IFM sub-volumes and stores the data elements of the unprocessed one of the IFM sub-volumes in the N line buffers. In response to a completion signal, the request generator circuit repeats the reading of an unprocessed one of the IFM sub-volumes and storing the data elements in the N line buffers.
-
-
-
-
-
-
-