SYSTEMS AND METHODS FOR HARDWARE-BASED POOLING

    Publication No.: US20190205738A1

    Publication Date: 2019-07-04

    Application No.: US15862369

    Filing Date: 2018-01-04

    Applicant: Tesla, Inc.

    Abstract: Described herein are systems and methods that utilize a novel hardware-based pooling architecture to process the output of a convolution engine representing an output channel of a convolution layer in a convolutional neural network (CNN). The pooling system converts the output into a set of arrays and aligns them according to a pooling operation to generate a pooling result. In certain embodiments, this is accomplished by using an aligner that aligns, e.g., over a number of arithmetic cycles, an array of data in the output into rows and shifts the rows relative to each other. A pooler applies a pooling operation to a combination of a subset of data from each row to generate the pooling result.
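    The row-alignment idea in the abstract can be illustrated in software: split the flat convolution-engine output into rows, then apply the pooling operation to a window drawn from adjacent rows. This is a minimal Python sketch of 2x2 max pooling under that scheme, not the patented hardware; all function names are illustrative.

    ```python
    def align_rows(channel, width):
        # Aligner stage: split the flat conv-engine output into rows.
        return [channel[i:i + width] for i in range(0, len(channel), width)]

    def pool2x2_max(channel, width):
        # Pooler stage: combine a 2-element subset from each of two
        # adjacent rows and reduce the window with max.
        rows = align_rows(channel, width)
        result = []
        for r in range(0, len(rows) - 1, 2):
            out_row = []
            for c in range(0, width - 1, 2):
                window = rows[r][c:c + 2] + rows[r + 1][c:c + 2]
                out_row.append(max(window))
            result.append(out_row)
        return result
    ```

    In hardware the row shift happens over arithmetic cycles rather than by slicing, but the dataflow (align, then reduce a cross-row window) is the same.
    
    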

    Systems and methods for encoding and decoding

    Publication No.: US10715175B2

    Publication Date: 2020-07-14

    Application No.: US15688808

    Filing Date: 2017-08-28

    Applicant: Tesla, Inc.

    Abstract: Various embodiments of the invention provide systems, devices, and methods for decompressing encoded electronic data to increase decompression throughput using any number of decoding engines. In certain embodiments, this is accomplished by identifying and processing a next record in a pipeline operation before having to complete the decompression of a current record. Various embodiments take advantage of knowledge of how the records have been encoded, e.g., in a single long record, to greatly reduce delay time, compared with existing designs, when decompressing encoded electronic data.
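    The pipelining described above relies on being able to locate the start of the next record without decoding the current one. A minimal Python sketch of that idea, under the illustrative assumption that each record carries a one-byte length header (the actual encoding in the patent is not specified here):

    ```python
    def split_records(stream):
        # Assumed format: 1-byte length header followed by the payload.
        # Because the header alone gives the next record's offset, a
        # dispatcher can hand record i+1 to another decoding engine
        # before record i has finished decompressing.
        records, pos = [], 0
        while pos < len(stream):
            length = stream[pos]
            records.append(stream[pos + 1 : pos + 1 + length])
            pos += 1 + length
        return records
    ```

    Each payload slice could then be fanned out to a pool of decoding engines in parallel, which is where the throughput gain comes from.
    
    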

    Systems and methods for low latency hardware memory management

    Publication No.: US10416899B2

    Publication Date: 2019-09-17

    Application No.: US16000248

    Filing Date: 2018-06-05

    Applicant: Tesla, Inc.

    IPC Classification: G06F12/00 G06F3/06 G06F12/02

    Abstract: In various embodiments, the present invention teaches a sequencer that identifies an address pointer of a first data block within a memory and a length of data that comprises that data block and is related to an input of a matrix processor. The sequencer then calculates, based on the block length, the input length, and a memory map, a block count representative of a number of data blocks that are to be retrieved from the memory. Using the address pointer, the sequencer may retrieve a number of data blocks from the memory in a number of cycles that depends on whether the data blocks are contiguous. In embodiments, based on the length of data, a formatter then maps the data blocks to the input of the matrix processor.
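    The two calculations the sequencer performs can be sketched in Python: deriving a block count from the input and block lengths, and estimating fetch cycles as a function of contiguity. This is a simplified cost model for illustration, not the patented logic; the one-burst-per-contiguous-run assumption is mine.

    ```python
    import math

    def block_count(input_length, block_length):
        # Number of blocks needed to cover the matrix-processor input.
        return math.ceil(input_length / block_length)

    def cycles_needed(addresses, block_length):
        # Simplified model: a run of contiguous blocks is fetched in
        # one burst; every discontinuity costs an extra cycle.
        cycles = 1
        for prev, cur in zip(addresses, addresses[1:]):
            if cur != prev + block_length:
                cycles += 1
        return cycles
    ```

    This captures why the abstract says the cycle count "depends on whether the data blocks are contiguous": contiguous runs collapse into single fetches.
    
    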

    COMPUTATIONAL ARRAY MICROPROCESSOR SYSTEM WITH VARIABLE LATENCY MEMORY ACCESS

    Publication No.: US20190026237A1

    Publication Date: 2019-01-24

    Application No.: US15920150

    Filing Date: 2018-03-13

    Applicant: Tesla, Inc.

    Abstract: A microprocessor system comprises a computational array and a hardware arbiter. The computational array includes a plurality of computation units. Each of the plurality of computation units operates on a corresponding value addressed from memory. The hardware arbiter is configured to control issuing of at least one memory request for one or more of the corresponding values addressed from the memory for the computation units. The hardware arbiter is also configured to schedule a control signal to be issued based on the issuing of the memory requests.
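    The coupling between request issue and control-signal scheduling can be modeled in a few lines of Python: when the arbiter issues a memory request, it books the matching compute-control signal for a later cycle. The fixed-latency model and class name are illustrative assumptions, not the patent's design.

    ```python
    class Arbiter:
        """Toy model: issuing a memory request schedules the control
        signal that consumes its data a fixed latency later."""

        def __init__(self, latency):
            self.latency = latency
            self.schedule = {}  # ready cycle -> addresses whose data arrives

        def issue(self, cycle, addr):
            # Issue a memory request; book the control signal for the
            # cycle the data is expected back.
            ready = cycle + self.latency
            self.schedule.setdefault(ready, []).append(addr)
            return ready

        def control_signals(self, cycle):
            # Control signals due this cycle (empty if none scheduled).
            return self.schedule.get(cycle, [])
    ```

    Tying the control signal to the request issue, rather than to a fixed pipeline stage, is what lets the array tolerate variable-latency memory.
    
    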

    ACCELERATED MATHEMATICAL ENGINE

    Publication No.: US20220365753A1

    Publication Date: 2022-11-17

    Application No.: US17816234

    Filing Date: 2022-07-29

    Applicant: Tesla, Inc.

    Abstract: Various embodiments of the disclosure relate to an accelerated mathematical engine. In certain embodiments, the accelerated mathematical engine is applied to image processing such that convolution of an image is accelerated by using a two-dimensional matrix processor comprising sub-circuits that include an ALU, an output register, and a shadow register. This architecture supports a clocked, two-dimensional architecture in which image data and weights are multiplied in a synchronized manner to allow a large number of mathematical operations to be performed in parallel.
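    The clocked, parallel multiply-accumulate pattern of the abstract can be sketched in Python: every output cell accumulates one data-weight product per "clock" step k, with all cells advancing together. In hardware every cell updates simultaneously; the nested loops here merely serialize that per-clock update for illustration.

    ```python
    def matmul_parallel(data, weights):
        # acc plays the role of the per-cell output registers of the
        # 2D array; step k is one clock in which every cell performs
        # a single multiply-accumulate in lockstep.
        n, m, p = len(data), len(weights), len(weights[0])
        acc = [[0] * p for _ in range(n)]
        for k in range(m):            # one clock step per k
            for i in range(n):        # all (i, j) cells update "at once"
                for j in range(p):
                    acc[i][j] += data[i][k] * weights[k][j]
        return acc
    ```

    With n*p physical MAC cells, each clock retires n*p multiplications, which is the parallelism the abstract refers to; the shadow registers would let the next tile load while results drain, though that is not modeled here.
    
    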