SYSTEM FOR PROVIDING ENHANCED VISION TRANSFORMER BLOCKS FOR COMPUTER VISION

    Publication No.: US20240037931A1

    Publication date: 2024-02-01

    Application No.: US18359786

    Filing date: 2023-07-26

    Abstract: A system is disclosed for providing an enhanced vision transformer block that mobile vision transformers can use to perform computer vision tasks such as image classification, segmentation, and object detection. A local representation block within the enhanced block applies a depthwise-separable convolutional layer to vectors of an input image to create local representation outputs associated with the image. The local representation outputs are fed into a global representation block, which unfolds them, applies vision transformers, and folds the result to generate a global representation output associated with the image. The global representation output is fed to a fusion block, which concatenates the local representations with the global representations, applies a point-wise convolution to the concatenation to generate a fusion block output, and fuses the input features of the image with the fusion block output to generate an output used to perform a computer vision task.
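    The unfold, transform, fold sequence of the global representation block can be sketched in plain Python. This is a minimal, non-authoritative sketch: the abstract does not say how unfolding is done, so it assumes the MobileViT convention of non-overlapping p×p patches, with pixels that share the same offset inside their patch grouped into one token sequence, on a single-channel map whose height and width are divisible by p. The `transformer` argument is a hypothetical stand-in for the vision transformer layers.

```python
# Minimal sketch of MobileViT-style unfold -> transformer -> fold on one channel.
# Assumptions (not stated in the abstract): non-overlapping p x p patches,
# H and W divisible by p, and pixels at the same offset inside their patch
# grouped into one token sequence. `transformer` is a hypothetical stand-in.
from typing import Callable, List

Grid = List[List[float]]


def unfold(x: Grid, p: int) -> List[List[float]]:
    """Split an H x W map into p*p sequences of length N = (H // p) * (W // p)."""
    h, w = len(x), len(x[0])
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    sequences = []
    for i in range(p):          # row offset inside a patch
        for j in range(p):      # column offset inside a patch
            # One token per patch: the pixel at offset (i, j) of every patch.
            sequences.append([x[r][c] for r in range(i, h, p) for c in range(j, w, p)])
    return sequences


def fold(sequences: List[List[float]], h: int, w: int, p: int) -> Grid:
    """Inverse of unfold: write each sequence back to its intra-patch offset."""
    x = [[0.0] * w for _ in range(h)]
    for k, seq in enumerate(sequences):
        i, j = divmod(k, p)     # sequence k came from offset (i, j)
        tokens = iter(seq)
        for r in range(i, h, p):
            for c in range(j, w, p):
                x[r][c] = next(tokens)
    return x


def global_representation(
    x: Grid, p: int, transformer: Callable[[List[float]], List[float]]
) -> Grid:
    """Unfold, apply the (hypothetical) transformer to each sequence, then fold."""
    sequences = unfold(x, p)
    return fold([transformer(s) for s in sequences], len(x), len(x[0]), p)


if __name__ == "__main__":
    x = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(unfold(x, 2))  # [[0, 2, 8, 10], [1, 3, 9, 11], [4, 6, 12, 14], [5, 7, 13, 15]]
    assert fold(unfold(x, 2), 4, 4, 2) == x
```

    Because `fold` exactly inverts `unfold`, the global representation keeps the spatial layout of its input, which is what allows the fusion block to concatenate it with the local representation outputs.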

    SYSTEM FOR OPTIMIZING VISION TRANSFORMER BLOCKS

    Publication No.: US20240046630A1

    Publication date: 2024-02-08

    Application No.: US18359774

    Filing date: 2023-07-26

    Abstract: A system is disclosed for optimizing a vision transformer block used with mobile vision transformers for tasks such as image classification, segmentation, and object detection. The system replaces the 3×3 convolutional layer in the fusion block of the vision transformer block with a 1×1 convolutional layer to reduce constraints on scaling the size of the neural network. The fusion block also fuses local and global representations, instead of fusing input features with global representations. In addition, input features are fused in the fusion block by adding them to the output of the fusion block's 1×1 convolutional layer. Finally, the system substitutes a depthwise-separable 3×3 convolutional layer for the standard 3×3 convolutional layer in the local representation block of the vision transformer block. The optimized transformer block enhances image classification, segmentation, and object detection.
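    The effect of the two convolution substitutions on model size can be estimated with a simple weight count. This sketch ignores biases and assumes that the fusion convolution maps a 2C-channel concatenation back to C channels and that the local representation convolution keeps C channels; the channel count C = 96 is a hypothetical example, not a figure from the patent.

```python
# Weight counts (biases ignored) for the substitutions described in the abstract.
# Assumptions: the fusion convolution maps a 2C-channel concatenation back to
# C channels, the local convolution keeps C channels, and C = 96 is hypothetical.


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """k x k depthwise conv (one filter per input channel) plus a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def fusion_params(c: int, k: int) -> int:
    """Fusion convolution from the concatenated 2C channels back to C channels."""
    return conv_params(2 * c, c, k)


if __name__ == "__main__":
    c = 96
    print(fusion_params(c, 3))                  # 165888  (3x3 fusion conv)
    print(fusion_params(c, 1))                  # 18432   (1x1 fusion conv)
    print(conv_params(c, c, 3))                 # 82944   (standard 3x3 local conv)
    print(depthwise_separable_params(c, c, 3))  # 10080   (depthwise-separable 3x3)
```

    Under these assumptions the 1×1 fusion convolution needs one ninth of the weights of the 3×3 version, and the depthwise-separable 3×3 layer needs 10,080 weights instead of 82,944 at C = 96.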

    TECHNIQUES TO IMPLEMENT TRANSFORMERS WITH MULTI-TASK NEURAL NETWORKS

    Publication No.: US20240257531A1

    Publication date: 2024-08-01

    Application No.: US18420489

    Filing date: 2024-01-23

    IPC classification: G06V20/58 G06V10/82

    CPC classification: G06V20/58 G06V10/82

    Abstract: Methods, systems, and devices for techniques to implement transformers with multi-task neural networks are described. A vehicle system may employ one or more transformer models in a machine learning system to generate an indication of one or more objects in an image, one or more drivable areas in the image, one or more lane lines in the image, or a combination thereof. The multi-task system may include a feature extractor that uses a set of convolutional layers to generate a corresponding set of representation vectors of the image. The system may pass the representation vectors to a set of transformer models such that the transformer models share a common input. Each transformer model may use the representation vectors to generate a respective indication.
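    The shared-input arrangement can be sketched structurally in Python. All names below (`MultiTaskModel`, the task keys, the toy extractor and heads) are hypothetical placeholders: the extractor stands in for the convolutional layers and each head for a task-specific transformer model. The sketch illustrates only that the feature extractor runs once per image and that every head receives the same representation vectors.

```python
# Structural sketch: one shared feature extractor feeding several task heads.
# Every name here is hypothetical; the extractor stands in for the convolutional
# layers and each head for a task-specific transformer model.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Vector = List[float]


@dataclass
class MultiTaskModel:
    feature_extractor: Callable[[Any], List[Vector]]
    heads: Dict[str, Callable[[Sequence[Vector]], Any]]

    def __call__(self, image: Any) -> Dict[str, Any]:
        # The extractor runs once per image...
        features = self.feature_extractor(image)
        # ...and every head receives the same representation vectors.
        return {task: head(features) for task, head in self.heads.items()}


def toy_extractor(image: List[List[float]]) -> List[Vector]:
    # Placeholder for the convolutional layers: one 1-D "vector" per image row.
    return [[float(sum(row))] for row in image]


# Placeholder heads that merely summarize the shared features.
model = MultiTaskModel(
    feature_extractor=toy_extractor,
    heads={
        "objects": lambda f: len(f),
        "drivable_areas": lambda f: max(v[0] for v in f),
        "lane_lines": lambda f: [v[0] for v in f],
    },
)

if __name__ == "__main__":
    print(model([[1, 2], [3, 4]]))
    # {'objects': 2, 'drivable_areas': 7.0, 'lane_lines': [3.0, 7.0]}
```

    Sharing the backbone this way means the convolutional feature extraction is computed once per image, regardless of how many task heads are attached.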