-
Publication No.: US20240037931A1
Publication Date: 2024-02-01
Application No.: US18359786
Filing Date: 2023-07-26
IPC Classification: G06V10/96 , G06V10/764 , G06T7/11 , G06V10/82 , G06V10/80
CPC Classification: G06V10/96 , G06V10/764 , G06T7/11 , G06V10/82 , G06V10/803 , G06T2207/20084
Abstract: A system for providing an enhanced vision transformer block for mobile vision transformers to perform computer vision tasks, such as image classification, segmentation, and object detection, is disclosed. A local representation block of the vision transformer block applies a depthwise-separable convolutional layer to vectors of an input image to facilitate creation of local representation outputs associated with the image. The local representation outputs are fed into a global representation block, which unfolds the local representation outputs, applies vision transformers, and folds the result to generate a global representation output associated with the image. The global representation output is fed to a fusion block, which concatenates the local representations with the global representations, applies a point-wise convolution to the concatenation to generate a fusion block output, and fuses input features of the image with the fusion block output to generate an output that facilitates performance of a computer vision task.
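The abstract does not spell out how the unfold and fold steps arrange the data. As a minimal sketch, assuming the feature map is split into non-overlapping patches and that pixels at the same in-patch position are grouped into one sequence, the round trip could look like the following. The function names `unfold` and `fold` are illustrative, a single-channel grid stands in for the local representation output, and the transformer step between the two calls is left out.

```python
def unfold(feature_map, patch_h, patch_w):
    """Rearrange an H x W grid into patch_h * patch_w sequences, one per in-patch position.

    sequences[p][n] holds the value at in-patch position p of patch n
    (patches are numbered row by row). This layout is an assumption.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % patch_h or width % patch_w:
        raise ValueError("grid size must be divisible by the patch size")
    rows, cols = height // patch_h, width // patch_w
    sequences = []
    for dy in range(patch_h):
        for dx in range(patch_w):
            sequences.append([
                feature_map[r * patch_h + dy][c * patch_w + dx]
                for r in range(rows)
                for c in range(cols)
            ])
    return sequences


def fold(sequences, height, width, patch_h, patch_w):
    """Inverse of unfold: scatter the sequences back into an H x W grid."""
    cols = width // patch_w
    grid = [[None] * width for _ in range(height)]
    for p, seq in enumerate(sequences):
        dy, dx = divmod(p, patch_w)
        for n, value in enumerate(seq):
            r, c = divmod(n, cols)
            grid[r * patch_h + dy][c * patch_w + dx] = value
    return grid


# A 4 x 4 grid holding the values 0..15, split into 2 x 2 patches.
grid = [[row * 4 + col for col in range(4)] for row in range(4)]
sequences = unfold(grid, 2, 2)
restored = fold(sequences, 4, 4, 2, 2)
```

In a full block, each sequence would pass through the transformer layers before `fold` is called; here the sequences are passed through unchanged, so `restored` equals `grid`.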
-
Publication No.: US20240046630A1
Publication Date: 2024-02-08
Application No.: US18359774
Filing Date: 2023-07-26
IPC Classification: G06V10/82 , G06V10/80 , G06V10/77 , G06V10/764 , G06V10/26
CPC Classification: G06V10/82 , G06V10/806 , G06V10/7715 , G06V10/764 , G06V10/26
Abstract: A system for optimizing a vision transformer block for use with mobile vision transformers utilized for tasks such as image classification, segmentation, and object detection is disclosed. The system includes incorporating a 1×1 convolutional layer in place of a 3×3 convolutional layer in a fusion block of the vision transformer block to reduce constraints on scaling neural network size. Additionally, the system includes fusing local and global representations in the fusion block of the vision transformer block instead of fusing input features and global representations. Furthermore, the system includes fusing input features in the fusion block by adding the input features to the output of the 1×1 convolutional layer of the fusion block. Moreover, the system includes substituting a 3×3 convolutional layer in the local representation block of the vision transformer block with a depthwise-separable 3×3 convolutional layer. The optimized transformer block enhances image classification, segmentation, and object detection.
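To give a rough sense of why these substitutions ease scaling, the sketch below counts weights for the three layer types named in the abstract using the standard formulas (bias terms ignored). The channel width of 64 is an arbitrary value chosen for illustration and does not come from the patent, and the function names are hypothetical.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels


def depthwise_separable_params(kernel, in_channels, out_channels):
    """Weights in a depthwise kernel x kernel conv followed by a 1x1 point-wise conv."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


channels = 64  # illustrative width, not taken from the patent

standard_3x3 = conv_params(3, channels, channels)
pointwise_1x1 = conv_params(1, channels, channels)
separable_3x3 = depthwise_separable_params(3, channels, channels)

print(f"standard 3x3:            {standard_3x3}")
print(f"1x1 (point-wise):        {pointwise_1x1}")
print(f"depthwise-separable 3x3: {separable_3x3}")
```

Under these assumptions the 1×1 layer needs one ninth of the weights of the standard 3×3 layer, and the depthwise-separable 3×3 layer needs only slightly more than the 1×1 layer.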
-
Publication No.: US20240257531A1
Publication Date: 2024-08-01
Application No.: US18420489
Filing Date: 2024-01-23
Abstract: Methods, systems, and devices for techniques to implement transformers with multi-task neural networks are described. A vehicle system may employ one or more transformer models in a machine learning system to generate an indication of one or more objects in an image, one or more drivable areas in an image, one or more lane lines in an image, or a combination thereof. The multi-task system may include a feature extractor which uses a set of convolutional layers to generate a corresponding set of representation vectors of the image. The system may pass the representation vectors to a set of transformer models, such that each of the transformer models shares a common input. Each transformer model may use the representation vectors to generate a respective indication.
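The data flow described here, one feature extractor feeding several task-specific models, can be sketched in a few lines. This is only a structural illustration: the layers and heads below are trivial stand-in functions rather than convolutional layers or transformer models, and every name (`extract_features`, `run_multi_task`, the task keys) is hypothetical.

```python
def extract_features(image, layers):
    """Run the stand-in layers in sequence and keep one representation per layer."""
    representations = []
    current = image
    for layer in layers:
        current = layer(current)
        representations.append(current)
    return representations


def run_multi_task(image, layers, heads):
    """Compute the shared representations once and hand them to every head."""
    shared = extract_features(image, layers)
    # Each head receives the same representations (the common input).
    return {task: head(shared) for task, head in heads.items()}


# Stand-ins for convolutional layers (assumption: any callable works here).
layers = [
    lambda v: [2 * x for x in v],
    lambda v: [x + 1 for x in v],
]

# Stand-ins for the per-task transformer models.
heads = {
    "objects": lambda reps: sum(reps[-1]),
    "drivable_area": lambda reps: max(reps[-1]),
    "lane_lines": lambda reps: len(reps),
}

outputs = run_multi_task([1.0, 2.0], layers, heads)
```

Because the representations are computed once and reused, adding a task in this arrangement means adding a head rather than a second pass through the feature extractor.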
-
-